Prosecution Insights
Last updated: April 19, 2026
Application No. 18/492,234

SYSTEMS AND METHODS FOR ENCODING TEMPORAL INFORMATION FOR VIDEO INSTANCE SEGMENTATION AND OBJECT DETECTION

Non-Final OA — §103, §112
Filed: Oct 23, 2023
Examiner: JOSEPH, DENNIS P
Art Unit: 2621
Tech Center: 2600 — Communications
Assignee: Samsung Electronics Co., Ltd.
OA Round: 3 (Non-Final)
Grant Probability: 48% (Moderate)
OA Rounds: 3-4
To Grant: 3y 3m
With Interview: 67%

Examiner Intelligence

Grants 48% of resolved cases.

Career Allow Rate: 48% (315 granted / 654 resolved; -13.8% vs TC avg)
Interview Lift: +18.5% for resolved cases with interview (strong)
Avg Prosecution: 3y 3m (56 currently pending)
Total Applications: 710 across all art units (career history)
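The headline figures follow from simple arithmetic on the career counts above. Here is a minimal Python sketch of that arithmetic, assuming the dashboard divides grants by resolved cases and adds the interview lift in percentage points (the variable names are illustrative, not from any real API):

```python
# Recompute the dashboard's headline examiner stats from the raw counts.
# Assumption: "with interview" = career allow rate + lift in percentage points.
granted = 315          # career grants (card above)
resolved = 654         # career resolved cases
interview_lift = 18.5  # percentage-point lift for cases with an interview

allow_rate = 100 * granted / resolved
with_interview = allow_rate + interview_lift

print(f"Career allow rate: {allow_rate:.1f}%")      # 48.2%, displayed as 48%
print(f"With interview:    {with_interview:.1f}%")  # 66.7%, displayed as 67%
```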

Statute-Specific Performance

§101: 1.0% (-39.0% vs TC avg)
§103: 60.3% (+20.3% vs TC avg)
§102: 27.9% (-12.1% vs TC avg)
§112: 7.9% (-32.1% vs TC avg)
Tech Center average is an estimate • Based on career data from 654 resolved cases
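The "vs TC avg" deltas let you back out the implied Tech Center baseline for each statute. A quick sketch, under the assumption that each delta is a simple percentage-point difference (examiner rate minus TC average):

```python
# Back out the implied Tech Center baseline per statute, assuming
# delta = examiner_rate - tc_average (in percentage points).
rates = {
    "§101": (1.0, -39.0),
    "§103": (60.3, +20.3),
    "§102": (27.9, -12.1),
    "§112": (7.9, -32.1),
}
for statute, (examiner_rate, delta) in rates.items():
    print(f"{statute}: implied TC avg = {examiner_rate - delta:.1f}%")
# Every statute backs out to 40.0%, suggesting the chart compares against a
# single ~40% Tech Center estimate rather than per-statute averages.
```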

Office Action

§103, §112
DETAILED ACTION

1. This Office Action is responsive to claims filed for App. 18/492,234 on December 16, 2025. Claims 1-15 are pending.

America Invents Act

2. The present application is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

3. A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on December 16, 2025 has been entered.

Claim Rejections - 35 USC § 112

4. The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

5. Claims 1-15 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Claim 1 recites “generating, by a template encoder, a temporal encoded second frame by merging a second frame subsequent to the first frame among the plurality of frames and the first color coded template at a predetermined ratio”. While Applicant has noted [0044] and [0050] for support, Examiner respectfully does not see this level of detail having support: while there are aspects of merging frames, Examiner does not see aspects of a predetermined ratio being a consideration when merging the frames. It is unclear where support is for the ratio/rate at which the frames are combined. Applicant is kindly asked to provide clarification.

Claim Rejections - 35 USC § 103

6. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

7. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

8. Claims 1-15 are rejected under 35 U.S.C. 103 as being unpatentable over Oh et al. (US 2023/0196817 A1) in view of Lee et al. (US 2020/0250436 A1) and Wang et al. (US 2020/0294240 A1). Please note these references were supplied in an Information Disclosure Statement.

Oh teaches in Claim 1: A method for encoding temporal information in an electronic device, the method comprising: identifying, by a neural network, at least one region indicative of one or more objects in a first frame by analyzing the first frame among a plurality of frames (Figure 2, [0054] discloses a digital video 202, made up of a series of video frames 206, and focuses on an object 204 (read as a region indicative of one or more objects) in the current frame, as detailed in [0057] as well; please note Figure 3, [0061], which details a current and a preceding frame of the digital video); outputting, by the neural network (Figures 2 and 9, [0106] disclose a joint-based segmentation system 106 which has a segmentation neural network 914), a first prediction template for the first frame including the one or more objects in the first frame (Figures 2 and 3, [0084], [0099] disclose details of the joint-based segmentation system 106, which can predict masks and key points (joints); please note these prediction aspects/outputs as a prediction template); generating, by a template generator, a color coded template for the first frame by assigning a color to each of the one or more objects included in the first prediction template ([0105] discloses the joint-based segmentation system 106 modifies the digital video by adding text, color, or other visual effects to the digital video).

Oh does not explicitly teach “generating, by a template encoder, a temporal encoded second frame by combining a second frame among the plurality of frames and the first color coded template”. Initially, Oh teaches in Figure 2, [0059] that the joint-based segmentation system 106 modifies the digital video 202 with the additions noted above, including possible color changes; furthermore, Oh teaches in [0098] of temporal masks and, in general, encoding. In the same field of endeavor, frame encoding, Lee teaches of a plurality of video frames within a set of consecutive video frames (Lee, Figure 1, [0093]). Notably, an estimated object mask is applied to the first video frame and is combined with a second video frame to feed into a second encoder (read as merged at a predetermined ratio considering the estimated mask). [0094] provides additional details on the mask and the estimation process, which is part of the predetermined ratio of what is combined. As combined with Oh, who also teaches of a plurality of frames, the modified frame can be combined with an adjacent frame.
Therefore, it would have been obvious to one of ordinary skill in the art, at the effective filing date of the invention, to implement the merging of frames, as taught by Lee, with the motivation that combining the extracted features over multiple frames allows for fine-tuning neural networks and increases quality (Lee, [0033]).

Oh does not explicitly teach “supplying a temporal encoded second frame to the neural network for generating a second prediction template for the second frame”. However, in the same field of endeavor, segmentation models, Wang teaches of a first and a second prediction module, 401 and 402, respectively (Wang, Figure 5, [0068]-[0072]). Notably, Wang teaches in [0072] that the second prediction module 402 inputs the data from the first prediction module, and mask parameters of the second-category objects can be predicted. To clarify, Oh teaches an initial prediction aspect and Wang teaches a second prediction template for the same frame modified by the first prediction template, i.e., the interpreted second frame of Oh; it is the same frame that is further analyzed in this situation. Therefore, it would have been obvious to one of ordinary skill in the art, at the effective filing date of the invention, to implement the second prediction module, as taught by Wang, with the motivation that by using both prediction modules, notably the second prediction module, second-category objects can be segmented, further enhancing the process (Wang, [0074]).

Oh and Lee teach in Claim 2: The method of claim 1, further comprising: identifying, by the neural network, at least one region indicative of one or more objects in the temporal encoded second frame by analyzing the temporal encoded second frame; outputting, by the neural network, the second prediction template including the one or more objects in the temporal encoded second frame; generating, by the template generator, a second color coded template for the temporal encoded second frame by applying at least one color to the second prediction template; generating, by the template encoder, a temporal encoded third frame by combining a third frame and the second color coded template; and supplying the temporal encoded third frame to the neural network. (This claim introduces a third frame and essentially repeats the steps of Claim 1, but between a second and a third frame rather than a first and a second frame. Lee teaches in [0081] and [0093] of performing five or more recursions, including a specific transition from the second frame to a third frame. As combined, the teachings of Oh can be performed for multiple (at least three) frames.)

Oh teaches in Claim 3: The method of claim 1, wherein the plurality of frames is from a preview of a capturing device, and wherein the plurality of frames is represented by a red-green-blue (RGB) color model. (Figure 3, [0067] discloses RGB frame aspects.)

As per Claim 4: Oh may not explicitly teach “wherein the combination of the second frame and the first color coded template has a blending fraction value of 0.1.” Oh teaches in [0027] of modifying the digital video using the joint-based segmentation masks, and [0105] discloses such modifications include color aspects, etc. Respectfully, it is clear that the modification of the frame data needs to be handled smoothly, avoiding segmentation inaccuracies ([0003]).
The specific blending fraction value is a design choice/optimization issue, as one of ordinary skill in the art would realize the optimized value would result in a more accurate modification when modifying the digital video. Therefore, it would have been obvious to one of ordinary skill in the art, at the effective filing date of the invention, to implement the proper blending fraction, with the motivation that this is a design choice/optimization issue resulting in better accuracy and modification of frame data.

Oh teaches in Claim 5: The method of claim 1, wherein the neural network is one of a segmentation neural network or an object detection neural network. ([0056] discloses the joint-based segmentation system 106 includes a segmentation neural network.)

Oh teaches in Claim 6: The method of claim 5, wherein output of the segmentation neural network includes one or more segmentation masks of the one or more objects in the first frame. ([0057] discloses masks for each frame of the digital video that portrays the object.)

Oh teaches in Claim 7: The method of claim 5, wherein output of the object detection neural network includes one or more bounding boxes of the one or more objects in the first frame. ([0064] discloses the joint-based segmentation system 106 generates a bounding box, among other aspects, for objects portrayed in the frame.)

Oh teaches in Claim 8: The method of claim 1, wherein the electronic device includes a smartphone or a wearable device that is equipped with a camera. ([0054] discloses a computing device, such as a smartphone, with an activated camera to capture the digital video 202.)

Oh teaches in Claim 9: The method of claim 1, wherein the neural network is configured to receive the first frame prior to analyzing the first frame. (Figure 2 discloses the joint-based segmentation system 106 receives the digital video and its plurality of video frames and then performs analysis.)

Oh teaches in Claim 10: An intelligent instance segmentation method in a device, the method comprising: receiving, by a neural network, a first frame from among a plurality of frames (Figure 2, [0054] discloses a digital video 202, made up of a series of video frames 206, and focuses on an object 204 (read as a region indicative of one or more objects, as detailed below) in the current frame, as detailed in [0057] as well; please note Figure 3, [0061], which details a current and a preceding frame of the digital video); analyzing, by the neural network (Figures 2 and 9, [0106] disclose a joint-based segmentation system 106 which has a segmentation neural network 914), the first frame to identify a region indicative of one or more objects in the first frame; generating, by the neural network, a template having the one or more instances in the first frame (Figures 2 and 3, [0084], [0099] disclose details of the joint-based segmentation system 106, which can predict masks and key points (joints).
Please note these prediction aspects/outputs as a prediction template); assigning, by a template generator, a color to each of the one or more objects included in the first template ([0105] discloses the joint-based segmentation system 106 modifies the digital video by adding text, color, or other visual effects to the digital video); receiving, by the neural network, a second frame; generating, by a template encoder, a temporal encoded second frame by merging the color coded template with the second frame (Figure 2, [0059] discloses the joint-based segmentation system 106 modifies the digital video 202 with the additions noted above, including possible color changes; this modification is merging the frame with the color aspects. Furthermore, Oh teaches in [0098] of temporal masks and, in general, encoding.)

Oh does not explicitly teach receiving a second frame “subsequent to the first frame” and generating, by a template encoder, a temporal encoded second frame by merging the color coded template with the second frame “at a predetermined ratio”. Initially, Oh teaches in Figure 2, [0059] that the joint-based segmentation system 106 modifies the digital video 202 with the additions noted above, including possible color changes; furthermore, Oh teaches in [0098] of temporal masks and, in general, encoding. In the same field of endeavor, frame encoding, Lee teaches of a plurality of video frames within a set of consecutive video frames (Lee, Figure 1, [0093]). Notably, an estimated object mask is applied to the first video frame and is combined with a second video frame to feed into a second encoder (read as merged at a predetermined ratio considering the estimated mask). [0094] provides additional details on the mask and the estimation process, which is part of the predetermined ratio of what is combined. As combined with Oh, who also teaches of a plurality of frames, the modified frame can be combined with an adjacent frame. Therefore, it would have been obvious to one of ordinary skill in the art, at the effective filing date of the invention, to implement the merging of frames, as taught by Lee, with the motivation that combining the extracted features over multiple frames allows for fine-tuning neural networks and increases quality (Lee, [0033]).

Oh does not explicitly teach “for generating a second template for segmenting the one or more objects in the temporal encoded second frame”. However, in the same field of endeavor, segmentation models, Wang teaches of a first and a second prediction module, 401 and 402, respectively (Wang, Figure 5, [0068]-[0072]). Notably, Wang teaches in [0072] that the second prediction module 402 inputs the data from the first prediction module, and mask parameters of the second-category objects can be predicted. To clarify, Oh teaches an initial prediction aspect and Wang teaches a second prediction template for the same frame modified by the first prediction template, i.e., the interpreted second frame of Oh; it is the same frame that is further analyzed in this situation. Therefore, it would have been obvious to one of ordinary skill in the art, at the effective filing date of the invention, to implement the second prediction module, as taught by Wang, with the motivation that by using both prediction modules, notably the second prediction module, second-category objects can be segmented, further enhancing the process (Wang, [0074]).
Oh teaches in Claim 11: An image segmentation method in a camera device ([0054] discloses a computing device, such as a smartphone, with an activated camera to capture the digital video 202), the method comprising: receiving, by a neural network (Figures 2 and 9, [0106] disclose a joint-based segmentation system 106 which has a segmentation neural network 914), an image frame including red-green-blue channels (Figure 2, [0054] discloses a digital video 202, made up of a series of video frames 206, and focuses on an object 204 (read as a region indicative of one or more instances) in the current frame, as detailed in [0057] as well; please note Figure 3, [0061], which details a current and a preceding frame of the digital video, and [0067], which discloses the RGB aspects); generating, by a template generator, a template including one or more color coded objects from the image frame (Figures 2 and 3, [0084], [0099] disclose details of the joint-based segmentation system 106, which can predict masks and key points (joints); please note these prediction aspects/outputs as a prediction template. [0105] discloses the joint-based segmentation system 106 modifies the digital video by adding text, color, or other visual effects to the digital video; as for instances, please note the details above of the object 204); and merging, by a template encoder, a template including the one or more color coded objects with the red-green-blue channels of image frames subsequent to the image frame as a preprocessed input for image segmentation in the neural network (Figure 2, [0059] discloses the joint-based segmentation system 106 modifies the digital video 202 with the additions noted above, including possible color changes).

Oh teaches in Claim 12: A system for encoding temporal information, comprising: a capturing device including a camera ([0054] discloses a computing device, such as a smartphone, with an activated camera to capture the digital video 202); a neural network (Figures 2 and 9, [0106] disclose a joint-based segmentation system 106 which has a segmentation neural network 914), wherein the neural network is configured to: identify at least one region indicative of one or more objects in a first frame by analyzing the first frame among a plurality of frames from the capturing device (Figure 2, [0054] discloses a digital video 202, made up of a series of video frames 206, and focuses on an object 204 (read as a region indicative of one or more instances) in the current frame, as detailed in [0057] as well; please note Figure 3, [0061], which details a current and a preceding frame of the digital video; as noted above, the camera provides the digital video), and output a first prediction template for the first frame including the one or more objects in the first frame (Figures 2 and 3, [0084], [0099] disclose details of the joint-based segmentation system 106, which can predict masks and key points (joints).
Please note these prediction aspects/outputs as a prediction template); a template generator configured to generate a first color coded template for the first frame by assigning a color to each of the one or more objects included in the first prediction template ([0105] discloses the joint-based segmentation system 106 modifies the digital video by adding text, color, or other visual effects to the digital video); and a template encoder configured to generate a temporal encoded second frame by merging a second frame and the first color coded template (Figure 2, [0059] discloses the joint-based segmentation system 106 modifies the digital video 202 with the additions noted above, including possible color changes; furthermore, Oh teaches in [0098] of temporal masks and, in general, encoding; also, please note the combination below).

Oh does not explicitly teach “generate a temporal encoded second frame by merging a second frame subsequent to the first frame and the first color coded template at a predetermined ratio”. Initially, Oh teaches in Figure 2, [0059] that the joint-based segmentation system 106 modifies the digital video 202 with the additions noted above, including possible color changes; furthermore, Oh teaches in [0098] of temporal masks and, in general, encoding. In the same field of endeavor, frame encoding, Lee teaches of a plurality of video frames within a set of consecutive video frames (Lee, Figure 1, [0093]). Notably, an estimated object mask is applied to the first video frame and is combined with a second video frame to feed into a second encoder (read as merged at a predetermined ratio considering the estimated mask). [0094] provides additional details on the mask and the estimation process, which is part of the predetermined ratio of what is combined. As combined with Oh, who also teaches of a plurality of frames, the modified frame can be combined with an adjacent frame. Therefore, it would have been obvious to one of ordinary skill in the art, at the effective filing date of the invention, to implement the merging of frames, as taught by Lee, with the motivation that combining the extracted features over multiple frames allows for fine-tuning neural networks and increases quality (Lee, [0033]).

Oh may not explicitly teach to “supply the temporal encoded second frame to the neural network for generating a second prediction template for the second frame”. However, in the same field of endeavor, segmentation models, Wang teaches of a first and a second prediction module, 401 and 402, respectively (Wang, Figure 5, [0068]-[0072]). Notably, Wang teaches in [0072] that the second prediction module 402 inputs the data from the first prediction module, and mask parameters of the second-category objects can be predicted. To clarify, Oh teaches an initial prediction aspect and Wang teaches a second prediction template for the same frame modified by the first prediction template, i.e., the interpreted second frame of Oh; it is the same frame that is further analyzed in this situation. Therefore, it would have been obvious to one of ordinary skill in the art, at the effective filing date of the invention, to implement the second prediction module, as taught by Wang, with the motivation that by using both prediction modules, notably the second prediction module, second-category objects can be segmented, further enhancing the process (Wang, [0074]).
Oh teaches in Claim 13: The system of claim 12, wherein the neural network is configured to receive the first frame. ([0059] discloses the joint-based segmentation system 106 receives the frame data in order to analyze and modify it.)

Oh teaches in Claim 14: The system of claim 12, wherein the plurality of frames of the capturing device is represented by a red-green-blue (RGB) color model. (Figure 3, [0067] discloses RGB frame aspects.)

As per Claim 15: Oh may not explicitly teach “wherein the merging of the second frame and the first color coded template has a blending fraction value of 0.1.” Oh teaches in [0027] of modifying the digital video using the joint-based segmentation masks, and [0105] discloses such modifications include color aspects, etc. Respectfully, it is clear that the modification of the frame data needs to be handled smoothly, avoiding segmentation inaccuracies ([0003]). The specific blending fraction value is a design choice/optimization issue, as one of ordinary skill in the art would realize the optimized value would result in a more accurate modification when modifying the digital video. Therefore, it would have been obvious to one of ordinary skill in the art, at the effective filing date of the invention, to implement the proper blending fraction, with the motivation that this is a design choice/optimization issue resulting in better accuracy and modification of frame data.

Response to Arguments

9. Applicant’s arguments have been considered but are respectfully moot in view of the new grounds of rejection. Please note the updated rejection in light of the claim amendments. Applicant is kindly asked to provide clarification for the new claim amendments, focusing on the predetermined ratio; Examiner does not see this level of support in the specification. Furthermore, please note the newly cited Lee reference; as a result, Applicant’s arguments are moot at this time.

Conclusion

10. Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENNIS P JOSEPH, whose telephone number is (571) 270-1459. The examiner can normally be reached Monday - Friday, 5:30 - 3:30 EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amr Awad, can be reached at 571-272-7764. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DENNIS P JOSEPH/
Primary Examiner, Art Unit 2621
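For readers outside the art, the "merging at a predetermined ratio" at the center of both the §112 and §103 discussion is essentially alpha blending: the color coded template built from one frame's predictions is mixed into the next frame before that frame is fed back to the network. Below is a minimal NumPy sketch under that reading; the 0.1 blending fraction comes from claims 4 and 15, while the function names and the predict/color_code callables are hypothetical stand-ins, not the applicant's actual implementation.

```python
import numpy as np

def temporal_encode(next_frame: np.ndarray,
                    color_coded_template: np.ndarray,
                    ratio: float = 0.1) -> np.ndarray:
    """Blend the previous frame's color coded template into the next RGB
    frame at a fixed ratio (claims 4 and 15 recite 0.1), yielding the
    'temporal encoded' frame supplied back to the neural network."""
    blended = (1.0 - ratio) * next_frame.astype(np.float32) \
              + ratio * color_coded_template.astype(np.float32)
    return blended.astype(next_frame.dtype)

def segment_clip(frames, predict, color_code, ratio=0.1):
    """Hypothetical feedback loop over a clip (claims 1, 2, and 10):
    predict on frame 0, color-code the prediction template, blend it
    into frame 1, predict again, and repeat for frame 2 and beyond."""
    templates = []
    net_input = frames[0]
    for frame in frames[1:]:
        template = color_code(predict(net_input))  # prediction -> colors
        templates.append(template)
        net_input = temporal_encode(frame, template, ratio)
    templates.append(color_code(predict(net_input)))
    return templates
```

On this reading, the §112 dispute reduces to whether [0044] and [0050] disclose a fixed value for the ratio at all, which is precisely the clarification the examiner requests.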

Prosecution Timeline

Oct 23, 2023 — Application Filed
Jun 01, 2025 — Non-Final Rejection (§103, §112)
Aug 27, 2025 — Examiner Interview Summary
Aug 27, 2025 — Applicant Interview (Telephonic)
Sep 02, 2025 — Response Filed
Sep 11, 2025 — Final Rejection (§103, §112)
Nov 17, 2025 — Response after Non-Final Action
Dec 16, 2025 — Request for Continued Examination
Jan 14, 2026 — Response after Non-Final Action
Mar 11, 2026 — Non-Final Rejection (§103, §112) (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592173 — Pseudo Signal Generator And Display Apparatus Including the Same (2y 5m to grant; granted Mar 31, 2026)
Patent 12579957 — GAMMA CORRECTION METHOD FOR A DISPLAY DEVICE (2y 5m to grant; granted Mar 17, 2026)
Patent 12580359 — Amplifying Optical Fibers (2y 5m to grant; granted Mar 17, 2026)
Patent 12579927 — METHOD OF ALIGNING LIGHT EMITTING ELEMENT AND METHOD OF FABRICATING DISPLAY DEVICE (2y 5m to grant; granted Mar 17, 2026)
Patent 12572227 — STYLUS WITH ADJUSTABLE FEATURES (2y 5m to grant; granted Mar 10, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 48%
With Interview: 67% (+18.5%)
Median Time to Grant: 3y 3m
PTA Risk: High

Based on 654 resolved cases by this examiner. Grant probability derived from career allow rate.
