Prosecution Insights
Last updated: April 19, 2026
Application No. 18/177,897

Video System with Object Replacement and Insertion Features

Final Rejection §103
Filed
Mar 03, 2023
Examiner
LIN, JASON K
Art Unit
2425
Tech Center
2400 — Computer Networks
Assignee
Roku Inc.
OA Round
4 (Final)
Grant Probability
49% (Moderate)
OA Rounds
5-6
To Grant
3y 7m
With Interview
84%

Examiner Intelligence

Career Allow Rate
49% (221 granted / 454 resolved; -9.3% vs TC avg)
Interview Lift
+34.8% for resolved cases with interview (strong)
Typical Timeline
3y 7m avg prosecution; 28 currently pending
Career History
482 total applications across all art units
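The headline figures are simple ratios. Below is a short sketch of the arithmetic as this page appears to compute it; treating the interview lift as percentage points, and using the rounded 84% with-interview rate from the projections section as the cohort's allow rate, are assumptions:

```python
# Arithmetic behind the examiner stats above (figures from this page).
granted, resolved = 221, 454

career_allow_rate = granted / resolved       # 0.4868... -> displayed as 49%
print(f"Career allow rate: {career_allow_rate:.1%}")

# Interview lift read as percentage points: allow rate among resolved cases
# with an interview minus the career baseline. With the rounded 84% cohort
# figure this gives ~+35 points; the page's +34.8% presumably uses the
# unrounded cohort rate.
with_interview = 0.84
print(f"Interview lift: {with_interview - career_allow_rate:+.1%}")
```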

Statute-Specific Performance

§101: 5.2% (-34.8% vs TC avg)
§103: 61.2% (+21.2% vs TC avg)
§102: 16.0% (-24.0% vs TC avg)
§112: 9.3% (-30.7% vs TC avg)
Deltas are vs the Tech Center average estimate • Based on career data from 454 resolved cases
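Reading each delta as the examiner's rate minus the Tech Center average (an assumption about the tool's convention), the implied baseline can be recovered. Notably, all four statutes recover the same 40.0%, consistent with a single Tech Center-wide baseline:

```python
# Recover the implied Tech Center baseline from each statute's delta,
# assuming delta = examiner_rate - tc_average (percentage points).
examiner = {"101": 5.2, "103": 61.2, "102": 16.0, "112": 9.3}
delta = {"101": -34.8, "103": 21.2, "102": -24.0, "112": -30.7}

for statute, rate in examiner.items():
    print(f"§{statute}: examiner {rate:.1f}%, implied TC avg {rate - delta[statute]:.1f}%")
# Each line prints an implied TC avg of 40.0%.
```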

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

This office action is responsive to application No. 18/177,897 filed on 12/04/2025. Claims 4-5, 8, 13, and 20-21 are canceled. Claims 1-3, 6-7, 9-12, 14-19, and 22-26 are pending and have been examined.

Response to Arguments

Applicant's arguments with respect to claims 1-3, 6-7, 9-12, 14-19, and 22-26 have been considered but are moot in view of the new grounds of rejection.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:

1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 6, 9-11, 14, 16-19, and 22-23 are rejected under 35 U.S.C. 103 as being unpatentable over Govil (US 2021/0120286), in view of Cohen et al. (US 10,613,726), and further in view of Zamiska et al. (US 8,910,201).

Consider claims 1, 10, and 19. Govil teaches a method, computing system, and a non-transitory computer-readable medium having stored thereon program instructions that upon execution by a computing system cause/configure the computing system to perform a set of acts (Paragraphs 0028-0029, 0051) comprising:

obtaining video that depicts an object across multiple frames of the video (Paragraph 0018 teaches content ingestion module 102 that detects objects within media content 112 received from a content source 110. Paragraph 0022 teaches object detection module 120 receives the media content 112 provided as input by the content source 110 and parses or otherwise analyzes the media content 112. Object detection module 120 may analyze individual video frames and/or sequential video frames to detect or otherwise identify distinct objects that are captured or otherwise contained within the media content 112);

detecting the object within the obtained video and determining object characteristic data associated with the detected object (Paragraph 0022 teaches object detection module 120 receives the media content 112 provided as input by the content source 110 and parses or otherwise analyzes the media content 112 to detect or otherwise identify objects within the media content 112. Object detection module 120 may analyze individual video frames and/or sequential video frames to detect or otherwise identify distinct objects that are captured or otherwise contained within the media content 112. For each detected object, the object detection module 120 may record or otherwise identify the timestamp or other temporal information that characterizes when the object is present within the media content 112 (e.g., the object's temporal location) along with pixel locations or other spatial information that characterizes where the object is within the media content 112 (e.g., the object's spatial location). Paragraph 0023 teaches information from object detection module 120 is provided to object recognition module 122, which, in turn, analyzes the information to identify or otherwise determine one or more attributes associated with the detected object, detecting and identifying various characteristics of the detected object, such as, for example, an object type associated with the detected object and other visual and/or physical characteristics associated with the detected object, e.g., the size, shape, color, etc.);

determining user profile data associated with a viewer of the video (Paragraph 0039 teaches the characteristics, attributes, user preferences, and/or other user profile information associated with the viewer at the client device 108 may be utilized to identify, from among a set of potential substitute objects. Paragraph 0047 teaches substitution on a viewer-specific basis based on demographic information, user preferences, user behavior, and/or other characteristics of the viewer);

using at least the determined object characteristic data and the determined user profile data as a basis to select a replacement object from among a set of multiple candidate replacement objects (Paragraph 0039 teaches various fields of metadata characterizing the original object that was detected in the media content may be provided to the object substitution module, which, in turn, queries the table of substitute object metadata 118 to identify entries for similar substitute objects based on similarities, matching, degree of differences, or other relationships between the metadata 118 associated with those substitute objects and the metadata 142 associated with the original object. Object substitution module 162 may identify potential substitute objects having the same object type or other taxonomic classification(s) or the same or similar size, shape, or other visual characteristics as the original object. Other criteria may be utilized to further limit or reduce the set of potential substitute objects to facilitate arriving at a best or optimal substitute object for the current viewing context. For example, the characteristics, attributes, user preferences, and/or other user profile information associated with the viewer at the client device 108 may be utilized to identify, from among a set of potential substitute objects, a smaller subset of potential substitute objects that are more likely to be relevant, interesting, or influential with respect to the current viewer);

replacing the detected object with the selected replacement object to generate video that is a modified version of the obtained video (Paragraph 0031 teaches object substitution module 162 then utilizes the temporal and spatial information associated with the detected object to overwrite or otherwise insert the obtained substitute object audiovisual content 116 into the appropriate location within the media content 140 in lieu of the detected object. In this regard, the object substitution module 162 effectively cuts or deletes the detected object from its corresponding temporal and spatial location within the media content 140 and intelligently replaces the original object at that corresponding temporal and spatial location with the substitute object content 116 from the data storage element 114, resulting in an augmented version of the media content to be provided to the media player 107), wherein replacing the detected object with the selected replacement object to generate video that is a modified version of the obtained video comprises applying a normalization technique to blend the selected replacement object into the video (Paragraphs 0041, 0048); and

outputting for presentation the generated video (Paragraph 0032 teaches post-processing module 164 transmitting or otherwise providing the augmented media content to the media player 107. Media player 107 receives the augmented version of the media content and then renders or otherwise presents the augmented media content at the client device 108).

Govil does not explicitly teach determining scene attribute data associated with the obtained video; using at least the determined scene attribute data as a basis to select a replacement object; or that the normalization technique is a lighting normalization technique, wherein applying the lighting normalization technique to blend the selected replacement object into the content comprises determining a shape of a shadow of the selected replacement object and using the determined shape of the shadow as a basis to modify a shadow of the detected object, wherein the detected object and the selected replacement object differ in at least one object characteristic other than scale, and wherein the shadow of the detected object and the shadow of the selected replacement object differ in at least one characteristic other than scale.

In an analogous art, Cohen teaches that the normalization technique is a lighting normalization technique, wherein applying the lighting normalization technique to blend the selected replacement object into the content comprises determining a shape of a shadow of the selected replacement object and using the determined shape of the shadow as a basis to modify a shadow of the detected object, wherein the detected object and the selected replacement object differ in at least one object characteristic other than scale, and wherein the shadow of the detected object and the shadow of the selected replacement object differ in at least one characteristic other than scale (Col 2: lines 5-23 teaches a process in which an object may be replaced.
Col 4: lines 53-57 teaches the lighting of replacement material may be matched to a section of the image by adjusting a shadow in the replacement material according to a time of day of the image. Col 7: lines 30-36 teaches content may include various combinations of assets, including videos, ads, audio, multi-media streams, animations, images, web documents, etc. Col 12: lines 15-26 teaches a replace request to "Replace the car with a blue truck". Col 18: lines 8-19 teaches the harmonizing module can adjust the lighting locally or globally in an image. Lighting is adjusted in one portion of a composite image to make it match the lighting in another portion of the composite image. Lighting may be adjusted to account for different times of day between the replacement material and an image to be edited, and thus may adjust shadows and highlights in a harmonized image to match times of day. Col 25: lines 23-30 teaches the harmonizing module produces a harmonized image by removing compositing artifacts, such as mismatches in light, adjusting or removing shadows, blending replacement or fill material, performing background decontamination, and the like. As taught by Cohen, the harmonizing module can adjust the lighting either locally or globally. Shadow(s) from the original object in the original content would be adjusted by the harmonizing module when lighting is adjusted after replacing the original object with the replacement object in order to blend the object into the content, which includes correcting for mismatches in lighting, adjusting shadows, etc. Thus, in the example of replacing the car with a truck, a car and a truck would obviously have different shapes. The previous shadow of the car in the content would then be adjusted to the new shape of the shadow of the truck, where both the car and the shadow of the car would have a different shape from that of the truck and the shadow of the truck).

Therefore, it would have been obvious to a person of ordinary skill in the art to modify the system of Govil to include a normalization technique that is a lighting normalization technique, wherein applying the lighting normalization technique to blend the selected replacement object into the content comprises determining a shape of a shadow of the selected replacement object and using the determined shape of the shadow as a basis to modify a shadow of the detected object, wherein the detected object and the selected replacement object differ in at least one object characteristic other than scale, and wherein the shadow of the detected object and the shadow of the selected replacement object differ in at least one characteristic other than scale, as taught by Cohen, for the advantage of harmonizing content to make it look natural, e.g., so that editing is not easily detected (Cohen - Col 4: lines 49-51), making the content look natural and unedited (Cohen - Col 18: lines 9-11), and allowing for better integration of replacement object(s).

Govil and Cohen do not explicitly teach determining scene attribute data associated with the obtained video and using at least the determined scene attribute data as a basis to select a replacement object. In an analogous art, Zamiska teaches determining scene attribute data associated with the obtained video and using at least the determined scene attribute data as a basis to select a replacement object (Col 2: lines 58 – Col 3: line 41, Col 6: lines 15-39, Col 8: lines 48-63, Col 12: lines 35-48, Col 12: line 61 – Col 13: line 9).
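Neither Govil's nor Cohen's implementation is public, so purely as an illustration of the shadow handling this combination describes, here is a minimal numpy sketch: it derives the replacement object's shadow shape by shearing and squashing its silhouette along an assumed light direction, then uses that shape to modify the detected object's old shadow. The shear/squash model, parameter values, and all names are assumptions, not anything from the cited references.

```python
import numpy as np

def shadow_shape(obj_mask: np.ndarray, shear: float = 0.8,
                 squash: float = 0.4) -> np.ndarray:
    """Cheap drop-shadow model: shear the replacement object's silhouette
    along an assumed light direction and squash it onto the ground line."""
    h, w = obj_mask.shape
    shadow = np.zeros_like(obj_mask)
    ys, xs = np.nonzero(obj_mask)
    if ys.size == 0:
        return shadow
    base = ys.max()                          # lowest object pixel = ground line
    d = base - ys                            # pixel height above the ground
    sy = np.clip(base + (squash * d).astype(int), 0, h - 1)
    sx = np.clip(xs + (shear * d).astype(int), 0, w - 1)
    shadow[sy, sx] = True
    return shadow

def normalize_lighting(frame: np.ndarray, old_shadow: np.ndarray,
                       new_shadow: np.ndarray, darken: float = 0.6) -> np.ndarray:
    """Use the replacement's shadow shape as the basis to modify the detected
    object's shadow: lift the stale shadow region, then darken the new one."""
    out = frame.astype(np.float32)
    out[old_shadow] = np.minimum(out[old_shadow] / darken, 255.0)  # crude fill
    out[new_shadow] *= darken
    return out.astype(np.uint8)
```

Real harmonization of the kind Cohen describes also adjusts lighting globally and removes compositing artifacts; this sketch only isolates the shadow-shape step that the claim language turns on.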
Therefore, it would have been obvious to a person of ordinary skill in the art to modify the system of Govil and Cohen to include determining scene attribute data associated with the obtained video and using at least the determined scene attribute data as a basis to select a replacement object, as taught by Zamiska, for the advantage of allowing content providers to effectively market brand names, products and/or services without providing annoying, distracting, and time wasting conventional advertisements or commercials (Zamiska - Col 2: lines 39-42), and of utilizing further data/context in order to provide the most appropriate replacement(s).

Consider claims 2 and 11. Govil, Cohen, and Zamiska teach wherein the object characteristic data indicates a size, shape, or orientation of the detected object (Govil - Paragraphs 0023, 0035).

Consider claim 3. Govil, Cohen, and Zamiska teach wherein detecting the object within the obtained video and determining the object characteristic data associated with the detected object comprises detecting edges and/or boundaries of the object (Govil - Paragraph 0043).

Consider claims 6 and 14. Govil, Cohen, and Zamiska teach wherein using at least the determined object characteristic data, the determined user profile data, and the determined scene attribute data as a basis to select a replacement object from among a set of multiple candidate replacement objects comprises using mapping data to map the determined object characteristic data, the determined user profile data, and the determined scene attribute data to a corresponding replacement object (Govil - Paragraphs 0031, 0039; Zamiska - Col 2: lines 58 – Col 3: line 41, Col 6: lines 15-39, Col 8: lines 48-63, Col 12: lines 35-48, Col 12: line 61 – Col 13: line 9).

Consider claims 9 and 16. Govil, Cohen, and Zamiska teach wherein outputting for presentation the generated video comprises transmitting, to a presentation device, video data representing the generated video for display by the presentation device (Govil - Paragraphs 0032, 0041, 0048).

Consider claim 17. Govil teaches the presentation device (Govil - Paragraphs 0028, 0015). Zamiska further teaches wherein the presentation device is a television (Col 9: lines 58-59, Col 10: lines 15-16, Col 8: lines 25-35). Therefore, it would have been obvious to a person of ordinary skill in the art to modify the system of Govil, Cohen, and Zamiska to include wherein the presentation device is a television, as further taught by Zamiska, for the advantage of enabling viewers to view content on widely available devices that are already readily associated with entertainment consumption and that many people may already own.

Consider claim 18. Govil, Cohen, and Zamiska teach wherein outputting for presentation the generated video comprises displaying the generated video (Govil - Paragraphs 0032, 0041).

Consider claim 22. Govil, Cohen, and Zamiska teach wherein the scene attribute data specifies information about one or more people in the obtained video (Zamiska - Col 6: lines 15-39 teaches an existing object may represent an entire scene defined by attributes such as characters in the scene and so forth).

Consider claim 23. Govil, Cohen, and Zamiska teach wherein the scene attribute data comprises scene scale data (Zamiska - Col 2: lines 58 – Col 3: line 41, Col 6: lines 15-39, Col 8: lines 48-63, Col 12: lines 35-48, Col 12: line 61 – Col 13: line 9).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Govil (US 2021/0120286), in view of Cohen et al. (US 10,613,726), in view of Zamiska et al. (US 8,910,201), in view of Rush et al. (US 2024/0233443), and further in view of Perincherry et al. (US 2022/0327320).

Consider claim 12. Govil, Cohen, and Zamiska teach wherein detecting the object within the obtained video and determining the object characteristic data associated with the detected object comprises: providing video data representing the obtained video to a trained model, wherein the trained model is configured to use at least video data as runtime input-data to generate object characteristic data as runtime output-data; and responsive to providing the video data to the trained model, receiving from the trained model corresponding generated object characteristic data (Govil - Paragraphs 0022-0023, 0034), but do not explicitly teach wherein the model was trained using (i) at least one training input-data set including video data representing video depicting a training object, and (ii) at least one corresponding training output-data set including object characteristic data of the training object.

In an analogous art, Rush teaches wherein a model was trained using (i) at least one training input-data set including video data representing video depicting a training object (Abstract, Paragraphs 0005, 0025-0026). Therefore, it would have been obvious to a person of ordinary skill in the art to modify the system of Govil, Cohen, and Zamiska to include wherein a model was trained using (i) at least one training input-data set including video data representing video depicting a training object, as taught by Rush, for the advantage of enabling the provision of multiple different variances (Rush - Paragraph 0003), where the techniques utilized can provide for a full range of scene variability and/or positioning, which overcomes limitations of existing techniques that may be limited in what they capture (Rush - Paragraph 0005), allowing the system to be properly trained on desired object(s).

Govil, Cohen, Zamiska, and Rush do not explicitly teach wherein the model was trained using (ii) at least one corresponding training output-data set including object characteristic data of the training object. In an analogous art, Perincherry teaches wherein a model was trained using (ii) at least one corresponding training output-data set including object characteristic data of the training object (Paragraphs 0066-0073). Therefore, it would have been obvious to a person of ordinary skill in the art to modify the system of Govil, Cohen, Zamiska, and Rush to include wherein a model was trained using (ii) at least one corresponding training output-data set including object characteristic data of the training object, as further taught by Perincherry, for the advantage of enabling the system to use a loss function to further train the system using ground truth data (Perincherry - Paragraph 0066), providing greater accuracy and more tailored responses.

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Govil (US 2021/0120286), in view of Cohen et al. (US 10,613,726), in view of Zamiska et al. (US 8,910,201), in view of Zavesky (US 2013/0141530), and further in view of Chi (US 2005/0225553).
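Claim 12 describes the trained model only functionally: video data as runtime input, object characteristic data as runtime output, trained on paired input/output sets. As a rough sketch of a system with that shape, using a toy architecture and an assumed four-number characteristic encoding (center, width, height), none of it taken from Rush or Perincherry:

```python
import torch
import torch.nn as nn

class CharacteristicNet(nn.Module):
    """Assumed toy model: video clip in, object characteristic data out."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),          # pool over time and space
        )
        self.head = nn.Linear(16, 4)          # (cx, cy, w, h), normalized

    def forward(self, clip):                  # clip: (N, 3, T, H, W)
        return self.head(self.features(clip).flatten(1))

model = CharacteristicNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# (i) training input-data set: video depicting a training object (random
#     stand-in tensors here); (ii) training output-data set: that object's
#     ground-truth characteristic data.
clips = torch.rand(8, 3, 16, 64, 64)
truth = torch.rand(8, 4)

for _ in range(10):                           # minimal training loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(clips), truth)
    loss.backward()
    opt.step()
```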
Consider claims 7 and 15. Govil, Cohen, and Zamiska teach wherein replacing the detected object with the selected replacement object to generate video that is a modified version of the obtained video (Govil - Paragraph 0031) further comprises: determining object position data associated with the detected object; and, at a position indicated by the determined object position data, replacing the detected object with the selected replacement object (Govil - Paragraphs 0022, 0031).

In an analogous art, Zavesky teaches obtaining a three-dimensional model of the selected replacement object (134-Fig.1, Paragraphs 0015, 0017-0019); using the obtained three-dimensional model of the selected replacement object and the determined object characteristic data, together with a time-based transform model, to generate a time-based two-dimensional projection of the selected replacement object; and replacing the detected object with the corresponding time-based two-dimensional projection of the selected replacement object (Paragraphs 0022-0024, 0043, 0049). Therefore, it would have been obvious to a person of ordinary skill in the art to modify the system of Govil, Cohen, and Zamiska to include obtaining a three-dimensional model of the selected replacement object; using the obtained three-dimensional model of the selected replacement object and the determined object characteristic data, together with a time-based transform model, to generate a time-based two-dimensional projection of the selected replacement object; and replacing the detected object with the corresponding time-based two-dimensional projection of the selected replacement object, as taught by Zavesky, for the advantage of adjusting to viewing demographics so that content providers need not reshoot content (incurring the additional costs of a reshoot that includes a new product design if the product in the original shot has changed) or accept smaller viewership, as well as enabling a single piece of content to place products that best match a viewing demographic without having to produce or pay for multiple pieces of content with different types of the same product (Zavesky - Paragraph 0004), allowing the system to efficiently, accurately, and convincingly replace/substitute objects seamlessly.

Govil, Cohen, Zamiska, and Zavesky do not explicitly teach that the transform model is an affine transform model. In an analogous art, Chi teaches a transform model that is an affine transform model (Paragraph 0060). Therefore, it would have been obvious to a person of ordinary skill in the art to modify the system of Govil, Cohen, Zamiska, and Zavesky to include a transform model that is an affine transform model, as taught by Chi, for the advantage of enabling creation of a map between two affine spaces such that collinearity and ratios between distances are preserved, aiding efficient transforms in which lines may be translated, scaled, rotated, sheared, or squeezed.

Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over Govil (US 2021/0120286), in view of Cohen et al. (US 10,613,726), in view of Zamiska et al. (US 8,910,201), in view of Mok et al. (US 2022/0012520).
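The claims 7 and 15 combination amounts to projecting a 3D replacement model into each frame through a time-varying affine transform. A minimal numpy sketch of that pipeline follows; the rotation-about-the-vertical-axis motion model, the orthographic drop of the z axis, and the parameter values are all assumptions for illustration:

```python
import numpy as np

def affine_at(t: float) -> np.ndarray:
    """Time-based affine transform model: rotate the replacement object about
    the vertical axis and scale it (assumed motion; translation added below)."""
    c, s = np.cos(0.5 * t), np.sin(0.5 * t)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return 0.8 * rot

def project(points3d: np.ndarray, t: float,
            offset=np.array([320.0, 240.0])) -> np.ndarray:
    """Time-based 2D projection: apply the affine transform for time t,
    drop z (orthographic), and shift into pixel coordinates."""
    transformed = points3d @ affine_at(t).T
    return transformed[:, :2] + offset        # (M, 2) pixel positions per frame

# e.g. the corners of a unit-cube replacement model, projected over 30 frames:
cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                dtype=float)
frames_2d = [project(cube, t) for t in np.linspace(0.0, 2.0, 30)]
```

An affine map preserves collinearity and ratios of distances, which is the property the Chi motivation statement relies on.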
Consider claim 24. Govil, Cohen, and Zamiska teach further comprising determining the scene scale data (Zamiska - Col 2: lines 58 – Col 3: line 41, Col 6: lines 15-39, Col 8: lines 48-63, Col 12: lines 35-48, Col 12: line 61 – Col 13: line 9), but do not explicitly teach wherein the determining comprises: using a trained model to obtain scene scale data for the scene, wherein the trained model was trained with video data and corresponding metadata specifying information about areas and/or objects in the scene as an input data set. In an analogous art, Mok teaches wherein the determining comprises: using a trained model to obtain scene scale data for the scene, wherein the trained model was trained with video data and corresponding metadata specifying information about areas and/or objects in the scene as an input data set (Paragraphs 0059, 0106, 0114-0120). Therefore, it would have been obvious to a person of ordinary skill in the art to modify the system of Govil, Cohen, and Zamiska to include using a trained model to obtain scene scale data for the scene, wherein the trained model was trained with video data and corresponding metadata specifying information about areas and/or objects in the scene as an input data set, as taught by Mok, for the advantage of enabling the system to intelligently determine and process content, gaining a better understanding and context of content.

Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Govil (US 2021/0120286), in view of Cohen et al. (US 10,613,726), in view of Zamiska et al. (US 8,910,201), in view of Purdy (US 12,240,445).

Consider claim 25. Govil, Cohen, and Zamiska teach wherein determining object characteristic data associated with the detected object comprises: detecting a brand and/or model of the detected object; and using the detected brand and/or model of the detected object (Govil - Paragraph 0035 teaches after detecting an object within the media content, the content ingestion process continues by parsing or otherwise analyzing the portion where the object was detected to recognize, characterize, or otherwise identify one or more attributes associated with the detected object. Object recognition module 122 may perform object recognition, object classification, and/or other image processing techniques to characterize various attributes of the detected object, such as, for example, the size of the object, the shape of the object, the color of the object, the make, model and/or manufacturer of the object, a type or other taxonomic classification associated with the detected object, and/or the like. Paragraph 0036 teaches after identifying attributes associated with the detected object, the content ingestion process 200 tags, marks, or otherwise associates the portions of the media content including the detected object with the metadata characterizing attributes of that detected object. Paragraph 0039 teaches selecting a substitute object for replacing the originally detected object using the metadata associated with the original object and other selection or substitution criteria. The various fields of metadata 142 characterizing the original object that was detected in the media content 112 may be provided to the object substitution module 162, which, in turn, queries the table of substitute object metadata 118 to identify entries for similar substitute objects based on similarities, matching, degree of differences, or other relationships between the metadata 118 associated with those substitute objects and the metadata 142 associated with the original object. The object substitution module 162 may identify potential substitute objects having the same object type or other taxonomic classification(s) or the same or similar size, shape, or other visual characteristics as the original object. Paragraph 0046 teaches for the microwave object 404, the object substitution module 162 may identify a different microwave as the substitute object that achieves the desired combination of similarity to the original object and temporal or contextual relevance to the current viewing context. The object substitution module 162 may select or otherwise identify a substitute microwave that is not outdated, is available at the geographic region where the client device 108 resides, etc., and is also similar in size and shape to the original microwave 404).

Govil, Cohen, and Zamiska do not explicitly teach using the detected brand and/or model of the detected object to look up size and/or scale data for the detected object. In an analogous art, Purdy teaches using the detected brand and/or model of the detected object to look up size and/or scale data for the detected object (Col 4: lines 35-52 teaches classifying an object as a particular make and model, and size information and other data may be obtained from a database for that particular object. These techniques may be used for classifying and determining object sizes of various objects). Therefore, it would have been obvious to a person of ordinary skill in the art to modify the system of Govil, Cohen, and Zamiska to include using the detected brand and/or model of the detected object to look up size and/or scale data for the detected object, as taught by Purdy, for the advantage of enabling the system to easily and accurately determine attribute data pertaining to object(s) of interest.

Claim 26 is rejected under 35 U.S.C. 103 as being unpatentable over Govil (US 2021/0120286), in view of Cohen et al. (US 10,613,726), in view of Zamiska et al. (US 8,910,201), in view of Gopalan (US 2015/0310307).

Consider claim 26. Govil, Cohen, and Zamiska teach wherein determining object characteristic data associated with the detected object comprises: detecting a brand and/or model of the detected object; using the detected brand and/or model of the detected object to look up multiple candidate size and/or scale data sets for the detected object; and, based on an analysis within the obtained video, selecting a single size and/or scale data set from among the multiple candidate size and/or scale data sets (Govil - Paragraphs 0035, 0036, 0039, and 0046, quoted above for claim 25. As the make, model, and/or manufacturer of the original object may be detected and used to query and select substitute objects having the same taxonomic classification(s), the ideal selected substitute object would be of a similar size, shape, etc., selected from a variety of available substitute objects that may also be of various sizes).

Govil, Cohen, and Zamiska do not explicitly teach that the analysis within the obtained video is based on an analysis of multiple objects within the obtained video. In an analogous art, Gopalan teaches that the analysis within the obtained video is based on an analysis of multiple objects within the obtained video (Paragraph 0026). Therefore, it would have been obvious to a person of ordinary skill in the art to modify the system of Govil, Cohen, and Zamiska to include basing the analysis within the obtained video on an analysis of multiple objects within the obtained video, as taught by Gopalan, for the advantage of enabling the system to determine additional information, ascertaining a fuller and more complete context of objects within the scene/image.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a).
Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON K LIN, whose telephone number is (571) 270-1446. The examiner can normally be reached Monday-Friday, 9AM-5PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Brian Pendleton, can be reached at 571-272-7527. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JASON K LIN/
Primary Examiner, Art Unit 2425
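Claims 25 and 26, as mapped above, reduce to a lookup keyed by the detected brand/model, with claim 26 using other objects in the scene to choose among candidate size records. A toy sketch under assumed data; the catalog entries, names, and the pixels-per-cm calibration from a second reference object are invented for illustration:

```python
# Hypothetical catalog: (brand, model) -> candidate real-world widths in cm,
# e.g. a product line sold in several sizes.
CATALOG = {("Acme", "MW-200"): [45.0, 55.0, 60.0]}

def pick_size(brand: str, model: str, obj_px: float,
              ref_px: float, ref_cm: float) -> float:
    """Claim 25: look up candidate size/scale data for the detected brand/model.
    Claim 26: analyze a second object of known size to calibrate pixels-per-cm,
    then keep the candidate closest to the observed width."""
    candidates = CATALOG[(brand, model)]
    px_per_cm = ref_px / ref_cm               # calibration from the other object
    observed_cm = obj_px / px_per_cm
    return min(candidates, key=lambda c: abs(c - observed_cm))

# A microwave spanning 220 px next to a 68.6 cm reference spanning 280 px:
print(pick_size("Acme", "MW-200", obj_px=220, ref_px=280, ref_cm=68.6))
# -> 55.0 (observed width is about 53.9 cm)
```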

Prosecution Timeline

Mar 03, 2023
Application Filed
Feb 24, 2024
Non-Final Rejection — §103
Jun 24, 2024
Examiner Interview Summary
Jun 24, 2024
Applicant Interview (Telephonic)
Jun 26, 2024
Response Filed
Sep 23, 2024
Final Rejection — §103
Feb 25, 2025
Request for Continued Examination
Feb 28, 2025
Response after Non-Final Action
Mar 06, 2025
Applicant Interview (Telephonic)
Mar 06, 2025
Examiner Interview Summary
Aug 18, 2025
Non-Final Rejection — §103
Dec 04, 2025
Response Filed
Jan 14, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12604047
JUST IN TIME CONTENT CONDITIONING
Granted Apr 14, 2026 (2y 5m to grant)
Patent 12593082
JUST IN TIME CONTENT CONDITIONING
Granted Mar 31, 2026 (2y 5m to grant)
Patent 12556760
CREDITING EXPOSURE TO MEDIA IDENTIFIED USING SOURCE FILTERING
Granted Feb 17, 2026 (2y 5m to grant)
Patent 12548455
GROUND-BASED CONTENT CURATION PLATFORM DISTRIBUTING GEOGRAPHICALLY-RELEVANT CONTENT TO AIRCRAFT INFLIGHT ENTERTAINMENT SYSTEMS
Granted Feb 10, 2026 (2y 5m to grant)
Patent 12537993
SMART HOME AUTOMATION USING MULTI-MODAL CONTEXTUAL INFORMATION
Granted Jan 27, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds
5-6
Grant Probability
49%
With Interview
84% (+34.8%)
Median Time to Grant
3y 7m
PTA Risk
High
Based on 454 resolved cases by this examiner. Grant probability derived from career allow rate.

Free tier: 3 strategy analyses per month