DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This is in response to applicant’s amendment/response filed on 12/10/2025, which has been entered and made of record. Claims 1, 10, 15, and 19 have been amended. Claims 1-20 are pending in the application.
Response to Arguments
Applicant's arguments filed on 12/10/2025 have been fully considered but they are not persuasive. Applicant submitted newly amended claims. Accordingly, new grounds of rejection are set forth below. The new grounds of rejection were necessitated by Applicant's amendments to the claims.
Applicants state that “Examiner admitted that Overton as modified by Huawei does not teach or suggest the second half (the “obtaining” step) of this feature (prior to amendment). Therefore, Overton as modified by Huawei necessarily also does not teach or suggested the amended version, which further specifies “obtaining description data associated with the selected object to be inserted into the detected area.” Moreover, newly cited reference Cohen-Tidhar does not make up for this deficiency of Overton and Huawei. Indeed, at best, Cohen-Tidhar focuses on objects already included in an area of the video, and therefore does not teach or suggest obtaining data associated with a selected object to be inserted into a detected area, as recited in amended claim 1.
For at least this reason, the rejection of claim 1 should be withdrawn. And for largely the same reasons, the rejections of the remaining claims should also be withdrawn.” The examiner disagrees. Although Cohen-Tidhar et al. do not teach an object to be inserted into the detected area, Overton et al. teach that feature (col. 1:65-67, col. 2:1-27, and col. 7:32-67, identifying an area in the image and inserting an object). Cohen-Tidhar et al. further teach obtaining description data associated with the selected object (par. 0029, par. 0032, and par. 0078-0080, providing description data associated with content/objects in order to scale them). When these references are combined, the object insertion of Overton et al. as modified by Huawei is further modified to scale the inserted object based on description data associated with that object, as taught by Cohen-Tidhar et al.; the combination therefore teaches obtaining description data associated with the selected object to be inserted into the detected area. Accordingly, the rejection of claim 1 is maintained.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3, 4, 12, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent 7,230,653 to Overton et al. in view of WO 2020/016353 A1 to Huawei Telekom (herein referred to as “Huawei”) in further view of U.S. PGPub 2021/0334547 to Cohen-Tidhar et al.
Regarding claim 1, Overton et al. teach a method comprising:
obtaining video that depicts an area across multiple frames of the video, wherein the area is part of a scene of the video, and wherein the area is suitable for having an object inserted therein (In col 4, lines 22-37 “The video signal is further decoded by a video decoder/buffer 119 to extract and store a video image from each frame of the video signal … The operation of the image insertion system 100 and the image insertion process of FIG. 2 will be described below in reference to image 400. However, the image insertion process will be repeated for a video image in each successive frame, at least to the extent the image changes between frames” and col 4, lines 52-59 “The area of the site, in which the target image is to be inserted, whether it is a surface of a real object or defined as an imaginary surface or object, is referred to as a target area. Having predefined rules allows a preselected target image to be inserted automatically depending on predefined criteria. For example, a target image may change at predefined times or periods, or based on the status of the event being telecast.”);
detecting the area within the obtained video and determining area characteristic data associated with the detected area (This is also shown in figures 6 and 7 where the area 504 is detected within the obtained video. Also please see col 5, lines 32-45 “Referring briefly also to FIGS. 4, 5 and 6, image 502 of FIG. 5 is an example of a rendering of a predefined model of the site, in which the video image shown in FIG. 4 was taken … The target area in this example is a predefined area 504 of the surface of the dasher board 402 … A target area surface, real or imaginary, need not be flat.” In this instance, the determined area characteristic data includes a flat predefined area 504 on a real surface (the dasher board 402));
determining scene attribute data associated with the scene, wherein determining scene attribute data associated with the scene comprises using video data representing the video to determine the scene attribute data (col 4, lines 25-30: “The video signal is further decoded by a video decoder/buffer 119 to extract and store a video image from each frame of the video signal. An example of a video image generated by a camera is illustrated as video image 400 in FIG. 4. This particular example is of an ice hockey game.”; col. 1:65-67, col. 2:1-27, and col. 4:38-59, identifying a surface based on analysis of the input video to select virtual content to insert into the video);
using at least the determined area characteristic data and the determined scene attribute data as a basis to select an object to be inserted into the detected area from among a set of multiple candidate objects (Determined scene attribute data is also used as a basis for what type of object to select and display as the target image; e.g., see col 2, lines 14-27 “Image insertion can therefore take place downstream, for example, at a local affiliate of a television network that is receiving a video feed for an event that is being broadcast. The downstream system would need to be provided with only the model of the site and could have a database of different target images added to the model. Thus, inserted advertising can be tailored to a local audience … different target images may be inserted when the video signal is re-broadcast at later times. Thus, inserting advertising can be tailored to the time of the broadcast, or re-broadcast.” In this case, the local audience for the given hockey game makes up the “determined scene attribute data”; col. 4:38-59, “the controller 120 accesses at step 206 predefined image insertion rules in database 122 to determine, based at least in part on a camera identifier embedded in the telemetry data, what image or images--referred to herein as target images--are to be inserted into a particular video image in the frame of a video signal. The target image may be, for example, advertising that will be inserted on a preselected surface--real or imaginary--within the original video image. The area of the site, in which the target image is to be inserted, whether it is a surface of a real object or defined as an imaginary surface or object, is referred to as a target area. Having predefined rules allows a preselected target image to be inserted automatically depending on predefined criteria”);
inserting the selected object into the detected area to generate video that is a modified version of the obtained video (col. 1:65-67 and col. 2:1-27, “a three-dimensional model of selected target areas within a site is defined and rendered using computer aided design (CAD) software, based on the position and perspective of a camera that generates the video …. inserted advertising can be tailored to a local audience. In addition, since the information on the perspective of the camera is encoded onto the video signal and is thus available whenever and wherever the video signal is available, different target images may be inserted when the video signal is re-broadcast at later times. Thus, inserting advertising can be tailored to the time of the broadcast, or re-broadcast”; col. 7:32-67 and col. 8:1-12, “This permits an image insertion system located downstream to more easily separate occlusions within the image to replace target images inserted upstream with different target images. For example, if the target image is advertising, a local affiliate may insert advertising directed to the particular local market in place of the original advertising …. The masked background image and masked target image are combined, and then the occlusion image is combined with this image to generate a final composite image. The composite image is then inserted into a frame on a video signal for transmission at step 1712”. The target image advertisement (the selected object) is selected based upon scene attributes such as the status of events or times or periods within the scene; e.g., Overton in col. 4, lines 55-59 recites: “Having predefined rules allows a preselected target image to be inserted automatically depending on predefined criteria. For example, a target image may change at predefined times or periods, or based on the status of the event being telecast.”); and
outputting for presentation the generated video (col 7:20-31, “The occlusion image 400c is then combined with image 400d to produce a final image 400e, shown in FIG. 16. The final image includes target image 604 inserted into target area 502”).
Overton et al., however, do not explicitly teach determining scene attribute data associated with the scene, wherein determining scene attribute data associated with the scene comprises using video data representing the video to determine the scene attribute data; or using at least the determined area characteristic data and the determined scene attribute data as a basis to select an object to be inserted into the detected area from among a set of multiple candidate objects.
In a related endeavor, Huawei teaches determining scene attribute data associated with the scene, wherein determining scene attribute data associated with the scene comprises using video data representing the video to determine the scene attribute data, and using at least the determined area characteristic data and the determined scene attribute data as a basis to select an object to be inserted into the detected area from among a set of multiple candidate objects (Figs. 3 and 5, abstract, page 4 lines 15-22, page 7 lines 18-26, page 14 lines 1-29, selecting virtual content from a database for the video based on semantic data obtained from the video scene (Fig. 3)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the determined area characteristic data and the determined scene attribute data as a basis to select an object (target image advertisement) from among a set of multiple candidate objects, as taught by Huawei, in the system of Overton, in order to place the advertisements more naturally into the video such that they are not distracting or disturbing, by taking into account the surrounding area (Huawei, page 3, lines 10-24).
Overton et al. as modified by Huawei, however, do not explicitly teach obtaining description data associated with the selected object to be inserted into the detected area, wherein the obtained description data provides an indication of the importance of the object's scale; and wherein inserting the selected object comprises scaling the selected object and using the provided indication of the importance of the object's scale as a basis to select an extent of resources to utilize in connection with scaling the object to be inserted into the detected area.
In a related endeavor, Cohen-Tidhar et al. teach obtaining description data associated with the selected object, wherein the obtained description data provides an indication of the importance of the object's scale; wherein inserting the selected object comprises scaling the selected object and using the provided indication of the importance of the object's scale as a basis to select an extent of resources to utilize in connection with scaling the object (par 0029, “the content-aware metadata of the present invention may be utilized by a content editing or content authoring program or application or device, in order to automatically perform (or, to propose to perform) one or more media editing operations or video editing operations; for example, thereby enabling a video editing program to perform selective and content-aware cropping or trimming or re-sizing of frame(s) of a video, based on such content-aware metadata”; par 0032, “the content-aware metadata may include the following fields or parameters or indicators: …. Importance Value, or level of importance, or level of priority to maintain or keep the Object and not to discard it and not to hide it or overwrite it, related to a particular Object in the frame, and indicating the level of importance within a pre-define scale (e.g., from 0 to 100, wherein 0 indicates a non-important Object that may safely be discarded or cropped-out or overwritten, and wherein 100 indicates a crucial or highly important Object that may not or should not be discarded or cropped-out or overwritten); (G) Importance Distribution, indicating the distribution of importance levels within the Object's bounding rectangle or circle (or other pre-defined shape or polygon); for example, indicating a uniform distribution such that same level of importance is assigned to every pixel in the Object's bounding shape, or indicating a Gaussian distribution which assigns a higher level of importance to pixels that are closer to the center of the bounding shape of the Object”; par 0078-0080, “An Object Importance Score Determination Unit 516 determines or assigns an Object Importance Score value, or an object importance level, to each Object; for example, on a scale of 0 to 100, wherein a greater value indicates a greater visual importance and/or contextual importance of the Object from the point-of-view of a human observer …. taking into account the size and/or the color of an Object as factors that may affect its Importance Score (e.g., some implementations may determine that an Object having very small dimensions would be assigned a lower Importance Score; some implementations may assign a higher Importance Score to an Object having a highly visible color contrast relative to its background, such as a black ball on a white background); and/or by applying other or additional suitable rules or criteria or conditions”).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify Overton et al., as modified by Huawei, to obtain description data associated with the selected object, wherein the obtained description data provides an indication of the importance of the object's scale, and to scale the selected object using the provided indication of the importance of the object's scale as a basis to select an extent of resources to utilize in connection with scaling the object, as taught by Cohen-Tidhar et al. Doing so would allow the object inserted by Overton et al. as modified by Huawei to be scaled based on description data associated with that object, so that the media item can be modified and integrated (e.g., resized, cropped, or trimmed) based on content-aware metadata related to (or describing, or indicating) spatial and/or temporal properties or objects that are of interest, of reduced interest, or of increased interest, and/or other information about the level of interest of one or more scenes, objects, or other content items within a multimedia file or stream.
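For illustration only, the following minimal sketch pictures the combined method as applied above: an object is selected from candidates using area characteristic data and scene attribute data (per Overton as modified by Huawei), and an importance-of-scale value in the object's description data, analogous to the 0-100 importance values of Cohen-Tidhar et al., selects the extent of resources used for scaling. All names, fields, and thresholds are hypothetical, not code from any cited reference.

```python
# Illustrative sketch only -- hypothetical names/fields/thresholds, not code
# from Overton, Huawei, or Cohen-Tidhar.
from dataclasses import dataclass

@dataclass
class CandidateObject:
    name: str
    suited_surfaces: set      # area characteristics the object fits, e.g. {"flat"}
    target_audiences: set     # scene attributes the object targets, e.g. {"local"}
    scale_importance: int     # description data: importance of scale, 0-100

def select_object(area_chars, scene_attrs, candidates):
    """Select from candidates using area characteristic data and scene attribute data."""
    def match(obj):
        return ((area_chars["surface_type"] in obj.suited_surfaces)
                + (scene_attrs["audience"] in obj.target_audiences))
    return max(candidates, key=match)

def scaling_resource_level(scale_importance):
    """Use the importance-of-scale indication to pick an extent of scaling resources."""
    if scale_importance >= 67:
        return "high"    # e.g., high-quality resampling refined per frame
    if scale_importance >= 34:
        return "medium"
    return "low"         # e.g., cheap one-shot resize

candidates = [
    CandidateObject("local_ad", {"flat"}, {"local"}, 90),
    CandidateObject("network_ad", {"flat", "curved"}, {"national"}, 20),
]
chosen = select_object({"surface_type": "flat"}, {"audience": "local"}, candidates)
print(chosen.name, scaling_resource_level(chosen.scale_importance))  # local_ad high
```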
Regarding claim 3, Overton et al. as modified by Huawei and Cohen-Tidhar et al. teach all the limitations of claim 1, and further teach wherein the area is a surface of an object within the scene (Overton et al.: figures 6-8 show the area 604 as an area along a surface of the side boards within a hockey arena; col. 1:65-67, col. 2:1-27, and col. 4:38-59, identifying a surface based on analysis of the input video to select virtual content to insert into the video. Huawei: Figs. 3 and 5, abstract, page 4 lines 15-22, page 7 lines 18-26, page 14 lines 1-29, selecting virtual content from a database for the video based on semantic data obtained from the video scene (Fig. 3)).
Regarding claim 4, Overton et al. as modified by Huawei and Cohen-Tidhar et al. teach all the limitations of claim 1, and Overton et al. further teach wherein the area characteristic data indicates a size, shape or orientation of the detected area (col. 1:26-41, “More current systems and methods, including the one disclosed by DiCicco, et al, rely on pattern recognition techniques for identifying landmarks within an image. The spatial relationships among the landmarks within the video image are used to locate, size and orient an inserted image”; figures 4-6 show an area with a defined size, shape, and orientation. Also, col. 2, lines 12-18 discuss taking the perspective into account, where the perspective would include the orientation of the detected area).
Regarding claim 12, Overton et al. as modified by Huawei and Cohen-Tidhar et al. teach all the limitations of claim 1, and Cohen-Tidhar et al. further teach wherein the indication of the importance of the object's scale is represented as a score value within a given range, where a low score in the range corresponds to a low use of resources in connection with scaling the object (par 0032, “the content-aware metadata may include the following fields or parameters or indicators: …. Importance Value, or level of importance, or level of priority to maintain or keep the Object and not to discard it and not to hide it or overwrite it, related to a particular Object in the frame, and indicating the level of importance within a pre-define scale (e.g., from 0 to 100, wherein 0 indicates a non-important Object that may safely be discarded or cropped-out or overwritten, and wherein 100 indicates a crucial or highly important Object that may not or should not be discarded or cropped-out or overwritten); (G) Importance Distribution, indicating the distribution of importance levels within the Object's bounding rectangle or circle (or other pre-defined shape or polygon); for example, indicating a uniform distribution such that same level of importance is assigned to every pixel in the Object's bounding shape, or indicating a Gaussian distribution which assigns a higher level of importance to pixels that are closer to the center of the bounding shape of the Object”; par 0078-0080, “An Object Importance Score Determination Unit 516 determines or assigns an Object Importance Score value, or an object importance level, to each Object; for example, on a scale of 0 to 100, wherein a greater value indicates a greater visual importance and/or contextual importance of the Object from the point-of-view of a human observer …. taking into account the size and/or the color of an Object as factors that may affect its Importance Score (e.g., some implementations may determine that an Object having very small dimensions would be assigned a lower Importance Score; some implementations may assign a higher Importance Score to an Object having a highly visible color contrast relative to its background, such as a black ball on a white background); and/or by applying other or additional suitable rules or criteria or conditions”). This would be obvious for the same reason given in the rejection of claim 1.
Regarding claim 15, Overton et al. teach a computing system configured for performing a set of acts (col. 1:49-64, an image insertion system). The remaining limitations of the claim are similar in scope to claim 1 and are rejected under the same rationale.
Regarding claim 19, Overton et al. teach a non-transitory computer-readable medium having stored thereon program instructions that upon execution by a computing system, cause performance of a set of acts (col. 4, lines 38-41 teach using a controller 120 (a processor) with software and hardware to run the system. Overton also mentions, in col. 5, lines 35-38, using a CAD system with executing software to run the system. The system of Overton would require some type of non-transitory machine-readable medium in order to function and run on a computer as described by the reference). The remaining limitations of the claim are similar in scope to claim 1 and are rejected under the same rationale.
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Overton in view of Huawei in further view of Cohen-Tidhar et al. and Dengler et al. (Pub No. US 2005/0001852 A1).
As per claim 2, Overton alone does not explicitly teach the claimed limitations.
However, Overton, Huawei, and Cohen-Tidhar et al. in combination with Dengler teach the claimed:
2. The method of claim 1, wherein the area is a surface of a floor within the scene (see Dengler, figures 8 and 9, where the area is a surface of a floor of a basketball court within a sports game scene; the claimed feature is taught when Dengler is combined with Overton).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention for the area to be a surface of a floor within the scene, as taught by Dengler, in the system of Overton as modified by Huawei and Cohen-Tidhar et al. This allows the system of Overton to place virtual advertisements in its television video streams on surfaces along the floor of a basketball court during games.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Overton in view of Huawei in further view of Cohen-Tidhar et al. and Wan et al. (Pub No. US 2006/0026628 A1).
As per claim 5, Overton alone does not explicitly teach the claimed limitations.
However, Overton, Huawei, and Cohen-Tidhar et al. in combination with Wan teach the claimed:
5. The method of claim 1, wherein detecting the area within the obtained video and determining the area characteristic data associated with the detected area comprises:
providing video data representing the obtained video to a trained model, wherein the trained model is configured to use at least video data as runtime input-data to generate area characteristic data as runtime output-data (Wan in [0130] “Embodiments include an automatic system for insertion of content into a video presentation. Machine learning methods are used to identify suitable frames and regions of a video presentation for implantation automatically, and to select and insert virtual content into the identified frames and regions of a video presentation automatically” and [0133] “Scenes of lesser relevance within a video can be selected. This provides flexibility in assigning target regions in the video presentation for content insertion. Embodiments of the invention can be fully automatic and run in a real-time fashion, and hence are applicable to both video-on-demand and broadcast applications.”); and
responsive to providing the video data to the trained model, receiving from the trained model, corresponding generated area characteristic data (Wan in [0130] “… The identification of suitable frames and regions of a video presentation for implantation may include the steps of: segmenting video presentation into frames or video segments; determining and calculating distinctive features such as colour, texture, shape and motion, etc. for each frame or video segment; and identifying the frames and regions for implantation by comparing calculated feature parameters obtained from the learning process.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the learning process as taught by Wan in the system of Overton as modified by Huawei and Cohen-Tidhar et al. in order to find suitable places for automatically adding virtual advertising during sports games more effectively (Wan in [0008]-[0009] and [0059]).
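For illustration only, the following hypothetical sketch shows the claim 5 mapping: video data is provided to a stand-in trained model that returns area characteristic data, loosely following Wan's description of computing features such as color and texture per frame and comparing them against parameters obtained from a learning process. The model, features, and threshold are assumptions, not Wan's actual implementation.

```python
# Illustrative sketch only -- a stand-in "trained model" with assumed features
# and threshold, not Wan's actual machine-learning implementation.
def extract_features(frame):
    """Toy per-frame features: mean intensity ('color') and variance ('texture')."""
    mean = sum(frame) / len(frame)
    var = sum((p - mean) ** 2 for p in frame) / len(frame)
    return {"color": mean, "texture": var}

class TrainedAreaModel:
    """Flags frames/regions suitable for insertion by comparing computed
    feature parameters against values obtained from a learning process."""
    def __init__(self, learned_params):
        self.learned = learned_params

    def predict(self, frames):
        areas = []  # area characteristic data as runtime output-data
        for i, frame in enumerate(frames):
            feats = extract_features(frame)
            if feats["texture"] <= self.learned["max_texture"]:  # low texture ~ flat surface
                areas.append({"frame": i, "surface": "flat", "features": feats})
        return areas

model = TrainedAreaModel({"max_texture": 50.0})          # learned threshold (assumed)
video_data = [[100, 101, 99, 100], [0, 255, 0, 255]]     # toy frames as runtime input-data
print(model.predict(video_data))
```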
Claims 6, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Overton in view of Huawei in further view of Cohen-Tidhar et al. and Zass et al. (Pub No. US 2023/0052442 A1).
As per claim 6, Overton alone does not explicitly teach the claimed limitations.
However, Overton, Huawei, and Cohen-Tidhar et al. in combination with Zass teach the claimed:
6. The method of claim 1, wherein the scene attribute data includes object description data for at least one object depicted in the scene, and wherein determining the scene attribute data comprises:
providing video data (Zass in [0084] “Some non-limiting examples of image data (such as image data 102) may include one or more images, grayscale images, color images, series of images, 2D images, 3D images, videos, 2D videos, 3D videos, frames, footages, or data derived from other image data. In some embodiments, analyzing image data”) representing the obtained video to a trained model, wherein the trained model is configured to use at least video data as runtime input-data to generate object description data as runtime output-data (Zass in the middle of [0174], “a video of the shoot, and the image data may be analyzed using a trained machine learning algorithm”; Zass towards the beginning of [0232], “For example, a machine learning model may be trained using training examples to determine whether to include descriptions of objects in textual contents based on events associated with the objects and/or the descriptions”; and Zass towards the end of [0075], “… In some examples, the output may be provided: in real time”); and
responsive to providing the video data to the trained model, receiving from the trained model, corresponding object description data (Zass in [0232] “… Step 708 may use the trained machine learning model to analyze data associated with the first group of one or more events of Step 704 and determine to include in the textual content the description based on the first group of one or more events of the first object of Step 702 … For example, a machine learning model may be trained using training examples to determine whether to include descriptions of objects in textual contents based on events and/or the descriptions” and Zass in [0139] “… For example, a machine learning model may be trained using training examples to select adjectives based on images and/or videos. An example of such training example may include a sample image and/or a sample video associated with a sample event, together with a label indicating a sample selection of an adjective for the sample event. Step 456 may use the trained machine learning model to analyze the image data associated with the event and select the adjective for the event.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the trained model to provide object description data as taught by Zass in the system of Overton as modified by Huawei and Cohen-Tidhar et al. in order to more quickly and easily provide description data for a large number of real-world objects and their associated events (Zass in [0003]).
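For illustration only, the following hypothetical sketch shows the claim 6 mapping: video data is provided to a stand-in trained model that returns object description data, loosely analogous to Zass's trained machine learning model selecting descriptions for detected objects. The lookup table and labels are assumptions, not Zass's implementation.

```python
# Illustrative sketch only -- a lookup table stands in for a trained model that
# generates object description data; labels and wording are hypothetical.
class TrainedDescriptionModel:
    def __init__(self):
        # Stand-in for learned parameters: detected class -> description/adjective.
        self.learned = {"ball": "fast-moving black ball",
                        "board": "white dasher board"}

    def predict(self, video_data):
        """Take video data (here, pre-detected object labels per frame) as
        runtime input-data and return object description data."""
        return [{"object": label,
                 "description": self.learned.get(label, "unrecognized object")}
                for frame in video_data for label in frame]

model = TrainedDescriptionModel()
print(model.predict([["ball"], ["ball", "board"]]))  # object description data
```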
As per claims 16 and 20, these claims are similar in scope to limitations recited in claim 6, and thus are rejected under the same rationale.
Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Overton in view of Huawei in further view of Cohen-Tidhar et al. and Desmond et al. (Patent No. US 10,839,416 B1).
As per claim 7, Overton alone does not explicitly teach the claimed limitations.
However, Overton, Huawei, and Cohen-Tidhar et al. in combination with Desmond teach the claimed:
7. The method of claim 1, wherein the scene attribute data includes object description data for at least one object depicted in the scene, and wherein determining the scene attribute data comprises: identifying object description data that is stored as metadata associated with the obtained video (Desmond in col 6, lines 43-49: “At 608, the video processing module may generate metadata for the objects detected in the current frame (or image). The metadata may include: a timestamp of a start of the current frame; object identifiers; object edge descriptions; a stop (or end) timestamp of the current frame; locations, dimensions, and/or other information describing the detected objects.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the object description data as taught by Desmond with the system of Overton as modified by Huawei and Cohen-Tidhar et al. in order to help determine which items to advertise (Desmond in col 27, lines 13-26).
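For illustration only, the following hypothetical sketch shows the claim 7 mapping: object description data is identified from metadata stored with the obtained video, mirroring the kinds of fields Desmond lists (timestamps, object identifiers, locations, dimensions). The JSON layout itself is an assumption, not Desmond's actual format.

```python
# Illustrative sketch only -- the metadata layout is an assumption that mirrors
# the kinds of fields Desmond lists (timestamps, object identifiers, locations,
# dimensions); it is not Desmond's actual format.
import json

video_metadata = json.loads("""
{
  "frames": [
    {"start_ts": 0.00, "end_ts": 0.04,
     "objects": [{"id": "obj-1", "label": "dasher board",
                  "location": [120, 40], "dimensions": [200, 60]}]}
  ]
}
""")

def object_descriptions(metadata):
    """Identify object description data stored as metadata with the video."""
    for frame in metadata["frames"]:
        for obj in frame["objects"]:
            yield {"id": obj["id"], "label": obj["label"],
                   "when": (frame["start_ts"], frame["end_ts"]),
                   "where": obj["location"], "size": obj["dimensions"]}

for desc in object_descriptions(video_metadata):
    print(desc)
```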
As per claim 17, this claim is similar in scope to limitations recited in claim 7, and thus is rejected under the same rationale.
Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Overton in view of Huawei in further view of Cohen-Tidhar et al. and Arana et al. (Pub No. US 2019/0182486 A1).
As per claim 8, Overton alone does not explicitly teach the claimed limitations.
However, Overton, Huawei, and Cohen-Tidhar et al. in combination with Arana teach the claimed:
8. The method of claim 1, wherein the scene attribute data includes scene script data for the scene, and wherein determining the scene attribute data comprises: identifying scene script data that is stored as metadata associated with the obtained video (Arana in [0029] “… Frames that make up such a scene may contain thematic and/or cinematic characteristics … a scene may involve dialogue between two actors or characters. In this example, frames in which the two actors are present, and/or audio portions of the media content that contain audio that matches a script or scene metadata between the two actors or characters may be identified”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to identify scene script data as taught by Arana in the system of Overton as modified by Huawei and Cohen-Tidhar et al. in order to allow transcoding to be performed on a per-scene basis rather than on the basis of the media content as a whole (2nd half of [0022] in Arana). For example, identifying the scene script data on a per-scene basis allows the system to better analyze the overall video content to determine where different scenes occur. In turn, this may help Overton with virtual advertisement placement due to the increased understanding of the video scene structure.
As per claim 18, this claim is similar in scope to limitations recited in claim 8, and thus is rejected under the same rationale.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Overton in view of Huawei in further view of Cohen-Tidhar et al. and Dengler et al. (Pub No. US 2005/0001852 A1).
As per claim 9, Overton alone does not explicitly teach the claimed limitations.
However, Overton, Huawei, and Cohen-Tidhar et al. in combination with Dengler teach the claimed:
9. The method of claim 1, wherein using at least the determined area characteristic data and the determined scene attribute data as a basis to select an object from among a set of multiple candidate objects comprises using mapping data to map the determined area characteristic data and the determined scene attribute data to a corresponding object (Dengler shows this feature in figure 9, where mapping data is used to transform the graphics image 904 into the correct perspective view in images 912 and 914 (where the “New Logo” appears along the basketball court floor)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the mapping data as taught by Dengler with the system of Overton as modified by Huawei and Cohen-Tidhar et al. in order to ensure that the virtual advertisement or new logo image appears geometrically correct when inserted into the scene that depicts the sports game being played and recorded over a series of video frames.
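For illustration only, the following hypothetical sketch shows the claim 9 mapping: mapping data maps the determined area characteristic data and determined scene attribute data to a corresponding object. The table contents are assumptions, not taken from Dengler.

```python
# Illustrative sketch only -- the mapping-table contents are hypothetical.
MAPPING_DATA = {
    ("flat", "hockey"):     "rink_board_ad",
    ("flat", "basketball"): "court_floor_logo",
    ("curved", "hockey"):   "glass_banner",
}

def map_to_object(area_chars, scene_attrs, default="generic_ad"):
    """Use mapping data to map determined area characteristic data and
    determined scene attribute data to a corresponding object."""
    key = (area_chars.get("surface_type"), scene_attrs.get("sport"))
    return MAPPING_DATA.get(key, default)

print(map_to_object({"surface_type": "flat"}, {"sport": "basketball"}))
# -> court_floor_logo
```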
Claims 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Overton in view of Huawei in further view of Cohen-Tidhar et al. and Mishra et al. (Pub No. US 2016/0127778 A1).
As per claims 13 and 14, Overton alone does not explicitly teach the claimed limitations.
However, Overton, Huawei, and Cohen-Tidhar et al. in combination with Mishra teach the claimed:
13. The method of claim 1, wherein outputting for presentation, the generated video comprises a presentation device displaying the generated video; and 14. The method of claim 13, wherein the presentation device is a television (Overton in col 2, lines 14-17 teaches that the system is used with a television station, e.g., Overton states “Image insertion can therefore take place downstream, for example, at a local affiliate of a television network that is receiving a video feed for an event that is being broadcast”).
However, Overton is silent about the presentation device itself. Mishra teaches that it was known in the art to display the sports programming on a television presentation device, e.g. please see Mishra in [0021] “… The media receiver may display media channels of the channel lineup (e.g., cable television programming, such as sitcom content shows, news content shows, sports content shows, etc.) on a television display based upon the media channel signal. A head end detection component and/or an intermediate multimedia device component, hosted on the television display”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the television presentation device as taught by Mishra with the system of Overton as modified by Huawei and Cohen-Tidhar et al. This would have been obvious because Overton is broadcasting a video feed from an event on a television network, and viewing such a broadcast on a television device is the most common way video carried on a television network is viewed.
Allowable Subject Matter
Claims 10-11 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: The cited prior art fails to teach the combination of elements recited in claim 10, including "wherein inserting the selected object into the detected area to generate video that is a modified version of the obtained video comprises: obtaining a three-dimensional model of the selected object; using the obtained three-dimensional model of the selected object and the determined area characteristic data, together with a time-based transform model, to generate a time-based two-dimensional projection of the selected object; determining area position data associated with the detected area; at a position indicated by the determined area position data, inserting into the detected area the corresponding time-based two-dimensional projection of the selected object".
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Jin Ge whose telephone number is (571)272-5556. The examiner can normally be reached 8:00 to 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jason Chan can be reached at (571)272-3022. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
JIN GE
Examiner
Art Unit 2619
/JIN GE/Primary Examiner, Art Unit 2619