Prosecution Insights
Last updated: April 19, 2026
Application No. 18/483,067

DETECTING AND REMOVING GRAPHICAL OVERLAY ELEMENTS USING DEEP NEURAL NETWORKS

Final Rejection — §103
Filed: Oct 09, 2023
Examiner: BASHIR, ADEEL
Art Unit: 2616
Tech Center: 2600 — Communications
Assignee: Nvidia Corporation
OA Round: 2 (Final)

Grant Probability: 94% (Favorable)
OA Rounds: 3-4
To Grant: 2y 6m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 94% — above average (33 granted / 35 resolved; +32.3% vs TC avg)
Interview Lift: +7.4% across resolved cases with an interview (moderate, roughly +7%)
Typical Timeline: 2y 6m average prosecution; 14 applications currently pending
Career History: 49 total applications across all art units
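The headline numbers can be reproduced from the raw counts shown above; a minimal sketch of the arithmetic, where the TC-average figure is a back-calculation from the stated delta rather than a published input:

```python
# Reproducing the headline examiner statistics from the raw counts.
granted, resolved = 33, 35
allow_rate = granted / resolved       # 0.943 -> displayed as 94%
tc_avg = allow_rate - 0.323           # implied by "+32.3% vs TC avg" -> ~62%
print(f"allow rate {allow_rate:.1%}; implied TC average {tc_avg:.1%}")
# Note: the "99% with interview" figure is not allow_rate + 7.4 points
# (that would exceed 100%); it is presumably computed from the subset
# of resolved cases that had an interview.
```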

Statute-Specific Performance

§101: 5.0% (-35.0% vs TC avg)
§103: 85.0% (+45.0% vs TC avg)
§102: 8.3% (-31.7% vs TC avg)
§112: 0.8% (-39.2% vs TC avg)
Tech Center averages are estimates. Based on career data from 35 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

Priority

No foreign or domestic priority is claimed. The effective filing date of U.S. Application No. 18/483,067 is 10/09/2023.

Status of Claims

Claims 1-20 are rejected.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

(Please see the cited paragraphs, sections, pages, or surrounding text in the references for the paraphrased content.)

Claims 1, 3, 9, and 13-20 are rejected under 35 U.S.C. § 103 as being unpatentable over Patel et al. (NPL) in view of Harron et al. (US20200104625A1).

As per Claim 1, Patel teaches the following portion of Claim 1, which recites: “A computer-implemented method, comprising: identifying, by using a first neural network, one or more regions in a first frame of a video sequence, the one or more regions including one or more overlay elements;”

Patel et al. (NPL) teaches a neural-network branch that produces a mask (i.e., identifies a region corresponding to the overlay/caption): “We use an encoder-decoder based CNN model to generate the inpainted frames and the caption detection masks. The network has two branches each for the image generation and the mask generation tasks …” — Patel et al. (NPL), Section 2, p. 2. “The decoder tries to generate … a mask containing ones at the pixels that are captions and zeros everywhere else.” — Patel et al. (NPL), Section 2, p. 4.

Patel alone does not explicitly teach all the limitations of the claim. However, when combined with Harron, the references collectively teach all the limitations. Harron et al. supports treating these caption regions as “overlay elements” detected by a neural-network overlay detector: “creating a probability map assigning a probability … where the assigned probability corresponds to a likelihood that the … feature is an overlay” — Harron et al., ¶ [0003]. Patel’s mask-generation branch identifies pixel regions for captions (a type of overlay); Harron confirms neural-network identification of such regions as “overlay” features.

Patel teaches the following portion of Claim 1, which recites: “generating a second frame using a second neural network that receives the first frame as an input, the second frame including generated content in place of the one or more overlay elements;”

Patel et al. (NPL) teaches a second neural-network branch that takes the frame as input and generates inpainted output content: “We use an encoder-decoder based CNN model to generate the inpainted frames …” — Patel et al. (NPL), Section 2, p. 2. “The input to the Generator module is 128 × 128 × 3 sized tensor corresponding to the frame of the input video to be inpainted.” — Patel et al. (NPL), Section 2.1, p. 4. “The detected caption mask allows us to copy the non-masked region pixels from the original frame and masked pixels from the inpainted frame.” — Patel et al. (NPL), Section 2, p. 3. Patel’s image-generation branch receives the frame and produces an inpainted frame whose pixels replace the masked overlay region (generated/inpainted content in place of the overlay).

Patel teaches the following portion of Claim 1, which recites: “and providing, for display, the second frame.”

Patel et al. (NPL) is a video decaptioning pipeline that outputs processed frames as the restored video result: “Inputs to the network are frames from the captioned videos … [and] generate the inpainted frames …” — Patel et al. (NPL), Section 2, p. 2-3. “The network can simultaneously do frame level caption detection and inpainting.” — Patel et al. (NPL), Conclusion, p. 9. The method produces inpainted output frames for use as the resulting video frames (i.e., suitable to provide for display).

Before the effective filing date of the claimed invention, a person of ordinary skill in the art would have been motivated to apply Harron et al.’s neural-network overlay framing to Patel et al.’s joint mask-generation and inpainting pipeline, because doing so predictably broadens the caption-mask concept to general “overlay elements” while preserving the same detect→mask→inpaint workflow. This yields expected improvements in applicability (not limited to captions) while maintaining the known benefits of neural-network masking and inpainting for producing cleaned frames without manual per-frame labeling, with predictable results.
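For context on the mechanics at issue: the compositing step Patel describes reduces to a per-pixel blend of the original and inpainted frames under the detected mask. A minimal NumPy sketch (function and array names are assumed for illustration, not taken from Patel):

```python
import numpy as np

def composite(original: np.ndarray, inpainted: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Blend per Patel's described step: masked pixels (overlay/caption)
    come from the inpainted frame, non-masked pixels from the original.
    `mask` is binary with shape (H, W, 1); frames are (H, W, 3)."""
    return mask * inpainted + (1 - mask) * original
```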
As per Claim 3, Patel teaches Claim 3, which recites: “The computer-implemented method of claim 1, wherein the second neural network includes a trained inpainting deep neural network.”

Patel et al. discloses an inpainting CNN that is trained and used to generate inpainted frames: “We use an encoder-decoder based CNN model to generate the inpainted frames …” — Patel et al. (NPL), Section 2, p. 2. “We train our network …” — Patel et al. (NPL), Section 2.2 (“Training”), p. 4. “Generator Module … is an encoder-decoder based CNN model … to generate inpainted images …” — Patel et al. (NPL), Section 2.1, p. 3-4. Patel’s image-generation branch is a deep CNN that is trained and produces inpainted output frames, matching the “second neural network includes a trained inpainting deep neural network” limitation.

Processor Claim 9 does not include any additional limitations that would significantly distinguish it from claim 1. Therefore, it is likewise rejected under 35 U.S.C. § 103 in view of the same references and for the same reasons set forth above.

Processor Claim 13 does not include any additional limitations that would significantly distinguish it from claim 3. Therefore, it is likewise rejected under 35 U.S.C. § 103 in view of the same references and for the same reasons set forth above.
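The two-branch design cited for claims 1 and 3 (a shared encoder with separate image and mask decoders) can be sketched as follows. This is an illustrative PyTorch approximation with assumed layer sizes, not Patel’s actual architecture:

```python
import torch
import torch.nn as nn

class TwoBranchInpainter(nn.Module):
    """Illustrative encoder-decoder with an image head and a mask head,
    loosely following the two-branch design Patel describes."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.image_head = nn.Sequential(  # inpainted-frame branch
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        self.mask_head = nn.Sequential(   # caption/overlay-mask branch
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame: torch.Tensor):
        z = self.encoder(frame)           # shared features
        return self.image_head(z), self.mask_head(z)

# 128 x 128 input, matching the tensor size quoted from Patel
inpainted, mask = TwoBranchInpainter()(torch.rand(1, 3, 128, 128))
```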
As per Claim 14, Patel alone does not explicitly teach all the limitations of the claim. However, when combined with Harron, the references collectively teach all the limitations. Harron teaches the limitation of Claim 14 that recites: “The processor of claim 9, wherein the processor is comprised in at least one of: [...] a system for performing deep learning operations; [...]”

Harron et al. disclose a processor designed for systems employing deep learning operations: “The feature map network 402 and probability map network 404 in the example of FIG. 4 are neural networks implemented on one or more computing systems that include one or more processors [...] to implement the methods and systems described herein.” — Harron et al., ¶ [0040]. Harron et al., ¶ [0019], further specify that their system employs neural networks, including “recurrent deep neural networks utilizing a Long Short-Term Memory (LSTM) architecture”, to detect overlays. These disclosures satisfy the “a system for performing deep learning operations” limitation recited in Claim 14. The rationale and motivation to combine the references as set forth for claim 1 are incorporated herein by reference for the present claim.

System Claim 15 does not include any additional limitations that would significantly distinguish it from claim 1. Therefore, it is likewise rejected under 35 U.S.C. § 103 in view of the same references and for the same reasons set forth above.

As per Claim 16, Patel alone does not explicitly teach all the limitations of the claim. However, when combined with Harron, the references collectively teach all the limitations. Harron teaches the limitation of Claim 16 that recites, among other alternatives: “The system of claim 15, wherein a first neural network identifies the overlay elements using one or more feature detection models.”

Harron et al. (US20200104625A1) disclose a neural-network-based system for identifying overlay elements through feature extraction and detection techniques: “The feature map network 402 can be a convolutional neural network (CNN) that extracts, from an image, a grid of vectors, each representing features of a particular portion of the image.” — Harron et al., ¶ [0050]. “The probability map network 404 analyzes the grid of vectors generated by the feature map network 402 and assigns a probability to each feature in the image. The assigned probability can indicate a likelihood that the feature is an overlay.” — Harron et al., ¶ [0044]. These disclosures describe a first neural network (the feature map CNN) identifying overlay elements through the use of feature detection models, directly corresponding to the limitation of Claim 16. Patel et al. (NPL) likewise disclose CNN-based feature detection for overlay (caption) elements: “We use an encoder-decoder based CNN model to generate the inpainted frames and the caption detection masks simultaneously.” — Patel et al., Section 2 (“Proposed Method”), p. 2. Patel et al. reinforce the well-known practice of using CNN-based feature detection models for identifying graphical overlays. The rationale and motivation to combine the references as set forth for claim 1 are incorporated herein by reference for the present claim.

As per Claim 17, Patel alone does not explicitly teach all the limitations of the claim. However, when combined with Harron, the references collectively teach all the limitations. Harron teaches the limitation of Claim 17 that recites: “The system of claim 16, wherein the one or more processing units are further to generate a mask for the input frame corresponding to the one or more regions.”

Harron et al. (US20200104625A1) disclose generating a mask for an input frame corresponding to regions identified as overlays: “The masking component 602 filters the probability map 506 to create an overlay mask 702, which can be, for example, a binary mask.” — Harron et al., ¶ [0047]. Claim 17 introduces the generation of a mask based on regions identified by the neural network, and Harron et al. disclose exactly this step: a masking component generating a binary mask corresponding to the regions identified by the overlay-detection neural networks. Harron et al. alone therefore teach this limitation, rendering the claim obvious. Moreover, generating a binary mask from overlay-detection probability maps is a predictable and routine practice in the art, further confirming the obviousness of the claimed system. The rationale and motivation to combine the references as set forth for claim 1 are incorporated herein by reference for the present claim.
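Harron’s masking step (filtering a probability map into a binary overlay mask) amounts to a threshold in the simplest reading. A sketch, where the 0.5 cutoff and names are assumptions:

```python
import numpy as np

def overlay_mask(prob_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Filter a per-feature overlay-probability map into a binary mask,
    in the spirit of Harron's masking component (cutoff is assumed)."""
    return (prob_map > threshold).astype(np.uint8)
```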
System Claim 18 does not include any additional limitations that would significantly distinguish it from method claim 1. Therefore, it is likewise rejected under 35 U.S.C. § 103 in view of the same references and for the same reasons set forth above.

System Claim 19 does not include any additional limitations that would significantly distinguish it from method claim 3. Therefore, it is likewise rejected under 35 U.S.C. § 103 in view of the same references and for the same reasons set forth above.

As per Claim 20, Patel alone does not explicitly teach all the limitations of the claim. However, when combined with Harron, the references collectively teach all the limitations. Harron teaches the limitation of Claim 20 that recites, among other alternatives: “The system of claim 15, wherein the system is one of: … a system for performing deep learning operations; …”

Harron et al. disclose a system that employs deep learning operations: “The feature map network 402 and probability map network 404 in the example of FIG. 4 are neural networks implemented on one or more computing systems...” — Harron et al., ¶ [0040]. Harron et al., ¶ [0019], also specify that their disclosed system’s “machine learning principles include one or more aspects relating to neural networks, such as recurrent deep neural networks utilizing a Long Short-Term Memory (LSTM) architecture”. This disclosure aligns with Claim 20’s limitation of “a system for performing deep learning operations,” as Harron et al. describe their system as employing deep neural network architectures for processing images to detect overlays. The rationale and motivation to combine the references as set forth for claim 1 are incorporated herein by reference for the present claim.
Claims 2 and 4 are rejected under 35 U.S.C. § 103 as being unpatentable over Patel et al. (NPL) in view of Harron et al. (US20200104625A1), further in view of Oxholm, and still further in view of Wei (NPL).

As per Claim 2, Patel alone does not explicitly teach all the limitations of the claim. However, when combined with Oxholm and Wei (NPL), the references collectively teach all the limitations. Wei and Oxholm teach the limitation of Claim 2 that recites: “The computer-implemented method of claim 1, further comprising: receiving a mask for the video sequence, the mask including positions for the one or more overlay elements.”

Wei discloses: “In the image watermark removal technique, we need to use the binarization mask of the watermark as the ground truth, which is similar to a watermark segmentation problem requiring very fine pixel-level manual extraction” — Wei (NPL), Section 3 (“Data”), p. 2. This disclosure aligns with the claim limitation, explicitly disclosing the use of a “mask” that directly indicates positions of overlay elements (watermark pixels). Oxholm et al. further support this limitation, disclosing annotations as positional markers: “[A] computing system accesses a set of video frames with annotations identifying a target region to be modified” — Oxholm et al., Abstract. “[A] set of video frames that includes a first frame and a second frame having respective annotations identifying a target region to be modified” — Oxholm et al., ¶ [0004]. The annotated regions in Oxholm et al. inherently imply positional marking of targeted overlay elements, rendering them obvious equivalents to the claimed mask structure. Combining Wei’s explicit disclosure of masks marking watermark pixels with Oxholm et al.’s annotations for region identification would thus be an obvious improvement, predictable to a person of ordinary skill, resulting in enhanced accuracy and convenience in identifying and processing overlay elements.

Before the effective filing date of the claimed invention, a person of ordinary skill in the art would have been motivated to combine Patel’s video decaptioning pipeline with Oxholm’s use of user-provided region indications (“annotations identifying a target region to be modified”) and Wei’s teaching of using a watermark “binarization mask” because supplying an explicit mask or annotation to mark overlay positions is a straightforward, well-known way to guide and improve overlay removal and inpainting, yielding predictable benefits such as more accurate localization of the overlay region and more reliable replacement of those pixels without changing the overall workflow.
As per Claim 4, Patel alone does not explicitly teach all the limitations of the claim. However, when combined with Oxholm and Wei (NPL), the references collectively teach all the limitations. Wei and Oxholm teach the limitation of Claim 4 that recites: “The computer-implemented method of claim 1, further comprising: predicting, based on one or more features of the video sequence, a static overlay element position.”

Wei describes the general watermark removal scenario broadly: “Watermarks can be overlaid anywhere on a background image of varying size, shape, color and transparency. Furthermore, watermarks often contain complex patterns... the structure, position and size of these watermarks vary from image to image.” — Wei (NPL), Section 1 (“Introduction”), p. 1. Wei further describes generating synthetic data sets using known positional data (ground truth): “In the experiment, we synthesize watermark images by embedding visual watermarks into background images, where the visual watermarks are simplified to be random character strings with different opacities.” — Wei (NPL), Section 3 (“Data”), p. 2. The creation of synthetic watermark images involves inherently known or fixed positional information as training ground-truth data. Wei’s network, once trained on data with known positions, effectively learns to detect and localize watermarks in subsequent unseen inputs, inherently performing a form of positional prediction. Although Wei addresses general scenarios where watermarks may vary widely, the trained neural network must analyze features from incoming images (or video frames) to localize (predict the positions of) watermarks, whether static or dynamic.

Oxholm et al. disclose the broad use of annotations to identify and replace regions in video frames (Oxholm et al., Abstract, ¶ [0004]), but they do not clearly teach the specific limitation of predicting positions based on features of the video sequence. However, integrating Wei’s localization approach into Oxholm et al.’s method to predict the location of static overlays would be a clear and logical step for one skilled in the art, particularly in scenarios where static overlays (such as certain watermarks, logos, or timestamps) frequently appear in fixed locations across frames. A skilled artisan, aiming for reliable and consistent watermark detection and removal in video sequences, would readily apply Wei’s trained-model approach—developed to predict watermark locations based on image features—to improve Oxholm et al.’s video inpainting system. Such a combination would yield predictably enhanced results, improved robustness, and increased reliability in detecting and removing recurring, static overlays. The rationale and motivation to combine the references as set forth for claim 2 are incorporated herein by reference for the present claim.

Claim 5 is rejected under 35 U.S.C. § 103 as being unpatentable over Patel in view of Harron, and further in view of Wang et al. (NPL) (“Wang et al.”).

As per Claim 5, Patel and Harron alone do not explicitly teach all the limitations of the claim. However, when combined with Wang et al. (NPL), the references collectively teach all the limitations. Wang teaches the limitation of Claim 5 that recites: “The computer-implemented method of claim 1, where the video sequence is for a video game, further comprising: receiving the first frame in real time during execution of the video game, wherein generating the second frame and providing the second frame are performed within a period of time corresponding to a display setting for the video game.”

Wang et al. teach real-time video inpainting explicitly optimized for low-latency applications: “We propose a real-time Video Inpainting method to solve this problem with minimal hardware needs,” specifically noting the algorithm is “capable of running at 23 frames per second (FPS) on a laptop computer”, and is thus suitable for real-time interactive applications such as video gaming — Wang et al. (NPL), Abstract, p. 1. Wang et al. further disclose, “To support real-time applications, we need a lightweight optical flow estimation network. FastFlowNet balances well between accuracy and efficiency,” reinforcing suitability for demanding real-time interactive environments (e.g., video games) — Wang et al. (NPL), Section III-B (“Method”), p. 4.

Before the effective filing date of the claimed invention, a person of ordinary skill in the art would have been motivated to combine Patel’s CNN-based caption/overlay detection and inpainting pipeline with Wang et al.’s real-time, low-latency video inpainting implementation because doing so predictably enables the same overlay-removal and inpainting workflow to meet interactive frame-time constraints (e.g., a game display budget), yielding expected improvements in throughput and responsiveness (e.g., real-time operation at practical FPS) without changing the underlying inpainting function.
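Claim 5’s timing limitation maps onto a per-frame budget check: the detect-and-inpaint work must finish within the frame interval implied by the display setting. A sketch, with the 60 FPS target and function names assumed:

```python
import time

FPS_TARGET = 60                     # assumed display setting for the game
FRAME_BUDGET_S = 1.0 / FPS_TARGET   # ~16.7 ms per frame

def process_frame(frame, detect_and_inpaint):
    """Run overlay removal and report whether it fit the frame budget.
    `detect_and_inpaint` stands in for a pipeline like Patel's."""
    start = time.perf_counter()
    cleaned = detect_and_inpaint(frame)
    elapsed = time.perf_counter() - start
    return cleaned, elapsed <= FRAME_BUDGET_S
```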
Claims 6 and 12 are rejected under 35 U.S.C. § 103 as being unpatentable over Patel in view of Harron and further in view of Srinivasan et al. (NPL).

As per Claim 6, Patel and Harron alone do not explicitly teach all the limitations of the claim. However, when combined with Srinivasan et al. (NPL), the references collectively teach all the limitations. Srinivasan et al. (NPL) teaches the following portion of Claim 6, which recites: “The computer-implemented method of claim 1, further comprising: decreasing a resolution of the first frame, prior to generating the second frame;”

Srinivasan et al. disclose downsampling input frames before inpainting for computational efficiency: “The key idea is to first learn and apply a spatial-temporal inpainting network (STA-Net) on the downsampled low resolution videos.” — Srinivasan et al. (NPL), Section 1 (“Introduction”), p. 1. This citation describes downsampling input frames prior to inpainting, directly aligning with the limitation.

Srinivasan et al. (NPL) teaches the following portion of Claim 6, which recites: “increasing the resolution of the second frame after generating the second frame, the second frame being initially generated at a lower resolution.”

Srinivasan et al. further disclose upsampling after inpainting frames initially generated at lower resolution: “Then, we refine the low resolution results by aggregating the learned spatial and temporal image residuals (i.e., high frequency details) to the upsampled inpainted frames.” — Srinivasan et al. (NPL), Section 1 (“Introduction”), p. 1. This quote confirms that inpainted results, initially at a lower resolution, are subsequently increased in resolution (upsampled), aligning with the limitation.

Before the effective filing date of the claimed invention, a person of ordinary skill in the art would have been motivated to combine Patel’s CNN-based overlay/caption detection and inpainting pipeline with Srinivasan et al.’s downsample→inpaint→upsample framework because it predictably reduces memory and compute while preserving output quality, enabling faster processing of higher-resolution frames using the same inpainting approach, with expected improvements in efficiency and throughput.

Processor Claim 12 does not include any additional limitations that would significantly distinguish it from claim 6. Therefore, it is likewise rejected under 35 U.S.C. § 103 in view of the same references and for the same reasons set forth above.
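Srinivasan’s downsample→inpaint→upsample flow can be sketched in a few lines. The inpainting network and the 0.5 scale factor are placeholders, and Srinivasan’s learned residual refinement of the upsampled result is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def inpaint_low_res(frame: torch.Tensor, inpaint_net, scale: float = 0.5) -> torch.Tensor:
    """Downsample -> inpaint -> upsample, in the spirit of Srinivasan's
    STA-Net pipeline. `frame` is (N, C, H, W); `inpaint_net` is assumed."""
    h, w = frame.shape[-2:]
    small = F.interpolate(frame, scale_factor=scale,
                          mode="bilinear", align_corners=False)
    small_inpainted = inpaint_net(small)
    return F.interpolate(small_inpainted, size=(h, w),
                         mode="bilinear", align_corners=False)
```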
Claim 7 is rejected under 35 U.S.C. § 103 as being unpatentable over Patel in view of Harron and further in view of Houlberg (US20040227819A1).

As per Claim 7, Patel and Harron alone do not explicitly teach all the limitations of the claim. However, when combined with Houlberg, the references collectively teach all the limitations. Houlberg teaches the limitation of Claim 7 that recites: “The computer-implemented method of claim 1, further comprising: determining an activation status associated with a system to generate the second frame is active.”

Houlberg teaches determining system activation status through explicit hardware-based status indicators: “Solid Green indicates that the Auto Focus and Zoom Controller 20 is functioning normally. The configuration file (config.txt) and the track file have been detected and successfully read, a USB Game Pad or Switch Pad or Remote Switch was detected and all Lens Systems and the Focus Tables are initialized and functioning as specified.” — Houlberg, ¶ [0042]. This disclosure directly aligns with the claim’s requirement of “determining an activation status,” establishing a reliable, hardware-based indicator confirming that the system components required for generating and providing frames or overlays are active and operational.

Before the effective filing date of the claimed invention, a person of ordinary skill in the art would have been motivated to combine Patel’s frame-level overlay detection/inpainting pipeline with Houlberg’s system activation-status indication (e.g., a “functioning normally” status) to provide a predictable reliability and user-control enhancement, i.e., checking that the processing system is active and ready before generating and outputting the processed frame, without altering the underlying inpainting functionality.

Claim 8 is rejected under 35 U.S.C. § 103 as being unpatentable over Patel in view of Harron and further in view of Beyabani et al. (US20090328085A1).

As per Claim 8, Patel and Harron alone do not explicitly teach all the limitations of the claim. However, when combined with Beyabani et al., the references collectively teach all the limitations. Beyabani teaches the limitation of Claim 8 that recites: “The computer-implemented method of claim 1, further comprising: determining an application associated with the first frame is on an allow list.”

Beyabani et al. teach using a whitelist (allow list) to selectively determine whether specific programming content should be processed or excluded from interruptions: “Preferred content list 480 may act as a so-called white-list of programming content that should not be interrupted with the overlay.” — Beyabani et al., ¶ [0044]. This disclosure aligns with the claimed limitation regarding determining whether content (reasonably interpreted to encompass applications or associated programming content) is on an allow list. Although Beyabani specifically references “programming content,” a skilled artisan would readily understand this to extend naturally to software applications delivering video frames or associated content. Hence, determining whether an application or associated programming content is on an allow list is an obvious extension of Beyabani’s disclosed preferred-content whitelist.

Before the effective filing date of the claimed invention, a person of ordinary skill in the art would have been motivated to combine Patel’s overlay/caption detection and inpainting pipeline with Beyabani et al.’s allow-list (“white-list”) control (“Preferred content list 480 … white-list … should not be interrupted” — Beyabani et al., ¶ [0044]) to achieve predictable policy-based gating, i.e., selectively enabling or bypassing overlay removal for frames associated with approved applications or content, yielding expected improvements in user control and system flexibility without changing the underlying inpainting technique.
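The allow-list gating read onto claim 8 is a simple membership test. A sketch, with placeholder application names:

```python
# Hypothetical allow list; the application names are placeholders.
ALLOW_LIST = {"example_game.exe", "example_media_player"}

def should_remove_overlays(app_name: str) -> bool:
    """Policy gate in the spirit of Beyabani's preferred-content
    white-list: only frames from approved applications are processed."""
    return app_name in ALLOW_LIST
```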
Claims 10 and 11 are rejected under 35 U.S.C. § 103 as being unpatentable over Patel in view of Harron, further in view of Beyabani et al. (US20090328085A1), and still further in view of Li et al. (US20030043172A1).

As per Claim 10, Patel alone does not explicitly teach all the limitations of the claim. However, when combined with Beyabani and Li, the references collectively teach all the limitations of Claim 10. “Determine an application associated with the frame”: Beyabani teaches a content-based whitelist policy (“Preferred content list 480 … white-list of programming content …”) — Beyabani et al., ¶ [0044]. “Receive a mask corresponding to the one or more regions”: Patel teaches the use of caption detection masks with frames — Patel et al. (NPL), Section 2, p. 2-3. “Wherein the mask is pre-determined for the application”: Li teaches that when an overlay is known beforehand, “a template can be furnished in advance” and may provide location and size information — Li et al., ¶ [0068] (see also ¶ [0069]).

Before the effective filing date of the claimed invention, a person of ordinary skill in the art would have been motivated to combine Patel’s mask-based overlay detection and inpainting workflow with Li’s use of pre-supplied overlay templates when an overlay is known in advance and Beyabani’s allow-list (“white-list”) content control to achieve predictable efficiency and policy-based control, i.e., selectively applying overlay removal using pre-determined masks for approved applications, without altering the underlying inpainting technique.

As per Claim 11, Patel and Harron alone do not explicitly teach all the limitations of the claim. However, when combined with Li, the references collectively teach all the limitations. Harron and Li teach the limitation of Claim 11 that recites: “The processor of claim 10, wherein the mask is pre-determined for the application.”

Li et al. (US20030043172A1) explicitly disclose using a pre-determined mask (“template”) when graphical overlays are known beforehand: “If the graphical overlay is known, a priori, then a template can be furnished in advance.” — Li et al., ¶ [0068]. “This template can also carry size and location information.” — Li et al., ¶ [0016]. Li et al. further disclose that some overlays remain static and fixed in a known position: “Some graphical overlays may remain in a single location, such as, for example, a broadcaster logo in the bottom right corner of the video frames.” — Li et al., ¶ [0039]. These disclosures demonstrate a mask that is pre-determined for a specific, known application, directly aligning with the limitation requiring a “mask pre-determined for the application.” Harron et al. (US20200104625A1) disclose a neural-network-based processor generating masks for overlay detection and removal: “The masking component 602 filters the probability map to produce an overlay mask 702, which can be, for example, a binary mask.” — Harron et al., ¶ [0047]. This supports the underlying claimed processor and the neural-network-based mask generation system described in claim 10. The rationale and motivation to combine the references as set forth for claim 10 are incorporated herein by reference for the present claim.
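Li’s “template furnished in advance” corresponds to keeping a pre-determined mask per application. A sketch, where the registry, application name, and logo coordinates are all illustrative assumptions:

```python
from typing import Optional
import numpy as np

# Hypothetical registry mapping an application to a pre-determined
# overlay mask, echoing Li's template carrying size and location
# information (the bottom-right logo region below is an assumption).
H, W = 720, 1280
logo_mask = np.zeros((H, W), dtype=np.uint8)
logo_mask[660:710, 1150:1270] = 1     # static broadcaster-logo region

MASK_TEMPLATES = {"example_broadcast_app": logo_mask}

def mask_for(app_name: str) -> Optional[np.ndarray]:
    """Return the pre-determined mask for a known application, if any."""
    return MASK_TEMPLATES.get(app_name)
```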
Response to Arguments

Applicant’s arguments filed 10/24/2025 have been fully considered but are not persuasive. Applicant’s arguments are now moot because new prior art has been applied to address the claim amendments.

Conclusion

The prior art made of record and relied upon in this action is as follows:

Patent Literature:
Oxholm et al. (US20200118594A1)
Houlberg (US20040227819A1)
Beyabani et al. (US20090328085A1)
Li et al. (US20030043172A1)
Harron et al. (US20200104625A1)

Non-Patent Literature (NPL):
Wang et al., "Real-Time Video Inpainting for RGB-D Pipeline Reconstruction" (1 Oct 2023). Available at: https://ieeexplore.ieee.org/abstract/document/10341971
Wei, "Visual Watermark Removal Based on Deep Learning" (7 Feb 2023). Available at: https://arxiv.org/pdf/2302.11338
Srinivasan et al., "Spatial-Temporal Residual Aggregation for High Resolution Video Inpainting" (2021). Available at: https://arxiv.org/pdf/2111.03574
Patel & Pandey, "Joint Caption Detection and Inpainting Using Generative Network" (17 Oct 2019, Springer 2019). Available at: https://www.researchgate.net/profile/Anubha-Pandey/publication/336581194_Joint_Caption_Detection_and_Inpainting_Using_Generative_Network/links/5f8c8def458515b7cf8b35c4/Joint-Caption-Detection-and-Inpainting-Using-Generative-Network.pdf

Note: A PDF copy of each NPL reference is attached with this Office Action. URLs are included for applicant convenience. If a link becomes unavailable in the future, the citation information may be used to locate the reference or to access archived versions via the Wayback Machine.

The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure and is listed as follows:

Non-Patent Literature (NPL):
Kim et al., "Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence" (8 May 2019, CVPR 2019). Available at: https://arxiv.org/pdf/1905.02949

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ADEEL BASHIR, whose telephone number is (571) 270-0440. The examiner can normally be reached Monday-Thursday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Hajnik, can be reached at (571) 276-7642. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ADEEL BASHIR/
Examiner, Art Unit 2616

/DANIEL F HAJNIK/
Supervisory Patent Examiner, Art Unit 2616

Prosecution Timeline

Oct 09, 2023 — Application Filed
Jun 18, 2025 — Non-Final Rejection (§103)
Sep 04, 2025 — Interview Requested
Sep 10, 2025 — Examiner Interview Summary
Sep 10, 2025 — Applicant Interview (Telephonic)
Oct 24, 2025 — Response Filed
Dec 31, 2025 — Final Rejection (§103)
Apr 06, 2026 — Request for Continued Examination
Apr 07, 2026 — Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12597209 — USING POLYGON MESH RENDER COMPOSITES DURING NEURAL RADIANCE FIELD (NERF) GENERATION
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12586333 — AUTOMATED METHOD FOR GENERATING PROSTHESIS FROM THREE DIMENSIONAL SCAN DATA, APPARATUS GENERATING PROSTHESIS FROM THREE DIMENSIONAL SCAN DATA AND COMPUTER READABLE MEDIUM HAVING PROGRAM FOR PERFORMING THE METHOD
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12586302 — RENDERING HAIR
Granted Mar 24, 2026 (2y 5m to grant)

Patent 12573126 — SPLIT BOUNDING VOLUMES FOR INSTANCES
Granted Mar 10, 2026 (2y 5m to grant)

Patent 12555280 — VECTOR GRAPHICS BASED LIVE SKETCHING METHODS AND SYSTEMS
Granted Feb 17, 2026 (2y 5m to grant)
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 94%
With Interview: 99% (+7.4%)
Median Time to Grant: 2y 6m
PTA Risk: Moderate

Based on 35 resolved cases by this examiner. Grant probability derived from career allow rate.
