Last updated: April 19, 2026

Application No. 18/217,497

SUBTITLE POSITIONING BASED ON SALIENCY DETECTION

Non-Final OA §101 §103

Filed

Jun 30, 2023

Examiner

CHEN, HUO LONG

Art Unit

2682

Tech Center

2600 — Communications

Assignee

Intel Corporation

OA Round

1 (Non-Final)

This examiner grants 53% of resolved cases, with a +30.3% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 590 resolved cases, 2023–2026

Examiner Intelligence

CHEN, HUO LONG

Grants 53% of resolved cases

Career Allow Rate

314 granted / 590 resolved

-8.8% vs TC avg

Strong +30% interview lift

Without

With

+30.3%

Interview Lift

resolved cases with interview

Typical timeline

3y 2m

Avg Prosecution

37 currently pending

Career history

627

Total Applications

across all art units

Statute-Specific Performance

§101: 11.3% (-28.7% vs TC avg)

§103: 64.3% (+24.3% vs TC avg)

§102: 12.5% (-27.5% vs TC avg)

§112: 8.1% (-31.9% vs TC avg)

Black line = Tech Center average estimate • Based on career data from 590 resolved cases

Office Action

§101 §103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a Judicial Exception in the form of an Abstract Idea, without significantly more:
Beginning with independent claim 1, a process claim, which recites:
     At least one non-transitory computer-readable medium comprising instructions stored thereon, that if executed by one or more processors, cause the one or more processors to: for a frame of a video: identify one or more bounding regions in the frame that correspond to regions of interest; select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions, wherein the text bounding region in the frame is associated with text; and cause the text to be displayed in the text bounding region corresponding to the selected location.
	The claim recites abstract ideas:   
A process that encompasses a human performing the steps mentally with or without a physical aid in the form of the “causing the text to display” step, with the “identifying” step and “selecting” step being pre-solution acts of processing information which could be performed visually and/or mentally; and
A method of organizing human behavior in the form of a social activity of following rules or instructions informing a person to perform the “identifying” step, “selecting” step and the “causing the text to display” step.
     Independent claim 11, a process claim, which recites:
     An apparatus comprising: at least one processor and at least one memory comprising stored thereon, that if executed by the at least one processor, cause the at least one processor to: for a frame of a video file: identify one or more bounding regions in the frame that correspond to regions of interest; select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions, wherein the text bounding region in the frame is associated with text; and cause the text to be displayed in the text bounding region corresponding to the selected location.
An apparatus comprising a processor, a memory, and a computer program that is stored in the memory and capable of being executed by the processor to perform the “identifying” step, “selecting” step and the “causing the text to display” step is considered a generic computer implementation. In addition, the limitation does not provide any details about how the “identifying” step, “selecting” step and the “causing the text to display” step are performed. Therefore, if the apparatus, processor and memory are removed from the claim, the method can easily be performed by a human being without the need for any computer component. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
A process that encompasses a human performing the steps mentally with or without a physical aid in the form of the “causing the text to display” step, with the “identifying” step and “selecting” step being pre-solution acts of processing information which could be performed visually and/or mentally; and
A method of organizing human behavior in the form of a social activity of following rules or instructions informing a person to perform the “identifying” step, “selecting” step and the “causing the text to display” step.
     Independent claim 17, a process claim, which recites:
     A method comprising: for frames of a video file: determining a location of subtitle text by: based on a user-input configuration specifying to select a location of display of the subtitle text to avoid images of interest: determining one or more regions of interest in the frame and selecting a location of the subtitle text to avoid overlapping a bounding box surrounding the subtitle text with the one or more regions of interest.
A process that encompasses a human performing the steps mentally with or without a physical aid in the form of the “selecting” step, with the “determining” steps being pre-solution acts of processing information which could be performed visually and/or mentally; and
A method of organizing human behavior in the form of a social activity of following rules or instructions informing a person to perform the “determining” steps and the “selecting” step.

These two abstract ideas will be considered together for analysis as a single abstract idea per MPEP 2106:

    [Image: media_image1.png]

This judicial exception is not integrated into a practical application because there are no recited additional elements that amount to a practical application, such as, but not limited to, the following as noted in MPEP 2106:

    [Image: media_image2.png]


The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception for the same reason: there are no additional elements other than the abstract idea.
	Independent claims 1, 11 and 17 are merely a generic computer implementation of the abstract ideas and likewise do not amount to significantly more. See MPEP 2106:

    [Image: media_image3.png]
	Likewise, the following dependent claims have been analyzed and do not recite elements that amount to a practical application or significantly more, and remain rejected under 35 USC 101: Claims 2-10, 12-16 and 18-20.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5, 6, 8-11 and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hu’032 (US 2008/0260032), and further in view of Lenzi’366 (US 2003/0169366).
     With respect to claim 11, Hu’032 teaches an apparatus comprising: 
     at least one processor (Fig.1, item 11) and 
     at least one memory (Fig.1, item 12) comprising stored thereon, that if executed by the at least one processor, cause the at least one processor to: 
     for a frame of a video file [regarding the received video frames in step 301 shown in Fig.3]: 
     identify one or more bounding regions in the frame that correspond to regions of interest (Fig.3, step 302); 
     wherein the text bounding region in the frame is associated with text (Fig.3, step 302); and 
     Hu’032 does not teach select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions and cause the text to be displayed in the text bounding region corresponding to the selected location.
     Lenzi’366 teaches cause the text to be displayed in the text bounding region corresponding to the selected location (Fig.5).
     Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Hu’032 according to the teaching of Lenzi’366 to display the video with the subtitle at the desired location because this allows the subtitle to be shown on a video more effectively.
     The combination of Hu’032 and Lenzi’366 does not teach select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions.
     Since Hu’032 teaches detecting the text boxes from the video frames (Fig.3) and Lenzi’366 suggests displaying the caption so that it does not overlay other text (Fig.5), it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to detect the text boxes from the video and arrange the subtitle in a selected text box such that the subtitle would not overlap with other text on the video (select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions) because this will allow the subtitle and other information to be shown on a video more effectively.
     Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Hu’032 and Lenzi’366 to detect the text boxes from the video and arrange the subtitle in a selected text box such that the subtitle would not overlap with other text on the video (select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions) because this will allow the subtitle and other information to be shown on a video more effectively.
     With respect to claim 13, which further limits claim 11, Hu’032 teaches wherein the regions of interest comprise one or more of: a largest image in a frame, a moving image, a centered image, text (Fig.3), or an image of a human.  
     With respect to claim 14, which further limits claim 11, Hu’032 teaches wherein the text to be displayed in the region comprises subtitles or closed captioning (CC) text (paragraphs 14 and 20).  
     With respect to claim 15, which further limits claim 11, Hu’032 does not teach wherein the select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions is based on locations of text bounding regions in multiple frames.  
     Lenzi’366 teaches wherein the select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions is based on locations of text bounding regions in multiple frames (Fig.5).  
     Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Hu’032 according to the teaching of Lenzi’366 to display the video with the subtitle at the desired location because this allows the subtitle to be shown on a video more effectively.
     With respect to claim 16, which further limits claim 11, Hu’032 does not teach wherein the select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions is based on a per-pixel raster order scan of a frame.
     Lenzi’366 teaches wherein the select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions is based on a per-pixel raster order scan of a frame (Fig.5).  
     Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Hu’032 according to the teaching of Lenzi’366 to display the video with the subtitle at the desired location because this allows the subtitle to be shown on a video more effectively.
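The Office action does not detail how a per-pixel raster order scan would select a non-overlapping location. As an illustration only, a minimal Python sketch might look like the following. The `(x, y, width, height)` box convention and the names `overlaps` and `place_text_box` are hypothetical and come from neither the application nor the cited references.

```python
from typing import List, Optional, Tuple

# Hypothetical convention: a box is (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share any pixel area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_text_box(frame_w: int, frame_h: int, text_w: int, text_h: int,
                   regions: List[Box]) -> Optional[Box]:
    """Try every top-left corner in raster order (left to right, then
    top to bottom) and return the first text box that lies inside the
    frame and overlaps none of the regions of interest."""
    for y in range(frame_h - text_h + 1):
        for x in range(frame_w - text_w + 1):
            candidate = (x, y, text_w, text_h)
            if not any(overlaps(candidate, r) for r in regions):
                return candidate
    return None  # no free location at this text size
```

For a 10 × 10 frame whose top half is a region of interest, `place_text_box(10, 10, 4, 2, [(0, 0, 10, 5)])` returns the first free position in scan order, `(0, 5, 4, 2)`. A brute-force scan like this is simple but visits every pixel position, so a practical system would likely use a coarser step.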
     With respect to claims 1-3, 5, 6 and 8, they are method claims that recite how the apparatus of claims 11-16 arranges text location on video frames.  Claims 1-3, 5, 6 and 8 are rejected in the same manner as described for rejected claims 11-16.
     With respect to claims 17-19, they are non-transitory computer-readable medium claims that recite how the apparatus of claims 11, 13 and 16 arranges text location on video frames.  Claims 17-19 are rejected in the same manner as described for rejected claims 11, 13 and 16. Lenzi’366 further teaches based on a user-input configuration specifying to select a location of display of the subtitle text to avoid images of interest (paragraph 17).
     With respect to claim 4, which further limits claim 1, Hu’032 teaches wherein the regions of interest exclude a solid colored region [as shown in Fig.3, only the text boxes from the video frames are detected in step 302].
     With respect to claim 9, which further limits claim 1, the combination of Hu’032 and Lenzi’366 does not teach wherein the select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions comprises reduce a size of the bounding region in the frame until identifying a location of a text bounding region in the frame that does not overlap with the one or more bounding regions.  
     Since Hu’032 teaches detecting the text boxes from the video frames (Fig.3) and Lenzi’366 suggests displaying the caption so that it does not overlay other text (Fig.5), it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to detect the text boxes from the video and arrange the subtitle in a selected text box such that the subtitle would not overlap with other text on the video by reducing the size of the text boxes (wherein the select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions comprises reduce a size of the bounding region in the frame until identifying a location of a text bounding region in the frame that does not overlap with the one or more bounding regions) because this will allow the subtitle and other information to be shown on a video more effectively.
     Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Hu’032 and Lenzi’366 to detect the text boxes from the video and arrange the subtitle in a selected text box such that the subtitle would not overlap with other text on the video by reducing the size of the text boxes (wherein the select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions comprises reduce a size of the bounding region in the frame until identifying a location of a text bounding region in the frame that does not overlap with the one or more bounding regions) because this will allow the subtitle and other information to be shown on a video more effectively.
     With respect to claim 10, which further limits claim 1, Hu’032 teaches wherein the frame of video comprises one or more of: text (Fig.3, step 302), audio, graphics, video, holographic images or video, or audio.
     With respect to claim 20, which further limits claim 17, the combination of Hu’032 and Lenzi’366 does not teach wherein the selecting a location of the subtitle text to avoid overlapping a bounding box surrounding the subtitle text with the one or more regions of interest comprises reducing a size of the bounding box in the frame until identifying a location of the bounding box in the frame that does not overlap with the one or more regions of interest. 
     Since Hu’032 teaches detecting the text boxes from the video frames (Fig.3) and Lenzi’366 suggests displaying the caption so that it does not overlay other text (Fig.5), it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to detect the text boxes from the video and arrange the subtitle in a selected text box such that the subtitle would not overlap with other text on the video by reducing the size of the text boxes (wherein the selecting a location of the subtitle text to avoid overlapping a bounding box surrounding the subtitle text with the one or more regions of interest comprises reducing a size of the bounding box in the frame until identifying a location of the bounding box in the frame that does not overlap with the one or more regions of interest) because this will allow the subtitle and other information to be shown on a video more effectively.
     Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Hu’032 and Lenzi’366 to detect the text boxes from the video and arrange the subtitle in a selected text box such that the subtitle would not overlap with other text on the video by reducing the size of the text boxes (wherein the select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions comprises reduce a size of the bounding region in the frame until identifying a location of a text bounding region in the frame that does not overlap with the one or more bounding regions) because this will allow the subtitle and other information to be shown on a video more effectively.
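The size-reduction limitation of claims 9 and 20 (reduce the bounding box until a non-overlapping location is found) can be pictured as a loop around a placement search. The Python sketch below is illustrative only and rests on assumptions: the 90% shrink step, the minimum height, and the names `find_free_spot` and `shrink_until_fit` are hypothetical, and neither the application nor the cited references is known to use this procedure.

```python
from typing import List, Optional, Tuple

# Hypothetical convention: a box is (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share any pixel area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def find_free_spot(frame_w: int, frame_h: int, w: int, h: int,
                   regions: List[Box]) -> Optional[Box]:
    """Return the first w-by-h box (raster order) that avoids all regions."""
    for y in range(frame_h - h + 1):
        for x in range(frame_w - w + 1):
            box = (x, y, w, h)
            if not any(overlaps(box, r) for r in regions):
                return box
    return None


def shrink_until_fit(frame_w: int, frame_h: int, text_w: int, text_h: int,
                     regions: List[Box], shrink_pct: int = 90,
                     min_h: int = 8) -> Optional[Box]:
    """Shrink the text box by shrink_pct percent per step (an assumed
    value) until a non-overlapping location exists or the box would be
    shorter than min_h pixels (an assumed legibility floor)."""
    w, h = text_w, text_h
    while h >= min_h and w >= 1:
        spot = find_free_spot(frame_w, frame_h, w, h, regions)
        if spot is not None:
            return spot
        # Integer arithmetic keeps the sizes exact and reproducible.
        w, h = w * shrink_pct // 100, h * shrink_pct // 100
    return None
```

With a 100 × 20 frame whose top 10 rows are a region of interest, a 50 × 16 text box does not fit; after three shrink steps (16 → 14 → 12 → 10 pixels high) `shrink_until_fit(100, 20, 50, 16, [(0, 0, 100, 10)])` returns `(0, 10, 36, 10)`.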
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Hu’032 (US 2008/0260032), Lenzi’366 (US 2003/0169366) and further in view of JP 3439105.
     With respect to claim 7, which further limits claim 6, Hu’032 does not teach wherein the select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions is based on locations of text bounding regions in multiple frames of the video and reduces an amount of movement of the locations of text bounding regions in multiple frames.
     Since Hu’032 teaches detecting the text boxes from the video frames (Fig.3) and Lenzi’366 suggests displaying the caption so that it does not overlay other text (Fig.5), it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to detect the text boxes from the video and arrange the subtitle in a selected text box such that the subtitle would not overlap with other text on the video (wherein the select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions is based on locations of text bounding regions in multiple frames of the video) because this will allow the subtitle and other information to be shown on a video more effectively.
     Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Hu’032 and Lenzi’366 to detect the text boxes from the video and arrange the subtitle in a selected text box such that the subtitle would not overlap with other text on the video (wherein the select a location of a text bounding region in the frame that does not overlap with the one or more bounding regions is based on locations of text bounding regions in multiple frames of the video) because this will allow the subtitle and other information to be shown on a video more effectively.
     The modification of the combination of Hu’032 and Lenzi’366 does not teach reduces an amount of movement of the locations of text bounding regions in multiple frames.
     JP 3439105 teaches reduces an amount of movement of the locations of text bounding regions in multiple frames [By calculating and correcting the local deviation, the subtitle characters that are commonly displayed between the frame images can be accurately corresponded to each other, so that even if the subtitle characters are displayed while moving, the movement amount of the subtitle characters can be reduced (page 5)].
     Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the modification of the combination of Hu’032 and Lenzi’366 according to the teaching of JP 3439105 to reduce the movement amount of the subtitle characters in multiple frames because this allows the subtitle to be shown on a video more effectively.
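To make the claim 7 limitation concrete (selecting locations across multiple frames so that the text box moves less), one possible heuristic is to prefer the free location closest to the previous frame's location. The Python sketch below uses that nearest-to-previous rule as a stand-in. It is not the local-deviation correction described in JP 3439105, and the name `stable_position` and the `(x, y)` corner convention are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical convention: a position is the (x, y) top-left corner of
# a text box that has already been checked for overlap.
Point = Tuple[int, int]


def stable_position(candidates: List[Point],
                    previous: Optional[Point]) -> Optional[Point]:
    """Pick the non-overlapping candidate nearest to the previous
    frame's text position, so the subtitle moves as little as possible.
    Falls back to the first candidate when there is no previous frame."""
    if not candidates:
        return None
    if previous is None:
        return candidates[0]
    px, py = previous
    # Squared Euclidean distance is enough for comparing candidates.
    return min(candidates, key=lambda c: (c[0] - px) ** 2 + (c[1] - py) ** 2)
```

For example, with candidates `[(0, 0), (10, 10), (50, 50)]` and a previous position of `(12, 9)`, the function returns `(10, 10)`.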
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Hu’032 (US 2008/0260032), Lenzi’366 (US 2003/0169366) and further in view of Yong’542 (CN 111798542).
     With respect to claim 12, which further limits claim 11, the combination of Hu’032 and Lenzi’366 does not teach wherein the identify one or more bounding regions in the frame that correspond to regions of interest is based on a neural network trained based on regions of interest. 
     Yong’542 teaches wherein the identify one or more bounding regions in the frame that correspond to regions of interest is based on a neural network trained based on regions of interest [specifically, the present example provides a mask-based area convolutional neural network (MaskRCNN, Mask Region CNN) to obtain the priori information, and using the prior information estimation to obtain the text box (i.e., target area) based on the statistical rule, then using CRNN to identify the text content of the text box, the method acts on TV dramas, moving picture and comprehensive program and other video data with subtitle content to obtain a large number of training data for voice recognition model (the model can be applied to a specific language, or applied to different languages), the process is full automation, without any manual data marking, so as to reduce the cost of manual collection and label data (page 15)].
     Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Hu’032 and Lenzi’366 according to the teaching of Yong’542 to use a mask-based area convolutional neural network to detect the text boxes from the video because this will allow the subtitle and other information to be identified on a video more effectively.
Contact
     Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUO LONG CHEN whose telephone number is (571)270-3759.  The examiner can normally be reached on M-F 9am - 5pm.  If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tieu, Benny can be reached on (571) 272-7490.  The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.  Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HUO LONG CHEN/Primary Examiner, Art Unit 2682

Prosecution Timeline

Jun 30, 2023

Application Filed

Aug 15, 2023

Response after Non-Final Action

Mar 20, 2026

Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

18/123,449

Patent 12603178

APPARATUS AND METHODS FOR SUPPORTING MEDICAL DECISIONS

2y 5m to grant Granted Apr 14, 2026

17/721,915

Patent 12597162

SYSTEM CALIBRATION USING REMOTE SENSOR DATA

2y 5m to grant Granted Apr 07, 2026

18/202,975

Patent 12592095

METHOD AND SYSTEM OF DETERMINING SHAPE OF A TABLE IN A DOCUMENT

2y 5m to grant Granted Mar 31, 2026

18/300,016

Patent 12586398

Detecting a Homoglyph in a String of Characters

2y 5m to grant Granted Mar 24, 2026

18/137,884

Patent 12567271

PICTURE RECOGNITION METHOD AND APPARATUS, ELECTRONIC DEVICE, AND MEDIUM

2y 5m to grant Granted Mar 03, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

53%

Grant Probability

84%

With Interview (+30.3%)

3y 2m

Median Time to Grant

Low

PTA Risk

Based on 590 resolved cases by this examiner. Grant probability derived from career allow rate.