DETAILED ACTION
Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
2. Applicant’s arguments with respect to claim(s) 1, 2, 4-12 and 14-22 have been considered but are moot because of the new ground of rejection discussed below. The amendments to the claims necessitated the new ground(s) of rejection. This Office action is made FINAL.
Claim Rejections - 35 USC § 103
3. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
4. Claims 1, 2, 4-12 and 14-22 are rejected under 35 U.S.C. 103 as being unpatentable over YANG et al. (US 2022/0239988) in view of WLODKOWSKI et al. (US 2015/0248887).
As to claims 1-2, YANG discloses a display method and apparatus for item information, and further a video processing method,
performed in an electronic device, the method comprising: acquiring a first video clip, the first video clip corresponding to a template text in a first text, where the first text comprises a variable text that varies and a template text that is fixed, the video clip showing an object conveying and speaking the template text (display manner of video clip, image, frame, etc.), and the first video clip comprises a video subclip having a speech pause (figs. 3-6), the video subclip being arranged at a boundary position where the template text adjoins the variable text in the first text (figs. 1-13, Abstract, [0004-0015], [0041-0046], [0050-0077], [0081-0097], [0103-0108] and [0148-0179]). The UT overlays tags or metadata that include information corresponding to the item tag and interactive link information associated with the object information overlays; the tag or metadata is configured to indicate the display manner of the item; the products include multiple links or tags which, when interacted with, display additional information and other products in multiple levels of transparency;
generating a second video clip corresponding to the variable text, the second video clip showing the object conveying and speaking the variable text; and stitching the first video clip with the second video clip in a time domain based on the boundary position to obtain a video about the object conveying and speaking the first text, wherein the video subclip comprises: a speech signal subsegment of the speech pause obtained based on a silence signal; and an image subsequence of the virtual object taking the speech pause obtained based on an image sequence of the virtual object in a non-speaking state and pause information corresponding to the boundary position, the pause information being used for representing a speech pause of a predetermined duration; and capturing, from the preset video, the first video clip corresponding to the template text, wherein a virtual object in an image of the video subclip is in a non-speaking state (see figs. 2-8, [0050-0077], [0081-0097], [0103-0112], [0141-0143] and [0148-0179], non-speaking objects, products, etc. superimposed or displayed within specific locations of the display). The UT or server is responsive to a voice response of the user at a specific location or region of the display; saying or searching keywords of products generates results tagged to the respective image ("brown bear", "baseball cap") and pops up the resource ID on the image (specific display area, or as indicated in the display information associated with the tagged item); further, the tag display information is configured to indicate the display manner or position of the item tag and the item keyword included in the item tag that corresponds to the item tag; further, the tag may be superimposed and displayed in a region other than the object or the face, and the products include multiple links or tags which, when interacted with, display additional information and other products in multiple levels of transparency.
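Examiner's note: for illustration only, the time-domain stitching recited above may be sketched as follows (Python; the sample rate, frame rate, pause duration, and all function names are hypothetical assumptions, not taken from YANG or from the claims):

```python
# Illustration only: stitching a template clip to a variable clip in the time
# domain, with a speech pause built from a silence signal (hypothetical values).
import numpy as np

SAMPLE_RATE = 16000  # audio samples per second (assumed)
FRAME_RATE = 25      # video frames per second (assumed)

def stitch_clips(template_audio, template_frames,
                 variable_audio, variable_frames,
                 pause_seconds=0.3, idle_frame=None):
    """Concatenate two clips at the boundary position, inserting a pause whose
    audio is a silence signal and whose images repeat a non-speaking frame."""
    silence = np.zeros(int(pause_seconds * SAMPLE_RATE),
                       dtype=template_audio.dtype)
    n_pause = int(pause_seconds * FRAME_RATE)
    if idle_frame is None:
        idle_frame = template_frames[-1]  # reuse the last template frame
    audio = np.concatenate([template_audio, silence, variable_audio])
    frames = list(template_frames) + [idle_frame] * n_pause + list(variable_frames)
    return audio, frames
```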
YANG further discloses popup links to alert the user to other information related to the object and other additional information, BUT appears silent as to the popup links conveying dynamic spoken information to the user.
However, in the same field of endeavor, WLODKOWSKI discloses a voice-enabled screen reader that dynamically conveys voice response messages to a user during interaction (figs. 1-5, Abstract, [0003-0009], [0033-0046] and [0044-0049]).
Hence, it would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to incorporate the teachings of WLODKOWSKI into the system of YANG to include an additional interactive response, such as voice, to convey a voice response message via the object to the user as desired.
As to claim 4, YANG further discloses wherein the video subclip is a subclip obtained by pausing the video clip, comprising: performing weighting processing on a speech signal subsegment in the first video clip at a stitching position corresponding to the boundary position, and a silence signal, to obtain a speech signal subsegment with a speech pause; and performing weighting processing on an image subsequence of the first video clip at the stitching position and an image sequence of a target state feature to obtain the image subsequence where the virtual object is in the non-speaking state, the target state feature being a feature used for representing that the virtual object is in the non-speaking state ([0041-0044], [0061-0069], [0103-0112], [0141-0143] and [0148-0179]). The UT or server is responsive to a voice or audio response, dynamically based on the user request or query, and generates it at a specific location or region of the display; saying or searching keywords of products generates results tagged to the respective image ("brown bear", "baseball cap") and pops up the resource ID on the image (specific display area, or as indicated in the display information associated with the tagged item); further, the tag display information is configured to indicate the display manner or position of the item tag and the item keyword included in the item tag that corresponds to the item tag; further, the tag may be superimposed and displayed in a region other than the object or the face, and the products include multiple links or tags which, when interacted with, display additional information and other products in multiple levels of transparency.
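Examiner's note: a minimal sketch, for illustration only, of the weighting processing recited in claim 4 (a speech subsegment weighted against a silence signal, and an image subsequence weighted toward a non-speaking target state), assuming hypothetical NumPy array inputs; none of the names below are taken from YANG:

```python
# Illustration only: weighting a speech subsegment against silence, and an
# image subsequence toward a non-speaking target frame (all names assumed).
import numpy as np

def fade_to_silence(speech_tail):
    """out[n] = w[n]*speech[n] + (1 - w[n])*0, with w ramping from 1 to 0;
    the silence signal contributes zeros, so only the ramp term remains."""
    w = np.linspace(1.0, 0.0, len(speech_tail))
    return speech_tail * w

def blend_to_idle(frames, idle_frame):
    """Weight each frame toward the target non-speaking state feature."""
    n = len(frames)
    out = []
    for i, frame in enumerate(frames):
        w = 1.0 - i / max(n - 1, 1)  # weight decays across the subsequence
        out.append(w * frame + (1.0 - w) * idle_frame)
    return out
```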
As to claim 5, YANG further discloses generating a second video clip corresponding to the variable text comprises: determining a corresponding speech parameter and image parameter for a statement where the variable text is located in the first text, the image parameter being used for representing a state feature of a virtual object to appear in the video corresponding to the first text, and the speech parameter being used for representing a parameter corresponding to text-to-speech;
extracting, from the speech parameter and the image parameter, a target speech parameter and a target image parameter corresponding to the variable text; and generating, according to the target speech parameter and the target image parameter, the second video clip corresponding to the variable text ([0041-0044], [0061-0069], [0103-0112], [0141-0143] and [0148-0179]), note remarks in claims 1-4.
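Examiner's note: for illustration only, the parameter extraction recited in claim 5 can be sketched as follows, under the assumption that speech and image parameters are stored as per-token sequences; the dictionary layout and names are hypothetical:

```python
# Illustration only: slicing out the target speech/image parameters for the
# variable text from per-statement parameters (hypothetical data layout).
def extract_target_params(statement_params, variable_span):
    """statement_params maps names like 'speech' and 'image' to per-token
    parameter sequences; variable_span is the (start, end) token range
    covered by the variable text."""
    start, end = variable_span
    return {name: seq[start:end] for name, seq in statement_params.items()}
```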
As to claims 6-7, YANG further discloses wherein the generating a second video clip corresponding to the variable text comprises: performing, according to an image parameter of the variable text at the boundary position, smoothing processing on a target image parameter corresponding to the variable text to improve the continuity of the target image parameter and an image parameter of the template text at the boundary position; and generating, according to the target image parameter, the second video clip corresponding to the variable text and wherein the first video clip comprises a first speech segment; the second video clip comprises a second speech segment; the stitching the first video clip to the second video clip comprises: performing smoothing processing on respective speech subsegments of the first speech segment and the second speech segment at a stitching position; and stitching the smoothed first speech segment to the smoothed second speech segment ([0041-0044], [0061-0069], [0103-0112], [0141-0143] and [0148-0179]), note remarks above.
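Examiner's note: a sketch, for illustration only, of smoothing the respective speech subsegments of two segments at a stitching position by crossfading an overlap region; the overlap length and all names are hypothetical assumptions, not details from YANG:

```python
# Illustration only: smoothing two speech segments at the stitching position
# by crossfading an overlap region (overlap must not exceed either segment).
import numpy as np

def crossfade_stitch(first_segment, second_segment, overlap):
    """Mix the tail of the first segment with the head of the second so the
    stitched speech signal has no discontinuity at the boundary."""
    w = np.linspace(1.0, 0.0, overlap)
    mixed = first_segment[-overlap:] * w + second_segment[:overlap] * (1.0 - w)
    return np.concatenate([first_segment[:-overlap], mixed,
                           second_segment[overlap:]])
```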
As to claims 8-9, YANG further discloses wherein an image sequence corresponding to the video comprises: a background image sequence and a moving image sequence; the generating a second video clip corresponding to the variable text comprises: generating a target moving image sequence corresponding to the variable text; determining a target background image sequence corresponding to the variable text according to a preset background image sequence; and fusing the target moving image sequence with the target background image sequence to obtain the second video clip corresponding to the variable text; and wherein background images in the target background image sequence located at head and tail positions match background images in the preset background image sequence located at the head and tail positions ([0041-0044], [0061-0069], [0103-0112], [0141-0143] and [0148-0179]). The UT or server is responsive to queries as to various background image(s) to generate an audio or voice text response; the interactive inputs dynamically generate various mask overlays, superimposed with links, comment information, and product link controls, based on the received inputs; note remarks in claims 1-4.
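Examiner's note: for illustration only, the fusing of a target moving image sequence with a target background image sequence recited in claims 8-9 may be sketched as per-frame alpha compositing; the mask representation below is an assumption, not a detail from YANG:

```python
# Illustration only: fusing a moving image sequence with a background image
# sequence by per-frame alpha compositing; each mask holds per-pixel alpha
# values in [0, 1], and all inputs are assumed to be float arrays.
def fuse_sequences(moving_frames, moving_masks, background_frames):
    """Lay each moving frame over the matching background frame."""
    return [alpha * fg + (1.0 - alpha) * bg
            for fg, alpha, bg in zip(moving_frames, moving_masks,
                                     background_frames)]
```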
As to claim 10, YANG further discloses wherein the determining a target background image sequence corresponding to the variable text according to a preset background image sequence comprises: determining the preset background image sequence as the target background image sequence when the number of image frames in the preset background image sequence is equal to the number of image frames in the target moving image sequence ([0041-0044], [0061-0069], [0103-0112], [0141-0143] and [0148-0179]). The UT or server is responsive to queries as to various background image(s) to generate an audio or voice text response; first superimposed transparent resources are responsive to interactions and may further pop up or generate an additional window or small image (superimposed over another window or image) over the first superimposed image, with transparency higher than the first transparency.
As to claims 11-12, the claimed "Apparatus..." is composed of the same structural elements as those discussed with respect to claims 1-2.
Claim 14 is met as previously discussed in claim 4.
Claim 15 is met as previously discussed in claim 5.
Claims 16-17 are met as previously discussed in claims 6-7.
Claims 18-19 are met as previously discussed in claims 8-9.
Claim 20 is met as previously discussed in claim 10.
As to claims 21-22, YANG further discloses wherein the determining the target background image sequence corresponding to the variable text comprises: in response to a number of image frames in the preset background image sequence being greater than a number of image frames in the target moving image sequence, discarding one or more first background image frames located in a middle section of the preset background image sequence, to make the number of image frames in the preset background image sequence after the discarding the same as the number of image frames in the target moving image sequence, wherein when a quantity of image frames being discarded is greater than 1, the image frames being discarded are not continuous image frames in the preset background image sequence; and wherein the determining the target background image sequence corresponding to the variable text comprises: in response to a number of image frames in the preset background image sequence being fewer than a number of image frames in the target moving image sequence, adding one or more second background image frames to the preset background image sequence, to make the number of image frames in the preset background image sequence after the adding the same as the number of image frames in the target moving image sequence ([0041-0044], [0061-0069], [0103-0112], [0141-0143] and [0148-0179]). The UT or server is responsive to queries as to various background image(s) to generate an audio or voice text response; first superimposed transparent resources are responsive to interactions and may further pop up or generate an additional window or small image (superimposed over another window or image) over the first superimposed image, with transparency higher than the first transparency.
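Examiner's note: a sketch, for illustration only, of the frame-count matching recited in claims 10 and 21-22. Evenly spaced index resampling discards non-contiguous middle frames when the preset background sequence is longer, repeats frames when it is shorter, and preserves the head and tail frames; this particular implementation choice is a hypothetical assumption:

```python
# Illustration only: matching the background frame count to the moving image
# sequence. linspace includes both endpoints, so the head and tail frames are
# always kept; drops and repeats are spread out, hence non-contiguous.
import numpy as np

def match_frame_count(background_frames, target_len):
    n = len(background_frames)
    if n == target_len:
        return list(background_frames)
    idx = np.round(np.linspace(0, n - 1, target_len)).astype(int)
    return [background_frames[i] for i in idx]
```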
Conclusion
5. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
6. Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANNAN Q SHANG whose telephone number is (571)272-7355. The examiner can normally be reached Monday-Friday 7-4.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, BRUCKART BENJAMIN can be reached on 571-272-3982. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANNAN Q SHANG/Primary Examiner, Art Unit 2424