Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission has been entered.
This action is in response to the claims filed 12/1/2025.
Claims pending in the case: 1-5, 7-14, 16-20
Cancelled claims: 6, 15
This is a transferred case. Previous examination was conducted by Examiner Tamara Kyle.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-5, 7-9, 11-14, 16-18 and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Li (“Frame Deletion Detection Based on Optical Flow Orientation Variation,” March 2021), in view of Luo (“Exploring Relations in Untrimmed Videos for Self-Supervised Learning”, August 2020).
Regarding Claim 1, Li teaches, A method comprising:
feeding a primary video segment, representative of a concatenation of a first and a second nonadjacent video segments obtained from a video source (Li: Pg. 37198 Fig. 2, Pg. 37203 section VI.A.1: video segments with deleted segments in between), … extract spatiotemporal representations from the concatenation of the first and the second nonadjacent video segments (Li: Pg. 37198-37199 section B: optical flow analysis - involves extracting a small set of stable, repeatable image locations and computing compact descriptors for each; as illustrated in Fig. 3, flow analysis uses spatiotemporal information of each segment (adjacent and non-adjacent));
embedding, …, the primary video segment into a first feature output (Li: Pg. 37198-37199 section B-C: optical flow analysis using vector representation of video feature);
providing the first feature output to a first perception network to generate a first set of … outputs indicating a temporal location of a discontinuous point associated with the primary video segment (Li: Pg. 37203-37204 section B: histogram of descriptor sequence indicates discontinuity);
However, Li does not specifically teach,
feeding to a deep learning backbone network having a 3-dimensional convolution layer configured to extract;
embedding, via the deep learning backbone network;
perception network to generate a first set of probability distribution outputs indicating a temporal location of a discontinuous point associated with the primary video segment;
generating a first loss function based on the first set of probability distribution outputs; and
optimizing the deep learning backbone network, by backpropagation of the first loss function;
Luo teaches,
feeding to a deep learning backbone network having a 3-dimensional convolution layer configured to extract spatiotemporal representations from the video segments (Luo Pg. 4-5 section C: fed to backbone having 3D convolution layers for feature extraction);
embedding, via the deep learning backbone network (Luo Pg. 3 section III, Pg. 4-5 section C: fed to backbone having 3D convolution layers for feature extraction);
perception network to generate a first set of probability distribution outputs indicating a temporal location of a discontinuous point associated with the primary video segment (Luo Pg. 5 col 1 [2]: probability distribution of relations; Pg. 4 section B: relations may be location of a point);
generating a first loss function based on the first set of probability distribution outputs (Luo Pg. 5 col 1 [2]: loss function based on probability); and
optimizing the deep learning backbone network, by backpropagation of the first loss function (Luo Pg. 6 section B and section C: supervised training);
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Li and Luo because the combination would enable using a backbone network for feature extraction and performing video analysis based on a probability distribution. One of ordinary skill in the art would have been motivated to combine the teachings because the combination would enable the use of backbone architectures, which are widely used in the art for feature extraction due to their proven effectiveness.
The Examiner further notes that the fact that the video segments are "nonadjacent" is not functionally involved in the steps recited as the limitation broadly claims analyzing the frames in the video. Hence, this does not distinguish the claimed invention from the prior art in terms of patentability.
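For clarity of the record, the claimed arrangement, as understood from the combination of Li and Luo, may be illustrated by the following non-limiting sketch. The framework (PyTorch), layer sizes, variable names, input shapes, and the particular form of the loss are assumptions of the Examiner and are not taken from the claims or the cited references.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Backbone3D(nn.Module):
    """Deep learning backbone with a 3-dimensional convolution layer."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),  # spatiotemporal (3D) convolution
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                     # pool to one feature vector per clip
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, clip):                             # clip: (B, 3, T, H, W)
        return self.fc(self.conv(clip).flatten(1))       # first feature output

class PerceptionHead(nn.Module):
    """Perception network producing a probability distribution over temporal locations."""
    def __init__(self, feat_dim=128, num_locations=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, num_locations))

    def forward(self, feat):
        return F.softmax(self.mlp(feat), dim=-1)

backbone, head = Backbone3D(), PerceptionHead()
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(head.parameters()), lr=1e-3)

# Primary segment: concatenation of two nonadjacent segments (assumed shapes).
primary_segment = torch.randn(4, 3, 16, 64, 64)
true_location = torch.randint(0, 16, (4,))               # temporal index of the splice point

probs = head(backbone(primary_segment))                  # first set of probability distribution outputs
first_loss = F.nll_loss(torch.log(probs + 1e-8), true_location)  # first loss function
first_loss.backward()                                    # backpropagation
optimizer.step()                                         # optimizes the backbone
```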
Regarding claim 2, Li and Luo teach the invention as claimed in claim 1 above and, further comprising:
feeding a third video segment, nonadjacent to each of the first video segment and second video segment, obtained from the video source, to the deep learning backbone network (Luo Pg. 4-5 section C: fed to backbone having 3D convolution layers for feature extraction);
embedding, via the deep learning backbone network, the third video segment into a second feature output (Luo Pg. 3 section III, Pg. 4-5 section C: fed to backbone having 3D convolution layers for feature extraction); and
providing the first feature output and the second feature output to a second perception network to generate a second set of probability distribution outputs indicating one or more of a continuity probability and a discontinuity probability associated with the primary and the third video segments (Luo Pg. 5 col 1 [2]: probability distribution of relations; Pg. 4 section B: relations may be location of a point);
generating a second loss function based on the second set of probability distribution outputs (Luo Pg. 5 col 1 [2]: loss function based on probability); and
optimizing the deep learning backbone network, by backpropagation of at least one of the first loss function and the second loss function (Luo Pg. 6 section B and section C: supervised training).
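A corresponding non-limiting sketch of the additional limitations of claim 2, as understood, is provided below. The names, dimensions, and use of a two-class softmax are assumptions of the Examiner rather than features recited in the cited references.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairwiseHead(nn.Module):
    """Second perception network over the feature outputs of two segments."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2))         # [continuity, discontinuity]

    def forward(self, feat_a, feat_b):
        return F.softmax(self.mlp(torch.cat([feat_a, feat_b], dim=-1)), dim=-1)

# First and second feature outputs from the shared backbone (placeholder tensors here).
feat_primary, feat_third = torch.randn(4, 128), torch.randn(4, 128)

pair_probs = PairwiseHead()(feat_primary, feat_third)      # second set of probability outputs
pair_labels = torch.ones(4, dtype=torch.long)              # 1 = discontinuous pair (assumed convention)
second_loss = F.nll_loss(torch.log(pair_probs + 1e-8), pair_labels)
# Either or both of the first and second losses may then be backpropagated
# to optimize the shared deep learning backbone network.
```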
Regarding claim 3, Li and Luo teach the invention as claimed in claim 2 above and, further comprising:
feeding a fourth video segment, obtained from the video source and temporally adjacent to the first and the second video segments, to the deep learning backbone network (Li: Pg. 37198 Fig. 2, Pg. 37203 section VI.A.1: video segments with deleted segments in between) (Luo Pg. 4-5 section C: fed to backbone having 3D convolution layers for feature extraction);
embedding, via the deep learning backbone network, the fourth video segment into a third feature output (Luo Pg. 3 section III, Pg. 4-5 section C: fed to backbone having 3D convolution layers for feature extraction);
providing the first feature output, the second feature output, and the third feature output to a projection network to generate a set of feature embedding outputs comprising:
a first feature embedding output associated with the primary video segment; a second feature embedding output associated with the third video segment; and a third feature embedding output associated with the fourth video segment (Li: Pg. 37198-37199 section B-C: optical flow analysis using vector representation of video feature) (Luo Pg. 3 section III, Pg. 4-5 section C: fed to backbone having 3D convolution layers for feature extraction);
generating a third loss function based on the set of feature embedding outputs (Luo Pg. 5 col 1 [2]: loss function based on probability); and
optimizing the deep learning backbone network by backpropagation of at least one of the first loss function, the second loss function and the third loss function (Luo Pg. 6 section B and section C: supervised training).
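The claim 3 arrangement, as understood, may likewise be illustrated by the following non-limiting sketch. The projection network shown here uses fully connected layers, and the triplet-style margin loss is only one possible form of the recited third loss; these choices are assumptions of the Examiner and are not recited in the claims or the cited references.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Projection network mapping feature outputs to feature embedding outputs.
projection = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))

feat_primary = torch.randn(4, 128)   # first feature output (concatenated nonadjacent segments)
feat_third   = torch.randn(4, 128)   # second feature output (third, nonadjacent segment)
feat_fourth  = torch.randn(4, 128)   # third feature output (fourth, temporally adjacent segment)

emb_primary, emb_third, emb_fourth = (
    F.normalize(projection(f), dim=-1)
    for f in (feat_primary, feat_third, feat_fourth))      # set of feature embedding outputs

# One possible third loss: keep the adjacent segment's embedding closer to the
# primary segment's embedding than the nonadjacent segment's embedding.
third_loss = F.triplet_margin_loss(emb_primary, emb_fourth, emb_third, margin=0.5)
# Any of the first, second, and third losses may be backpropagated to optimize the backbone.
```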
Regarding claim 4, Li and Luo teach the invention as claimed in claim 3 above and, wherein: each of the primary video segment and the third video segment is of length n frames, n being an integer equal or greater than two (Li: Pg. 37198 Fig. 2, Pg. 37203 section VI.A.1: video frames) (Luo: Pg. 4 col 1 [2], Pg. 4-5 section C: video frames).
Regarding claim 5, Li and Luo teach the invention as claimed in claim 3 above and, wherein the fourth video segment is of length m frames, m being an integer equal or greater than one (Li: Pg. 37198 Fig. 2, Pg. 37203 section VI.A.1: video frames) (Luo: Pg. 4 col 1 [2], Pg. 4-5 section C: video frames).
Regarding claim 7, Li and Luo teach the invention as claimed in claim 2 above and, wherein each of the first perception network and the second perception network is a multi-layer perception network (Luo Pg. 3 section III, Pg. 4-5 section C: multilayer network).
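By way of illustration only, a multi-layer perception network of the kind addressed in this rejection may be sketched as stacked fully connected layers with a nonlinear activation; the layer sizes below are assumptions of the Examiner.

```python
import torch.nn as nn

# Multi-layer perception network: two or more fully connected layers with an activation.
mlp_perception = nn.Sequential(
    nn.Linear(128, 64),  # hidden layer
    nn.ReLU(),           # nonlinear activation
    nn.Linear(64, 16),   # output layer (e.g., candidate temporal locations)
)
```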
Regarding claim 8, Li and Luo teach the invention as claimed in claim 3 above and, wherein the projection network is a light-weight convolutional network comprising one or more of: a 3-dimensional convolution layer, an activation layer, and an average pooling layer (Luo Pg. 3 section III, Pg. 4-5 section C: fed to backbone having 3D convolution layers for feature extraction). It would have been obvious to one skilled in the art that a 3D-CNN includes one or more 3D convolution layers, activation layers, and pooling layers.
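Consistent with the observation above that a 3D-CNN includes 3D convolution, activation, and pooling layers, a light-weight projection network of the recited kind may be sketched as follows; the channel counts and input shape are assumptions of the Examiner.

```python
import torch
import torch.nn as nn

projection_3dcnn = nn.Sequential(
    nn.Conv3d(in_channels=32, out_channels=32, kernel_size=1),  # 3-dimensional convolution layer
    nn.ReLU(),                                                   # activation layer
    nn.AdaptiveAvgPool3d(1),                                     # average pooling layer
)

feature_map = torch.randn(4, 32, 8, 16, 16)            # (B, C, T, H, W), assumed shape
embedding = projection_3dcnn(feature_map).flatten(1)   # compact (B, 32) embedding
```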
Regarding claim 9, Li and Luo teach the invention as claimed in claim 1 above and, wherein the video source suggests a smooth translation of content and motion across consecutive frames (Li: Pg. 37199 Fig. 3, Pg. 37203 section VI.A.1: video segments with objects in motion).
Regarding Claims 11-14, 16-18, and 20, these claims are similar in scope to claims 1-4, 7-9, and 1, respectively. Therefore, these claims are rejected under the same rationale.
Allowable Subject Matter
Claims 10 and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Response to Arguments
Applicant's amendments have been fully considered and overcome the 35 U.S.C. § 112(b) rejection. This rejection is respectfully withdrawn.
Applicant's prior art arguments have been fully considered but are not persuasive. Applicant argues that the cited prior art does not teach the specifics of the new limitations added by this amendment. The Examiner respectfully disagrees. Since the arguments pertain to the amended sections of the claims, Applicant is directed to the cited sections and explanations in the rejection presented above.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure and is listed on the attached PTO-892 form.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MANDRITA BRAHMACHARI whose telephone number is (571)272-9735. The examiner can normally be reached Monday to Friday, 11 am to 8 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Tamara Kyle, can be reached at (571) 272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Mandrita Brahmachari/Primary Examiner, Art Unit 2144