Prosecution Insights
Last updated: April 19, 2026
Application No. 18/574,935

PROCEDURAL VIDEO ASSESSMENT

Office action: Non-Final OA (§101, §102, §103, §112)
Filed: Dec 28, 2023
Examiner: SUMMERS, GEOFFREY E
Art Unit: 2669
Tech Center: 2600 — Communications
Assignee: Intel Corporation
OA Round: 1 (Non-Final)

Predictions: 72% grant probability (favorable) • 1-2 OA rounds expected • 2y 5m to grant • 99% grant probability with interview

Examiner Intelligence

Career Allow Rate: 72% (249 granted / 348 resolved; +9.6% vs TC avg) — above average
Interview Lift: strong, +35.4% on resolved cases with interview
Typical Timeline: 2y 5m average prosecution; 27 applications currently pending
Career History: 375 total applications across all art units

Statute-Specific Performance

§101: 9.6% (-30.4% vs TC avg)
§102: 16.3% (-23.7% vs TC avg)
§103: 41.0% (+1.0% vs TC avg)
§112: 28.6% (-11.4% vs TC avg)

Tech Center averages are estimates. Based on career data from 348 resolved cases.

Office Action

DETAILED ACTION

Response to Amendment

The preliminary amendment filed December 28, 2023, has been entered in full. Claims 26-50 are pending.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on December 28, 2023, is being considered by the examiner.

Claim Objections

Claim(s) 26 and 39 is/are objected to because of the following informalities:
- In claim 26, fifth line, “the procedure video” should be “the procedural video”
- In claim 39, third line, “the procedure video” should be “the procedural video”

Appropriate correction is required.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):

(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:

An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: the “action-procedure relationship learning module” in claim 39.

Regarding Prong (A), claim 39 uses the generic placeholder “module” so this prong of the test is satisfied. Regarding Prong (B), the generic placeholder is modified by functional language “discovering a relationship between the plurality of action features and a plurality of scoring oriented procedures associated with the procedure video” so this prong of the test is satisfied. Regarding Prong (C), the claim does not recite any structure, material, or acts for performing the recited function. For example, claim 39 does not recite any processing circuitry.

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof. The corresponding structure of the claimed “action-procedure relationship learning module” includes the action-procedure relationship learning module described at Fig. 6 and par. [0031] et seq. of the published application (i.e., US 2024/0346809 A1).
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may:
(1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or
(2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

As explained in the rejections under 35 U.S.C. 112(b) below, claim 26 also recites an action-procedure relationship learning module, but it is unclear whether 35 U.S.C. 112(f) is invoked because claim 26 further recites structure in the form of processing circuitry and it is unclear whether this processing circuitry constitutes “sufficient structure, material, or acts for performing the claimed function.” Claim 50 is similarly indefinite. For purposes of practicing compact prosecution, claims 26 and 50, and any dependent claims thereof, are interpreted to invoke 35 U.S.C. 112(f). I.e., the scope of the “action-procedure relationship learning module” of claims 26 and 50 is interpreted to be substantially the same as the scope of the “action-procedure relationship learning module” in claim 39. See MPEP 2173.06.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 26-38 and 50 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 26, the limitation “action-procedure relationship learning module” has been evaluated under the three-prong test set forth in MPEP § 2181, subsection I, but the result is inconclusive. Regarding Prong (A), the claim uses the generic placeholder “module” so this prong of the test is satisfied. Regarding Prong (B), the generic placeholder is modified by functional language “discovering a relationship between the plurality of action features and a plurality of scoring oriented procedures associated with the procedure video” so this prong of the test is satisfied.

Regarding Prong (C), the claim recites structure in the form of “processor circuitry … configured to:” perform steps. On the one hand, a processor could potentially be sufficient structure for performing a function of “discovering a relationship between the plurality of action features and a plurality of scoring oriented procedures associated with the procedure video,” which would suggest that Prong C is failed.
On the other hand, it appears that the step that the processor circuitry is configured to perform is to “transform the plurality of action features into a plurality of action-procedure features”, rather than to “discover[] a relationship between the plurality of action features and a plurality of scoring oriented procedures associated with the procedure video.” I.e., it appears that the processor is configured to transform action features, rather than to perform the function of the action-procedure relationship learning module, which would indicate that the processing circuitry is not sufficient structure, material, or acts for performing the claimed function of the action-procedure relationship learning module. This would suggest that Prong C is passed.

Thus, it is unclear whether this limitation should be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because it is unclear whether Prong C of the test is satisfied. The boundaries of this claim limitation are ambiguous; therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Claims 27-38 depend from claim 26 and are also indefinite at least because they include the indefinite limitations of claim 26. Claim 50 is also indefinite for substantially the same reasons as claim 26 – i.e., claim 50 also recites processing circuitry and it is unclear whether the processing circuitry performs the function of the action-procedure relationship learning module recited in claim 39.

In response to this rejection, applicant must clarify whether this limitation should be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. Mere assertion regarding applicant’s intent to invoke or not invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph is insufficient. Applicant may:
(a) Amend the claim to clearly invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, by reciting “means” or a generic placeholder for means, or by reciting “step.” The “means,” generic placeholder, or “step” must be modified by functional language, and must not be modified by sufficient structure, material, or acts for performing the claimed function;
(b) Present a sufficient showing that 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, should apply because the claim limitation recites a function to be performed and does not recite sufficient structure, material, or acts to perform that function;
(c) Amend the claim to clearly avoid invoking 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, by deleting the function or by reciting sufficient structure, material or acts to perform the recited function; or
(d) Present a sufficient showing that 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, does not apply because the limitation does not recite a function or does recite a function along with sufficient structure, material or acts to perform that function.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 26-50 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea judicial exception without significantly more. The patent subject matter eligibility (SME) test is set forth in MPEP 2106. It includes multiple steps, sub-steps, and prongs, which are addressed below.

Step 1: Is the claim to a statutory category? Claims 26-38 are to a machine and/or manufacture. Claims 39-49 are to a process. Claim 50 is to a machine and/or manufacture.

Step 2A, Prong One: Does the claim recite a judicial exception?
Claim 26 recites a step to “perform an action segmentation process for a procedural video … to obtain a plurality of action features associated with the procedure video.” This step recites a mental process type of abstract idea judicial exception. See MPEP 2106.04(a)(2), Subsection III. A human can view a procedural video, mentally divide it into different segments/sections for different actions, and recognize different features of the actions portrayed in the video. For example, a person could view a video of a person washing their hands (a hand washing procedure), recognize that specific parts/segments of the video correspond to particular actions (e.g., actions of applying water, applying soap, scrubbing, rinsing, etc.) and obtain features for those actions (e.g., whether they were performed, whether they were performed in a correct order, whether they were performed well in the subjective opinion of the human, how long they were performed, etc.).

Claim 26 further recites a step to “transform the plurality of action features into a plurality of action-procedure features based on an action-procedure relationship learning module for discovering a relationship between the plurality of action features and a plurality of scoring oriented procedures associated with the procedure video”. This also recites a mental process. The claim does not specify what exactly is meant by “action-procedure features” but a person can mentally relate different actions they observe to an overall procedure. Following the hand washing example, a person can mentally recognize that an action of applying soap is important and should occur relatively early in a handwashing procedure.
The claim further recites that the transformation is “based on an action-procedure relationship learning module for discovering a relationship between the plurality of action features and a plurality of scoring oriented procedures associated with the procedure video.” Humans can mentally recognize relationships between actions and scores for a procedure. For example, recognizing that there is an important relationship between an action of applying soap and a correctness “score” of a handwashing procedure.

Claim 26 further recites a step to “perform a procedure classification process to infer the plurality of scoring oriented procedures from the plurality of action-procedure features.” This also recites a mental process because a person can mentally perform classification (i.e., make a decision or judgement) to infer a plurality of scoring oriented procedures from a plurality of action-procedure features. For example, a human can make a mental judgement of whether a video shows that a person has correctly washed their hands based on the actions observed in the video and their relationship to the overall handwashing procedure (e.g., handwashing requires application of soap, scrubbing, and rinsing; the video shows all of these actions in the correct order and for at least a minimum amount of time; therefore, the handwashing procedure was successful).

In summary, claim 26 recites a mental process that could be performed mentally by observing a video of a procedure, recognizing different actions performed in the video, and making a judgement about the procedure based on the recognized actions and their relationship to the procedure. Claim 39 recites similar limitations.

Claims 27-29 and 40-42 recite extracting a main key frame, which can also be performed mentally by choosing a key frame of video according to certain criteria, such as the key frame being an ending frame of an action.
For example, a human can observe a video and pause it at a particular frame of interest.

Claims 30 and 43 recite “auto-scoring” that can be performed mentally, such as by mentally rating a video on a scale from 1 to 10. Claims 31 and 44 recite training based on supervision, which is recited at such a high level of generality that it encompasses mentally learning through known examples.

Claims 32 and 45 recite “contextualizing the action features based on action attentions learnt for the action features”, which is recited at such a high level of generality that it encompasses mentally learning what action features are most important and should be paid attention to. In the handwashing example, this could include mentally learning that the time spent scrubbing is a more-important action feature than the color of the soap when determining whether the handwashing procedure is completed correctly.

Claims 33 and 46 recite “scaling the action features based on a pre-learnt action translation matrix”, which is recited at such a high level of generality that it encompasses basic arithmetic that may be performed mentally (possibly with use of a physical aid, such as pen and paper).

Claims 34 and 47 recite performing “uniform sampling or average pooling in a temporal dimension after the action segmentation process to obtain sampled action features in the temporal dimension as the plurality of action features.” This is also recited at such a high level of generality that it encompasses a mental process. For example, a human could mentally pick one representative frame to review for each action, thereby achieving a uniform sampling.
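For concreteness, the two operations recited in claims 34 and 47 — uniform sampling and average pooling in a temporal dimension — can be sketched in a few lines of code. This is an illustrative sketch only (the helper names and segment scheme are assumptions, not anything disclosed in the claims or the specification): each operation reduces per-frame features to one representative feature per temporal segment.

```python
import numpy as np

def uniform_sample(frame_features: np.ndarray, n_segments: int) -> np.ndarray:
    """Pick one representative frame feature per equal-length temporal segment
    (here, roughly the middle frame of each segment)."""
    t = frame_features.shape[0]
    idx = (np.arange(n_segments) * t // n_segments) + t // (2 * n_segments)
    return frame_features[idx]

def average_pool(frame_features: np.ndarray, n_segments: int) -> np.ndarray:
    """Average the per-frame features within each temporal segment."""
    segments = np.array_split(frame_features, n_segments, axis=0)
    return np.stack([seg.mean(axis=0) for seg in segments])
```

On the Office action's own numbers, pooling a feature that is 3 for the first half of a video and 7 for the second half over a single segment yields the representative value 5.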
In another example, a person could mentally rate the intensity of a person’s scrubbing in a handwashing activity as only 3 out of 10 for the first half of the activity, rate the intensity as 7 out of 10 for the second half of the activity, and mentally choose an average rating of 5 out of 10 as a representative score for the entire handwashing activity (i.e., average pooling in a temporal dimension).

Claims 35-36 and 48-49 recite performing visual perception, such as object detection, hand detection, face recognition, or emotion recognition, which can all be performed mentally. Claims 37-38 further recite training based on supervision, which are recited at such a high level of generality that they can be performed mentally. For example, a human can mentally learn through examples.

Step 2A, Prong Two: Does the claim recite additional elements that integrate the judicial exception into a practical application?

As discussed above, most of the elements of the claims are directed to a mental process type of abstract idea so they are not considered additional elements. None of the additional elements that are recited amount to anything more than:
- Merely reciting the words "apply it" (or an equivalent) with the judicial exception, or merely including instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea, as discussed in MPEP § 2106.05(f);
- Adding insignificant extra-solution activity to the judicial exception, as discussed in MPEP § 2106.05(g); and
- Generally linking the use of a judicial exception to a particular technological environment or field of use, as discussed in MPEP § 2106.05(h);
none of which integrates a judicial exception into a practical application.

Claim 26 recites interface circuitry and processor circuitry.
Reciting an interface configured to receive video and processing circuitry configured to perform a mental process on that video amounts to implementing an abstract idea on a generic computer, which “does not integrate the abstract idea into a practical application.” MPEP 2106.05(f). Claim 39 similarly recites a non-transitory computer-readable medium and processor circuitry, which also amounts to implementing an abstract idea on a generic computer. For example, claim 26 recites that the interface circuitry is to receive video data, but does not restrict how this outcome is accomplished, and recites that processing circuitry is configured to perform steps, but does not restrict how these steps are performed or their results are accomplished by using the processing circuitry.

Step 2B: Does the claim recite additional elements that amount to significantly more than the judicial exception?

The identification of additional elements from Step 2A, Prong Two, is carried over. The conclusions from Step 2A, Prong Two, are carried over. No additional elements were considered to be insignificant extra-solution activity. As explained above, the only additional elements relate to computer implementation using interface circuitry to receive video and processor circuitry to process the video. Especially at the level of generality recited in the claims, these do not amount to significantly more than an abstract idea. For example, receiving video via interface circuitry could include receiving or transmitting data over a network or retrieving information in memory, both of which have been recognized by the courts as well-understood, routine, and conventional computer functions. MPEP 2106.05(d), Subsection II. Furthermore, as explained above, the interface and processor circuitry, especially at the high level of generality recited in the claims, amount to no more than mere instructions to implement an abstract idea on a computer.

Conclusion: Claims 26-50 are rejected under 35 U.S.C. 101 because they are directed to an abstract idea and do not integrate that abstract idea into a practical application or otherwise recite additional elements that amount to significantly more than the abstract idea.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 39-49 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by ‘Li’ (“Manipulation-skill Assessment from Videos with Spatial Attention Network,” 10 April 2019).

Regarding claim 39, Li discloses a method, comprising: performing an action segmentation process (e.g., Section 3.1, video depicting actions is divided into N segments, which falls within the scope of an action segmentation process) for a procedural video (e.g., Sec. 3.1, a video of a procedure is the input, the video including various actions; Secs. 4.1.1-4.1.2, various different types of procedures are considered, including infant grasping, chopstick using, dough rolling, drawing, and surgery) to obtain a plurality of action features associated with the procedure video (e.g., Fig. 1, Feature Encoding; see Sec. 3.2); transforming the plurality of action features into a plurality of action-procedure features (e.g., Fig. 1, Sec. 3.3, attention pooling transforms the encoded/action features by adjusting spatial attention based on “the high-level knowledge about the undergoing task”; Thus, the transformed features are within the scope of action-procedure features at least because they consider the relationship between the imaged action and the undergoing task/procedure) based on an action-procedure relationship learning module for discovering a relationship between the plurality of action features and a plurality of scoring oriented procedures associated with the procedure video (Note that the preceding limitation is interpreted under 35 U.S.C. 112(f) – see Claim Interpretation above; Par. [0032] et seq. of the published specification states that the module “may include” an action attention block and an action transition block, but use of the term “may” suggests that neither of the specifically-described configurations is required; Fig. 2 of Li illustrates many different attention blocks and at least the weights W in equation 5 of Li can be considered an action transition matrix of an action transition block, so Li’s disclosure falls within the scope of the claimed module; Note from Sec. 3.5, last par., that W x a is a 32x512 matrix and W h a is a 32x128 matrix); and performing a procedure classification process (e.g., Fig. 1, Sec. 3.4, Temporal aggregation includes an FC layer to classify the video) to infer the plurality of scoring oriented procedures from the plurality of action-procedure features (e.g., Sec. 3.4, the purpose of the classification is to infer whether the procedure was done skillfully – i.e., whether the scoring oriented procedures were performed correctly; For example, Sec. 4.1.1 describes a procedure including grasping a transparent block and putting it into a specified hole, so the score produced by the classifier infers whether the procedures important for determining the score – i.e., the grasping and putting procedures – were performed properly).
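The pipeline the examiner maps onto claim 39 — segment the video, sample and encode frames, reweight the encoded features with a learned attention matrix, then classify with an FC layer — can be sketched as follows. This is a rough illustrative sketch, not Li's actual network: the encoder and all weight matrices here are random placeholders, and only the dimensions (a 32x512 attention matrix) loosely follow Li Sec. 3.5.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the 32x512 attention matrix loosely follows Li Sec. 3.5.
N_SEGMENTS, FEAT_DIM, ATT_DIM, N_CLASSES = 8, 512, 32, 2

def segment_and_sample(video_frames: np.ndarray) -> np.ndarray:
    """Divide the video into N segments and randomly sample one frame per
    segment (cf. Li Sec. 3.1, sparse sampling of the whole video)."""
    segments = np.array_split(video_frames, N_SEGMENTS, axis=0)
    return np.stack([seg[rng.integers(len(seg))] for seg in segments])

def encode(frames: np.ndarray) -> np.ndarray:
    """Stand-in for the feature encoder (cf. Sec. 3.2): a random linear
    projection producing one FEAT_DIM-d feature per sampled frame."""
    w_enc = rng.standard_normal((frames.shape[1], FEAT_DIM)) / np.sqrt(frames.shape[1])
    return frames @ w_enc

def attention_transform(x: np.ndarray) -> np.ndarray:
    """Reweight the per-segment action features with softmax attention scores
    derived from a learned matrix (cf. eqn. 5's weights W)."""
    w_x = rng.standard_normal((ATT_DIM, FEAT_DIM)) / np.sqrt(FEAT_DIM)
    scores = (w_x @ x.T).sum(axis=0)      # one scalar score per segment
    att = np.exp(scores - scores.max())
    att /= att.sum()
    return x * att[:, None]               # attention-weighted features

def classify(features: np.ndarray) -> int:
    """Aggregate over time and apply an FC layer to infer a skill/procedure
    class (cf. Sec. 3.4, temporal aggregation and classification)."""
    pooled = features.mean(axis=0)
    w_fc = rng.standard_normal((N_CLASSES, FEAT_DIM)) / np.sqrt(FEAT_DIM)
    return int(np.argmax(w_fc @ pooled))
```

A full pass would be `classify(attention_transform(encode(segment_and_sample(video))))`; the point of the sketch is only the shape of the data flow the rejection relies on, from segmentation through attention-based transformation to classification.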
Regarding claim 40, Li discloses the method of claim 39, further comprising: performing a key frame extraction process (Sec. 3.1, “randomly sample one frame in each segment to form a sparse sampling of the whole video”) to extract, for each scoring oriented procedure, a main key frame (The randomly sampled frame is an extracted main key frame; They are extracted throughout the video, which includes each scoring oriented procedure) to show completeness of the procedure (as shown in, e.g., Fig. 3, the extracted keyframe images show completeness of the procedure).

Regarding claim 41, Li discloses the method of claim 40, further comprising: performing the key frame extraction process to extract, for each scoring oriented procedure, one or more intermediate key frames (Sec. 3.1, “randomly sample one frame in each segment to form a sparse sampling of the whole video”; The randomly sampled frame is an “intermediate” key frame at least because it falls within the segment) to show one or more important actions or objects in the procedure (e.g., Fig. 3, the images show important actions such as grasping and important objects, such as the transparent block being grasped).

Regarding claim 42, Li discloses the method of claim 40, wherein the main key frame is an ending frame of the procedure (Sec. 3.1, “randomly sample one frame in each segment to form a sparse sampling of the whole video”; The randomly-sampled frame is the end of a sub-segment spanning the start of the N th segment to the sampled frame, so it falls within the scope of an “ending” frame; Also, at least one extracted frame will be the last frame extracted for a given procedure).

Regarding claim 43, Li discloses the method of claim 40, further comprising: performing auto-scoring for the main key frame of each scoring oriented procedure (e.g., Sec. 3.4, score S is automatically generated for the sequence of extracted key frames) by use of an auto-scoring algorithm (e.g., Sec. 3.4, various algorithm steps including the FC layer) and based on one or more predetermined scoring items associated with the procedure (e.g., Sec. 3.4, eqn. 8 and supporting text, current task state vector).

Regarding claim 44, Li discloses the method of claim 43, wherein the key frame extraction process is trained based on a scoring oriented supervision that labels each frame with a score on each scoring item associated with the frame (e.g., Sec. 3.5, scoring supervision).

Regarding claim 45, Li discloses the method of claim 39, wherein the action-procedure relationship learning module comprises an action attention block (e.g., Fig. 2, various attention blocks) for contextualizing the action features based on action attentions learnt for the action features (Sec. 3.3, throughout).

Regarding claim 46, Li discloses the method of claim 45, wherein the action-procedure relationship learning module further comprises an action transition block for scaling the action features based on a pre-learnt action transition matrix (e.g., Sec. 3.3, eqn. 5, action features x are scaled by values in weight/action transition matrix W x a; Note from Sec. 3.5, last par., that W x a is a pre-learnt 32x512 matrix).

Regarding claim 47, Li discloses the method of claim 39, further comprising: performing uniform sampling (Sec. 3.1, “randomly sample one frame in each segment to form a sparse sampling of the whole video”; Randomly sampling one frame from each segment is uniform sampling) or average pooling in a temporal dimension after the action segmentation process to obtain sampled action features in the temporal dimension as the plurality of action features (e.g., Sec. 3.1 and Fig. 1, the sampled frames are input to Feature Encoding to obtain sampled action features).

Regarding claim 48, Li discloses the method of claim 39, further comprising: performing a visual perception on the procedural video before the action segmentation process (e.g., Fig. 3, facial regions have been recognized (as distinguished from other types of image regions) and blurred for privacy; e.g., Fig. 1 and Sec. 3.5, last par., optical flow is calculated, which is a visual perception of motion).

Regarding claim 49, Li discloses the method of claim 48, wherein the visual perception comprises at least one of object detection, hand detection, face recognition (e.g., Fig. 3, face regions have been selectively blurred; This falls within the scope of object detection or face recognition at least because these image regions have been detected/recognized for blurring to protect privacy), or emotion recognition.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 26-38 and 50 is/are rejected under 35 U.S.C. 103 as being unpatentable over Li.

Regarding claim 26, Examiner notes that the claim recites an apparatus comprising interface circuitry and processor circuitry coupled to the interface circuitry and configured to perform a method that is substantially the same as the method of claim 39. Li discloses the method of claim 39 (see above). While Li at least implies computer implementation, Li does not explicitly teach the hardware used to implement its video processing method.
In particular, Li does not explicitly teach implementing its method as an apparatus comprising interface circuitry and processor circuitry coupled to the interface circuitry and configured to perform the method.

However, Examiner takes Official Notice that it is old and well-known in the art of image analysis to implement a video processing method as an apparatus comprising interface circuitry and processor circuitry coupled to the interface circuitry and configured to perform the method. Use of interface circuitry advantageously allows processing circuitry to access input data necessary to perform the method, and the processing circuitry advantageously allows the method to be performed quickly and efficiently.

Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to implement the video processing method of Li as an apparatus comprising interface circuitry and processor circuitry coupled to the interface circuitry and configured to perform the method, with the reasonable expectation that such computer implementation would advantageously allow the video processing method to obtain data for performing the method and perform the method quickly and efficiently. Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Li to obtain the invention as specified in claim 26.

Examiner notes that claims 27-36 depend from claim 26 and recite apparatuses that perform methods substantially the same as the methods of claims 40-49, respectively. The invention of claim 26 is obvious over Li (see above). Li further discloses the features of claims 40-49 (see above). Accordingly, claims 27-36 are rejected under 35 U.S.C. 103 as being unpatentable over Li for substantially the same reasons as presented in the rejections of claim 26 and claims 40-49, respectively.
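For orientation on the action transition block cited above for claim 46 (Li's eqn. 5, with a pre-learnt 32x512 weight matrix), the scaling can be sketched as element-wise multiplication. This is one plausible reading of "action features x are scaled by values in W"; the actual form of Li's eqn. 5 may differ, and the shapes and random values below are illustrative stand-ins, not Li's parameters:

```python
import numpy as np

# Illustrative shapes echoing the cited 32x512 matrix: 32 action steps,
# each described by a 512-dimensional feature vector.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 512))   # action features (stand-in values)
W = rng.random((32, 512))            # pre-learnt transition weights (stand-in values)

# Element-wise scaling of the action features by the learnt weights;
# each feature dimension of each action step gets its own scale factor.
x_scaled = W * x
```

The scaled features keep the original 32x512 shape, so downstream blocks (e.g., the attention block of claim 45) can consume them unchanged.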
Regarding claim 37, Li teaches the apparatus of claim 26, wherein the action segmentation process is trained based on an action level supervision that labels each frame with an action type associated with the frame (e.g., Sec. 4.1.1, each video, and therefore each frame in each video, is labeled as corresponding to “a whole procedure of an infant grasping a transparent block and putting it into a specified hole,” which includes one or more action types; this falls within the scope of the claim; note that the recitation “process is trained” is significantly broad; for example, no machine learning is required).

Regarding claim 38, Li teaches the apparatus of claim 26, wherein the procedure classification process is trained based on a procedure level supervision that labels each frame with a procedure type associated with the frame (e.g., Sec. 4.1.1, each video, and therefore each frame in each video, is labeled as corresponding to “a whole procedure of an infant grasping a transparent block and putting it into a specified hole”; e.g., Table 1, the model training is performed for each individual procedure).

Regarding claim 50, Li discloses the method of claim 39 (see above). While Li at least implies computer implementation, Li does not explicitly teach the hardware used to implement its video processing method. In particular, Li does not explicitly teach implementing its method as a non-transitory computer-readable medium having instructions stored thereon, wherein the instructions, when executed by processor circuitry, cause the processor circuitry to perform the method.

However, Examiner takes Official Notice that it is old and well-known in the art of image analysis to implement a video processing method as a non-transitory computer-readable medium having instructions stored thereon, wherein the instructions, when executed by processor circuitry, cause the processor circuitry to perform the method.
Such computer implementation advantageously allows the method to be performed quickly and efficiently. Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to implement the video processing method of Li as a non-transitory computer-readable medium having instructions stored thereon, wherein the instructions, when executed by processor circuitry, cause the processor circuitry to perform the method, with the reasonable expectation that such computer implementation would advantageously allow the method to be performed quickly and efficiently. Therefore, it would have been obvious to one of ordinary skill in the art to combine the teachings of Li to obtain the invention as specified in claim 50.

Conclusion

The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

‘Jian’ (“Multitask Learning for Video-based Surgical Skill Assessment,” 2020) – Processes segments of a surgery video to classify the surgeon’s skill level and produce scores for different attributes – e.g., Fig. 1

‘Bhatt’ (US 2020/0167715 A1) – Analyzes a hand washing procedure video to identify sub-tasks, score the sub-tasks, and compute an overall skill score – e.g., Figs. 5-6

‘Bernal’ (US 2017/0255831 A1) – Divides video into segments and extracts a most-relevant key frame for each individual action – e.g., Fig. 7

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GEOFFREY E SUMMERS, whose telephone number is (571) 272-9915. The examiner can normally be reached Monday-Friday, 7:00 AM to 3:30 PM ET.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan Park, can be reached at (571) 272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/GEOFFREY E SUMMERS/
Examiner, Art Unit 2669

Prosecution Timeline

Dec 28, 2023
Application Filed
Mar 13, 2026
Non-Final Rejection — §101, §102, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586379
SYSTEM FOR DETECTING OCCURRENCE PERIOD OF CYCLICAL EVENT
2y 5m to grant Granted Mar 24, 2026
Patent 12561755
System and Method for Image Super-Resolution
2y 5m to grant Granted Feb 24, 2026
Patent 12555205
METHOD AND APPARATUS WITH IMAGE DEBLURRING
2y 5m to grant Granted Feb 17, 2026
Patent 12541838
INSPECTION APPARATUS AND REFERENCE IMAGE GENERATION METHOD
2y 5m to grant Granted Feb 03, 2026
Patent 12536682
METHOD AND SYSTEM FOR GENERATING A DEPTH MAP
2y 5m to grant Granted Jan 27, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2
Expected OA Rounds
72%
Grant Probability
99%
With Interview (+35.4%)
2y 5m
Median Time to Grant
Low
PTA Risk
Based on 348 resolved cases by this examiner. Grant probability derived from career allow rate.
