Prosecution Insights
Last updated: April 19, 2026
Application No. 18/539,746

USING NEURAL LANGUAGE MODELS FOR LONG-TERM ACTION ANTICIPATION FROM VIDEOS

Non-Final OA: §102, §103, §112
Filed: Dec 14, 2023
Examiner: SHAH, UTPAL D
Art Unit: 2668
Tech Center: 2600 — Communications
Assignee: Brown University
OA Round: 1 (Non-Final)

Grant Probability: 88% (Favorable)
OA Rounds: 1-2
To Grant: 2y 6m
With Interview: 99%

Examiner Intelligence

Grants 88% — above average.

Career Allow Rate: 88% (652 granted / 743 resolved; +25.8% vs TC avg)
Interview Lift: +11.4% among resolved cases with an interview (moderate)
Avg Prosecution: 2y 6m typical timeline; 16 applications currently pending
Career History: 759 total applications across all art units

Statute-Specific Performance

§101: 12.1% (-27.9% vs TC avg)
§103: 30.2% (-9.8% vs TC avg)
§102: 30.0% (-10.0% vs TC avg)
§112: 14.4% (-25.6% vs TC avg)

Based on career data from 743 resolved cases; Tech Center averages are estimates.

Office Action

Rejections under §102, §103, and §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 1 recites the limitation "the first prediction information" in line 14. There is insufficient antecedent basis for this limitation in the claim.

Claim 15 recites the limitation "the first prediction information" in line 14. There is insufficient antecedent basis for this limitation in the claim.

Claim 20 recites the limitation "the first prediction information" in line 15. There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-2, 5-7, 9-16 and 19-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by “Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023” by Huang et al. (hereinafter ‘Huang’).

In regards to claim 1, Huang teaches an electronic device, comprising: circuitry that: receives a video that includes one or more objects performing a physical task (see Huang Figure 1, Huang inputs a video including tasks); generates, based on the video, a first set of tags that corresponds to a first sequence of actions associated with the physical task (see Huang Figure 1 and Section 2.2, ‘Past Action’, Huang teaches determining tags for actions performed in the video); generates a first prompt for a neural language model based on the first set of tags (see Huang Figure 1 and Section 2.2, ‘Prompt Selection’, Huang teaches generating an LLM prompt based on action tags); predicts, by application of the neural language model on the first prompt, a second set of tags that corresponds to a second sequence of actions associated with the physical task, wherein the second sequence of actions succeed the first sequence of actions; and controls a display device to display the first prediction information based on the second set of tags. (See Huang Figure 1 and Section 2.3, Huang teaches using an LLM to generate long-term action sequences based on the prompt.)

In regards to claim 2, Huang teaches wherein the first set of tags is generated by application of an action recognition model on a sequence of frames of the video. (See Huang Figure 1.)

In regards to claim 5, Huang teaches wherein the circuitry further splits the received video into a set of segments, and each tag of the generated first set of tags corresponds to a segment of the set of segments. (See Huang Section 2.1.)

In regards to claim 6, Huang teaches wherein each tag of the generated first set of tags includes a noun and a verb that is associated with the noun. (See Huang Section 2.2, ‘Past actions’.)

In regards to claim 7, Huang teaches wherein the circuitry further: applies the neural language model on a first set of tags; predicts a second set of tags based on application of the neural language model; and fine-tunes the neural language model based on the predicted second sequence of actions, wherein the fine-tuned neural language model is applied on the generated first prompt. (See Huang Section 2.2, Huang teaches iteratively selecting the prompt.)

In regards to claim 9, Huang teaches wherein the circuitry further retrieves historical data that includes pairs of input and output tags corresponding to past actions and past action predictions associated with one or more physical tasks that is same as or different from the physical task, wherein the first prompt is generated based on the retrieved historical data and the first set of tags. (See Huang Section 2.2, ‘Past actions’.)

In regards to claim 10, Huang teaches wherein the circuitry further: generates an output action sequence based on the application of the neural language model on the first prompt; parses the output action sequence into a set of tags, each of which includes a verb and a noun associated with the verb (see Huang Section 2.3); determines whether the set of tags includes an invalid tag; identifies a valid tag that is nearest to the identified invalid tag from the set of tags based on a distance metric; and updates the set of tags by replacing the identified invalid tag with the identified valid tag, wherein the predicted second set of tags is the updated set of tags. (See Huang Section 3, ‘Evaluation Metric’.)

In regards to claim 11, Huang teaches wherein the circuitry further: retrieves historical data that includes pairs of input and output tags corresponding to past actions and past action predictions associated with one or more physical tasks that is same as or different from the physical task; and receives an input that includes a first question associated with an objective of the physical task and a second question associated with the second sequence of actions, wherein the first prompt is generated further based on the historical data and the input. (See Huang Section 2.2.)

In regards to claim 12, Huang teaches wherein the circuitry further predicts the objective of the physical task by the application of the neural language model on the first prompt, wherein the first prediction information is displayed further based on the predicted objective. (See Huang Figure 1.)

In regards to claim 13, Huang teaches wherein the first set of tags corresponds to a first set of time stamps associated with a timeline of the received video. (See Huang Section 2.1.)

In regards to claim 14, Huang teaches wherein the circuitry further predicts a second set of time stamps corresponding to the second set of tags based on the application of the neural language model on the first prompt, wherein the first prediction information is displayed further based on the predicted second set of time stamps. (See Huang Figure 1.)

Claims 15-16 and 19 recite limitations that are similar to those of claims 1-2 and 5, respectively. Therefore, claims 15-16 and 19 are rejected similarly as claims 1-2 and 5, respectively. Claim 20 recites limitations that are similar to those of claim 1. Therefore, claim 20 is rejected similarly as claim 1.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 3-4 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over “Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023” by Huang et al. (hereinafter ‘Huang’) in view of “Frozen CLIP Models are Efficient Video Learners” by Lin et al. (hereinafter ‘Lin’).

In regards to claim 3, Huang teaches all the limitations of claim 2. However, Huang does not expressly teach wherein the action recognition model includes a frozen backbone network and a transformer encoder. Lin teaches wherein the action recognition model includes a frozen backbone network and a transformer encoder. (See Lin Figure 2, Lin teaches a video learning model comprising a frozen backbone and transformer.) It would have been obvious before the effective filing date of the claimed invention to one of ordinary skill in the art to modify Huang to include the action model as taught by Lin.
The determination of obviousness is predicated upon the following findings: one skilled in the art would have been motivated to modify Huang in this manner in order to be able to efficiently train models and still produce high-quality outputs. Further, one skilled in the art could have combined the elements as described above by known methods with no change in their respective functions, and the combination would have yielded nothing more than predictable results. Therefore, it would have been obvious to combine Huang with Lin to obtain the invention as specified in claim 3.

In regards to claim 4, Huang and Lin teach all the limitations of claim 3. Lin also teaches wherein the circuitry further: applies the frozen backbone network on the received video; extracts a set of representations from a set of sampled frames associated with the received video; and applies the transformer encoder on the extracted set of representations based on at least one learnable query token, wherein the first set of tags is generated based on the application of the transformer encoder. (See Lin Figure 2 and Section 3.)

Claims 17-18 recite limitations that are similar to those of claims 3-4, respectively. Therefore, claims 17-18 are rejected similarly as claims 3-4, respectively.

Allowable Subject Matter

Claim 8 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: In regards to claim 8, the applied art does not teach or suggest “generates a second prompt based on the predicted second set of tags; predicts a third set of tags based on application of the neural language model on the second prompt, wherein the third set of tags corresponds to a third sequence of actions associated with the physical task, and the third sequence of actions succeed the second sequence of actions; and controls the display device to display second prediction information based on the predicted third set of tags.”

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to UTPAL D SHAH whose telephone number is (571) 272-5729. The examiner can normally be reached M-F: 7:30-5:30.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at (571) 272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/UTPAL D SHAH/
Primary Examiner, Art Unit 2668

Prosecution Timeline

Dec 14, 2023: Application Filed
Jan 28, 2026: Non-Final Rejection — §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602948: Generating Computer Augmented Maps from Physical Maps
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12602914: PROVIDING USER GUIDANCE TO USE AND TRAIN A GENERATIVE ADVERSARIAL NETWORK
Granted Apr 14, 2026 (2y 5m to grant)

Patent 12597242: DETERMINING EMITTER IDENTIFICATION INFORMATION TO A DESIRED ACCURACY
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12597088: QUALITY FACTOR USING RECONSTRUCTED IMAGES
Granted Apr 07, 2026 (2y 5m to grant)

Patent 12597151: SYSTEMS AND METHODS TO DETERMINE VEGETATION ENCROACHMENT ALONG A RIGHT-OF-WAY
Granted Apr 07, 2026 (2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 88%
With Interview (+11.4%): 99%
Median Time to Grant: 2y 6m
PTA Risk: Low

Based on 743 resolved cases by this examiner. Grant probability derived from career allow rate.
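The headline figures above can be reproduced from the career data shown in the Examiner Intelligence card. A minimal sketch, assuming the tool rounds a simple granted/resolved ratio and adds the interview lift linearly (both are assumptions about an undocumented methodology, not the tool's stated formula):

```python
# Reproduce the dashboard's headline probabilities from the examiner's
# career counts. Rounding a plain ratio and adding the interview lift
# linearly are assumptions about the methodology.

granted, resolved = 652, 743   # career counts from the Examiner Intelligence card
interview_lift = 11.4          # percentage points, per the Interview Lift stat

allow_rate = 100 * granted / resolved          # about 87.75%
grant_probability = round(allow_rate)          # 88
with_interview = round(allow_rate + interview_lift)  # 99

print(grant_probability, with_interview)       # prints: 88 99
```

This matches the displayed 88% grant probability and 99% with-interview figure, which suggests the projections are straightforward derivations from the career allow rate rather than a claim-level model.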
