DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments with respect to claims 1-23 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 8 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (U.S. Pub. No. 2022/0374635 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1).
As per claim 1, Xiong teaches a method of processing video data, comprising: receiving a plurality of frames of a video at an edge device (fig. 1), wherein the video depicts an action that spans the plurality of frames ([0041], [0045], [0052], [0063], [0087], [0092]; “detected motion”); compressing, using an encoder network of the edge device (fig. 1 el. 118; “encoder”), each of the plurality of frames to obtain compressed frame features, wherein the compressed frame features include fewer data bits than the plurality of frames of the video (fig. 1 el. 118; [0025]-[0027], “Encoder 118 may use various possible digital encoding and/or compression formats for encoding the video data generated by image sensor 112 into a time-dependent video stream composed of video frames at a determined frame rate (number of frames per second). In some embodiments, encoder 118 may use a compressed video format to reduce the storage size and network bandwidth necessary for storing and transferring the original video stream.”); classifying, using a classification network of the edge device (fig. 1 el. 124; “analysis 124”), the compressed frame features to obtain classification information corresponding to the video (fig. 1, fig. 2, fig. 4, fig. 6, [0008], [0029], [0056]-[0057], [0061]-[0062], [0099]; “In some embodiments, video analysis subsystem 124 may be configured to support real-time image classification and object detection within camera 110 without processing support from network video recorder 130 or network video server 160. For example, video analysis subsystem 124 may receive a video stream (from sensor 112 and/or encoder 118), classify the video frame to determine whether an object type of interest is present and, if so, initiate an object detector to determine the object's position within the video frame (and/or subsequent video frames)”), wherein the classification information comprises a prediction of a likelihood (fig. 2, [0057], “… If the video frame does include the object type, the video frame may be selectively passed to object detection 218. At block 216, image classification 212 may generate an object type confidence determination in addition to the binary indicator of the object presence or absence. For example, object type confidence determination 216 may include a confidence value between 0 and 1 that indicates the likelihood that the image classification algorithm has correctly identified the presence or absence of the object type in the video frame”); and transmitting the classification information from the edge device to a central server (fig. 1). Xiong does not explicitly disclose classifying to obtain action classification information, wherein the action classification information comprises a prediction of a likelihood that the video depicts the action, as recited in claim 1.
However, Kumar teaches classifying to obtain action classification information corresponding to the action in the video (abstract; col. 2 lines 36-40 and 55-58; col. 4 lines 6-17; col. 5 lines 40-44; and figs. 1-2), wherein the action classification information comprises a prediction of a likelihood that the video depicts the action (col. 5 lines 57-65; col. 10 lines 38-67; and figs. 1-2); and transmitting the action classification information to a central server (fig. 11; col. 22 lines 23-30; col. 23 lines 25-36).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Kumar with Xiong for the benefit of providing improved classification.
As per claim 2, Xiong (modified by Kumar) as a whole teaches everything as claimed above, see claim 1. In addition, Xiong teaches recording the video at the edge device (fig. 1 and [0026]) and selecting the plurality of frames of the video for classification (fig. 4 and [0102]).
As per claim 8, Xiong (modified by Kumar) as a whole teaches everything as claimed above, see claim 1. In addition, Xiong teaches the compressed frame features comprise a binary code (fig. 1 el. 118; [0025]-[0027]).
As per claim 21, this claim is the corresponding non-transitory computer-readable medium storing instructions with the limitations of the method for processing video data as recited in claim 1. Thus, the rejection and analysis made for claim 1 also apply here.
Claims 4 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (U.S. Pub. No. 2022/0374635 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1) and further in view of Tran et al. (U.S. Pub. No. 2023/0344962 A1).
As per claim 4, Xiong (modified by Kumar) as a whole teaches everything as claimed above, see claim 1. Although Xiong teaches decoding the compressed frame features ([0074]) and wherein classification is based on the decoding ([0075-0077]), Xiong does not explicitly disclose decoding the compressed frame features using a three-dimensional convolution network and a fully connected layer, wherein the action classification information is based on the decoding.
However, Kumar teaches wherein the action classification information is based on the decoding (col. 7 lines 6-11).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Kumar with Xiong for the benefit of providing improved classification.
Xiong (modified by Kumar) does not explicitly disclose decoding the compressed frame features using a three-dimensional convolution network and a fully connected layer.
However, Tran teaches decoding the compressed frame features using a three-dimensional convolutional network and a fully connected layer ([0024], [0038-0039] and figs. 3A-3B).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Tran with Xiong (modified by Kumar) for providing improved performance.
As per claim 23, this claim is the corresponding non-transitory computer-readable medium storing instructions with the limitations of the method as recited in claim 4. Thus, the rejection and analysis made for claim 4 also apply here.
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (U.S. Pub. No. 2022/0374635 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1) and further in view of Zhu et al. (U.S. Pub. No. 2021/0289227 A1).
As per claim 6, Xiong (modified by Kumar) as a whole teaches everything as claimed above, see claim 1. Although Xiong teaches decoding the compressed frame features (fig. 3 and [0074]), wherein the classification is based on the decoding (fig. 3 and [0075-0077]), Xiong does not explicitly disclose decoding the compressed frame features using a recurrent neural network.
However, Kumar teaches wherein the action classification is based on the decoding (col. 7 lines 6-11).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Kumar with Xiong for the benefit of providing improved classification.
Xiong (modified by Kumar) does not explicitly disclose decoding the compressed frame features using a recurrent neural network.
However, Zhu teaches decoding the compressed frame features using a recurrent neural network (fig. 2, fig. 4 and [0029], [0052]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Zhu with Xiong (modified by Kumar) for the benefit of providing improved image processing.
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (U.S. Pub. No. 2022/0374635 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1) and further in view of Lu et al. (U.S. Pub. No. 2018/0144248 A1).
As per claim 7, Xiong (modified by Kumar) as a whole teaches everything as claimed above, see claim 6. Xiong does not explicitly disclose performing a convolution operation on at least one frame of the video, wherein a layer of the recurrent neural network takes a hidden state from a previous layer and an output of the convolution operation as input.
However, Kumar teaches performing a convolution operation on at least one frame of the video (col. 7 lines 1-6).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Kumar with Xiong for the benefit of providing improved classification.
Xiong (modified by Kumar) does not explicitly disclose wherein a layer of the recurrent neural network takes a hidden state from a previous layer and an output of the convolution operation as input.
However, Lu teaches wherein a layer of the recurrent neural network takes a hidden state from a previous layer and an output of the convolution operation as input ([0018]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Lu with Xiong (modified by Kumar) for the benefit of yielding predictable results of improving accuracy of predictions.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Xiong et al. (U.S. Pub. No. 2022/0374635 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1) and further in view of Appalaraju et al. (U.S. Patent No. 10,909,728 B1).
As per claim 9, Xiong (modified by Kumar) as a whole teaches everything as claimed above, see claim 1. Xiong does not explicitly disclose a compression ratio of the compressed frame features is at least 2.
However, Appalaraju teaches a compression ratio of the compressed frame features is at least 2 (col. 3 lines 40-63).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Appalaraju with Xiong (modified by Kumar) for the benefit of providing improved image quality and increased coding efficiency.
Claims 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Golinski et al. (U.S. Pub. No. 2021/0281867 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1).
As per claim 13, Golinski teaches an apparatus comprising: an encoder network configured to compress each of a plurality of frames of a video to obtain compressed frame features (abstract, [0002], [0005], [0073], [0100], [0121]), wherein the encoder network is trained to compress the plurality of frames by comparing the plurality of frames to reconstructed frames that are based on the compressed frame features (fig. 4 and [0121]; “The encoder portion 432 can be trained to compress input data 440 (e.g., video frames) using motion compensation based on previous representations (e.g., one or more previously reconstructed frames).” The Examiner notes that motion compensation compares current frames to reconstructed frames); and a classification network (fig. 2D, classification section, and [0082]). Golinski does not explicitly disclose a classification network configured to classify the compressed frame features to obtain action classification information for an action in the video that spans the plurality of frames of the video, wherein the action classification information comprises a prediction of a likelihood that the video depicts the action.
However, Kumar teaches classifying the compressed frame features to obtain action classification information for an action in the video that spans the plurality of frames of the video (abstract; col. 2 lines 36-40 and 55-58; col. 4 lines 6-17; col. 5 lines 40-44; and figs. 1-2), wherein the action classification information comprises a prediction of a likelihood that the video depicts the action (col. 5 lines 57-65; col. 10 lines 38-67; and figs. 1-2).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Kumar with Golinski for the benefit of providing improved classification.
As per claim 14, Golinski (modified by Kumar) as a whole teaches everything as claimed above, see claim 13. In addition, Golinski teaches a camera configured to capture the video ([0034], [0081], [0116] and figs. 2D, 4).
As per claim 15, Golinski (modified by Kumar) as a whole teaches everything as claimed above, see claim 13. Golinski does not explicitly disclose a reporting component configured to report the classification information to a central server.
However, Kumar teaches a reporting component configured to report the classification information to a central server (fig. 11; col.19 lines 9-23).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Kumar with Golinski for the benefit of providing improved classification.
As per claim 16, Golinski (modified by Kumar) as a whole teaches everything as claimed above, see claim 13. In addition, Golinski teaches a decoder network configured to generate a reconstructed video based on the compressed frame features (abstract, [0005], [0073], [0110], and fig. 4).
Claims 17-18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Golinski et al. (U.S. Pub. No. 2021/0281867 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1) and further in view of Tran et al. (U.S. Pub. No. 2023/0344962 A1).
As per claim 17, Golinski (modified by Kumar) as a whole teaches everything as claimed above, see claim 13. Although Golinski discloses that the connections between layers of a neural network may be fully connected or locally connected ([0078] and fig. 2A), Golinski does not explicitly disclose the classification network comprises a three-dimensional convolution layer and a fully connected layer.
However, Tran teaches the classification network comprises a three-dimensional convolution layer and a fully connected layer ([0024], [0038-0039], and figs. 3A-3B).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Tran with Golinski (modified by Kumar) for providing improved performance.
As per claim 18, Golinski (modified by Kumar and Tran) as a whole teaches everything as claimed above, see claim 17. Golinski does not explicitly disclose the classification network comprises a two-dimensional convolution layer.
However, Tran teaches the classification network comprises a two-dimensional convolution layer ([0039], [0041], [0054] and at least fig. 12).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Tran with Golinski (modified by Kumar) for providing improved performance.
As per claim 20, Golinski (modified by Kumar) as a whole teaches everything as claimed above, see claim 13. Golinski does not explicitly disclose the classification network comprises an attention layer.
However, Tran teaches the classification network comprises an attention layer ([0025], [0057], [0061]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Tran with Golinski (modified by Kumar) for providing improved performance.
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Golinski et al. (U.S. Pub. No. 2021/0281867 A1) in view of Kumar et al. (U.S. Patent No. 11,301,684 B1) and further in view of Zhu et al. (U.S. Pub. No. 2021/0289227 A1).
As per claim 19, Golinski (modified by Kumar) as a whole teaches everything as claimed above, see claim 13. In addition, Golinski teaches the classification network comprises a convolution component ([0068-0069], [0075-0077], [0080-0082] and fig. 2D). Golinski does not explicitly disclose the classification network comprises a recurrent neural network.
However, Zhu teaches the classification network comprises a recurrent neural network ([0027], [0029-0030]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the teachings of Zhu with Golinski (modified by Kumar) for the benefit of providing improved image processing and classification.
Allowable Subject Matter
Claims 3 and 5 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA PRINCE whose telephone number is (571) 270-1821. The examiner can normally be reached M-F, 7:30 AM to 3:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jamie Atala can be reached at 571-272-7384. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
JESSICA PRINCE
Examiner
Art Unit 2486
/JESSICA M PRINCE/ Primary Examiner, Art Unit 2486