Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1, 11, and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
The limitation “the plurality feature maps as rectangular patches of a video frame; and encode the video frame to produce the feature bitstream” is unclear. 2D feature patches are not “video” as “video” is understood in the art, and encoding a “video” frame does not produce a “feature bitstream” as that term is understood in the art. Moreover, Applicant discloses both an encoded video stream and an encoded feature stream (Fig. 3); language that generates a “video frame” from “feature maps” to produce a “feature bitstream” therefore obscures the claim scope.
Claim 24 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
The limitation “the frames” is unclear because the claimed embodiment (Fig. 3) includes both video frames encoded by the second encoder and features encoded by the first encoder. “The frames” therefore does not make clear which frames are being encoded using prediction, transform, quantization, and entropy coding.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 11-15, and 21-28 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (NPL: “A joint compression scheme of video feature descriptors and visual content,” IEEE 2017) in view of Chen (NPL: “Toward Intelligent Sensing: Intermediate Deep Feature Compression,” IEEE 2020).
Regarding Claim 1, Zhang (NPL: “A joint compression scheme of video feature descriptors and visual content,” IEEE 2017) discloses a system (Fig. 2) for optimizing a loss function (the Lagrangian rate-accuracy cost function, p. 637-638, Section III.C) for video coding for machines (client, server, Fig. 1), the system comprising a computing device (client, Fig. 1) including circuitry and configured to:
receive an input video (input video on the left, Fig. 2);
… extract a plurality (jth feature of the ith frame, indicating multiple features per frame, Section II, p. 634 right column) of feature maps (v is the feature descriptor vector, Section II, p. 634 right column; block-partition guided feature selection, Fig. 2) as a function of the input video (video sequence of consecutive frames I and ith frame, Section II, p. 634 right column; from the input video, Fig. 2) and at least a feature extraction parameter (J, p. 637-638, Section III.C);
arrange (rearrange, p. 637 right column) the plurality of feature maps (the descriptor elements, p. 637 right column) as rectangular patches (2D and rectangles in Fig. 8, p. 637 right column) in a video frame (see Fig. 8, consecutive arrangement of blocks);
encode (transform, quantize, entropy coding, Fig. 2) the video frame (feature selection, Fig. 2) with a [] compliant encoding protocol (using CABAC as adopted into HEVC, p. 637 right column) as an encoded feature bitstream (the output of feature coding is the feature bitstream, see stream of 1s and 0s out of the Feature Coding, Fig. 2);
calculate a loss function (rate-accuracy optimization based on Lagrangian multiplier and degradation D, p. 638 left column; p. 637 last sentence) as a function of the feature layer (feature descriptors, p. 637-638, Section III.C);
and optimize (best tradeoff between rate and distortion, p. 637-638, Section III.C) the at least a feature extraction parameter (J, p. 637-638, Section III.C) as a function of the loss function (the Lagrangian rate-accuracy cost function, p. 637-638, Section III.C).
Zhang does not disclose, but Chen (NPL: “Toward Intelligent Sensing: Intermediate Deep Feature Compression,” IEEE 2020) teaches using a plurality of convolutional (convolutional layers, Section IV.A, p. 2235 left column; Fig. 5) and pooling layers (pooling layers, Section IV.A, p. 2235 left column; Fig. 5) of a convolutional neural network (CNN, Section IV.A, p. 2235 left column; VGG-16, p. 2236 right column, p. 2235 right column; Fig. 5) to extract a plurality of feature maps (the outputs of each layer of the deep model can be considered as features, Section IV.A, p. 2235 left column) as a function of the input [] (extract the deep features of the aforementioned deep learning models on a subset of the validation set of the ImageNet 2012 dataset, Section V.A, p. 2236 right column);
encode the video frame (video coding, p. 2232 left column) with a VVC compliant encoding protocol (VVC, p. 2232 left column).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to replace the SIFT feature extractor of Zhang with the VGG-16 feature extractor of Chen because Chen teaches that VGG-16 is appealing because of its neat architecture and is the most preferred choice for extracting features in the computer vision community (p. 2236 right column). Likewise, one of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to replace the HEVC coder of Zhang with the VVC coder of Chen because Chen teaches that VVC is expected to have coding performance superior to HEVC (p. 2232 left column).
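For illustration only, and not as part of the claim mapping, the following minimal sketch shows one way the “arrange … as rectangular patches in a video frame” step could be realized. The channel count and patch size (512 maps of 14x14, a VGG-16 conv5-like shape) are assumptions, not facts of record:

```python
import numpy as np

def tile_feature_maps(feature_maps):
    """Tile C feature maps of shape (C, H, W) into one 2D frame of
    rectangular H x W patches laid out row by row (cf. Zhang Fig. 8)."""
    c, h, w = feature_maps.shape
    cols = int(np.ceil(np.sqrt(c)))   # patches per row
    rows = int(np.ceil(c / cols))     # rows of patches
    frame = np.zeros((rows * h, cols * w), dtype=feature_maps.dtype)
    for i in range(c):
        r, q = divmod(i, cols)
        frame[r * h:(r + 1) * h, q * w:(q + 1) * w] = feature_maps[i]
    return frame

# Hypothetical example: 512 maps of 14x14 tile into a 322x322 frame
# (a 23 x 23 grid of patches), which a conventional video encoder
# such as HEVC or VVC can then treat as a picture.
frame = tile_feature_maps(np.random.rand(512, 14, 14).astype(np.float32))
print(frame.shape)  # (322, 322)
```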
Regarding Claim 2, Zhang (NPL: “A joint compression scheme of video feature descriptors and visual content,” IEEE 2017) discloses the system of claim 1, wherein the loss function comprises a rate-distortion optimization function (rate-accuracy optimization based on Lagrangian multiplier and degradation D, p. 638 left column; p. 637 last sentence).
Regarding Claim 3, Zhang (NPL: “A joint compression scheme of video feature descriptors and visual content,” IEEE 2017) discloses the system of claim 2, wherein the rate-distortion optimization function (the Lagrangian rate-accuracy cost function, p. 637-638, Section III.C) aggregates a distortion metric (degradation D, p. 637-638, Section III.C) and a compression metric (number of bits R, p. 637-638, Section III.C).
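For clarity, the cited cost can be written in the standard Lagrangian form implied by the mapped passages (degradation D, bit count R, multiplier λ); the exact notation of Zhang Section III.C may differ:

```latex
\min_{\theta}\; \mathcal{L}(\theta) \;=\; D(\theta) \;+\; \lambda\, R(\theta)
```

where θ stands for the feature extraction parameter being optimized (J in the mapping above), D(θ) is the distortion (degradation) metric, and R(θ) is the compression (rate) metric that the cost aggregates.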
Regarding Claim 5, Zhang (NPL: “A joint compression scheme of video feature descriptors and visual content,” IEEE 2017) discloses the system of claim 1, wherein extracting the feature map includes a feature extraction machine learning process (probability-statistic-based feature selection approach of reference [34], p. 635 right column; see reference [34] for evidence).
Regarding Claim 11, Zhang (NPL: “A joint compression scheme of video feature descriptors and visual content,” IEEE 2017) discloses a method of optimizing a loss function for video coding for machines, the method comprising: using a computing device (client, server, Fig. 1(c)). The remainder of Claim 11 is rejected on the grounds set forth for Claim 1.
Regarding Claim 12, the claim is rejected on the grounds set forth for Claim 2.
Regarding Claim 13, the claim is rejected on the grounds set forth for Claim 3.
Regarding Claim 14, the claim is rejected on the grounds set forth for Claim 4.
Regarding Claim 15, the claim is rejected on the grounds set forth for Claim 5.
Regarding Claim 21, Zhang (NPL: “A joint compression scheme of video feature descriptors and visual content,” IEEE 2017) discloses a machine video encoder (joint compression framework, Fig. 2) for encoding a bitstream (video + feature stream, Fig. 2) for a machine video application (inherent: all videos are for a video application). The remainder of Claim 21 is rejected on the grounds set forth for Claim 1.
Regarding Claim 22, Zhang (NPL: “A joint compression scheme of video feature descriptors and visual content,” IEEE 2017) discloses the machine video encoder of claim 21, wherein the video encoder encodes the frames using NxN coding blocks, where N is one of 4, 8, 16, 32, 64, or 128 (the CU size ranges from 64x64 to 8x8, Section III.A, p. 635 right column).
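As an illustration only, the recited NxN block partitioning can be sketched as follows; the 128x128 frame size is hypothetical, while the block sizes follow the CU range cited above:

```python
import numpy as np

def split_into_blocks(frame, n):
    """Split a 2D frame into non-overlapping NxN coding blocks;
    the frame is assumed to be padded to a multiple of N."""
    h, w = frame.shape
    assert h % n == 0 and w % n == 0, "pad the frame to a multiple of N"
    return (frame.reshape(h // n, n, w // n, n)
                 .swapaxes(1, 2)
                 .reshape(-1, n, n))

# Hypothetical 128x128 frame split into four 64x64 CUs (Zhang's largest
# CU size); n=8 would give the smallest CU size in the cited range.
blocks = split_into_blocks(np.zeros((128, 128)), 64)
print(blocks.shape)  # (4, 64, 64)
```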
Regarding Claim 23, Zhang (NPL: “A joint compression scheme of video feature descriptors and visual content,” IEEE 2017) discloses the machine video encoder of claim 21, wherein the video encoder used to encode the frames comprises a video encoder which produces a [] compliant bitstream using a subset of [] tools (HEVC coder; inherently, the HEVC coder uses HEVC tools).
Zhang does not disclose, but Chen (NPL: “Toward Intelligent Sensing: Intermediate Deep Feature Compression,” IEEE 2020) teaches VVC (VVC, p. 2232 left column).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to replace the HEVC coder of Zhang with the VVC coder of Chen because Chen teaches that VVC is expected to have coding performance superior to HEVC (p. 2232 left column).
Regarding Claim 24, Zhang (NPL: “A joint compression scheme of video feature descriptors and visual content,” IEEE 2017) discloses the machine video encoder of claim 21, wherein the video encoder used to encode the frames comprises an encoder using one or more of temporal prediction, transform, quantization, and entropy coding (descriptor elements are predicted, transformed, quantized, and entropy coded, Section III.B, p. 637 right column).
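For illustration only, a toy sketch of the four recited stages; the quantization step size and the use of a plain 2D DCT are assumptions, and the entropy-coding stage is only stubbed (a real codec such as HEVC would use CABAC):

```python
import numpy as np
from scipy.fft import dctn

def toy_encode_block(block, prediction, qstep=8.0):
    """Toy pipeline: predict, transform, quantize, then hand off
    to an (omitted) entropy coder."""
    residual = block - prediction                      # temporal prediction
    coeffs = dctn(residual, norm='ortho')              # 2D transform (DCT)
    quantized = np.round(coeffs / qstep).astype(int)   # scalar quantization
    # Entropy coding is stubbed: a real encoder would CABAC-code the
    # quantized coefficients rather than emit raw symbols.
    return quantized.flatten().tolist()
```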
Regarding Claim 25, Chen (NPL: “Toward Intelligent Sensing: Intermediate Deep Feature Compression,” IEEE 2020) teaches a machine video decoder for receiving and decoding (deep feature decoding, Fig. 4) a feature bitstream (encoded bitstream, Fig. 4) for a machine video application (task, Fig. 4),
…
an inner decoder for receiving the encoded feature bitstream (deep feature decoding, Fig. 4) and outputting the frame with the plurality of feature maps (e.g., conv5 features, the output feature maps of the fifth convolutional block; spatial and semantic information from its conv4 layer; visual tracking tasks, pool4 and pool5 features of VGGNet; the fc2 features and pool5 feature of VGGNet, Section III last paragraph, p. 2234);
and a feature decoder (feature decoder, Fig. 4) reconstructing the feature maps (reconstructed deep feature maps, p. 2239 right column), the reconstructed feature maps being input to a machine task (object detection, retrieval, captioning, quality assurance, tracking, Fig. 5) that includes a deep neural network (e.g., LSTM, Fig. 5; the other tasks likewise incorporate deep neural networks).
The remainder of Claim 25 is rejected on the grounds set forth for Claim 1.
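Mirroring the tiling sketch given after the Claim 1 discussion, the following decoder-side sketch (shapes hypothetical, for illustration only) shows how an inner decoder's output frame could be split back into the feature maps consumed by the task network:

```python
import numpy as np

def untile_feature_maps(frame, c, h, w):
    """Recover C feature maps of shape (H, W) from a decoded frame of
    rectangular patches laid out row by row."""
    cols = frame.shape[1] // w
    maps = np.empty((c, h, w), dtype=frame.dtype)
    for i in range(c):
        r, q = divmod(i, cols)
        maps[i] = frame[r * h:(r + 1) * h, q * w:(q + 1) * w]
    return maps

# Hypothetical: recover 512 maps of 14x14 from a 322x322 decoded frame.
maps = untile_feature_maps(np.zeros((322, 322), dtype=np.float32), 512, 14, 14)
print(maps.shape)  # (512, 14, 14)
```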
Regarding Claim 26, Zhang (NPL: “A joint compression scheme of video feature descriptors and visual content,” IEEE 2017) discloses the machine video decoder of claim 25.
Zhang does not disclose, but Chen (NPL: “Toward Intelligent Sensing: Intermediate Deep Feature Compression,” IEEE 2020) teaches wherein the inner decoder is a VVC decoder (VVC, p. 2232 left column) for receiving the bitstream and reconstructing the feature maps (existing video codecs can be applied to compress deep features, Section VI.A, p. 2239 right column).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to replace the HEVC coder of Zhang with the VVC coder of Chen because Chen teaches that VVC is expected to have coding performance superior to HEVC (p. 2232 left column).
Regarding Claim 27, Zhang (NPL: “A joint compression scheme of video feature descriptors and visual content,” IEEE 2017) discloses the machine video decoder of claim 25, wherein the inner decoder is an entropy decoder for receiving the bitstream and reconstructing the feature maps (feature coding: the extracted feature descriptors from interest objects are compressed with the procedures of prediction, transform, scalar quantization and entropy coding, Section II).
Regarding Claim 28, the claim is rejected on the grounds set forth for Claim 22.
Response to Arguments
Applicant’s remarks are moot because they do not pertain to the combination of references relied upon in this Office action. Chen (NPL: “Toward Intelligent Sensing: Intermediate Deep Feature Compression,” IEEE 2020) is relied upon to teach deep-learning-based feature extraction.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Francini, “Selection of local features for visual search,” Signal Processing: Image Communication, 2012.
US PG Publication 2024/0414342
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHADAN E HAGHANI whose telephone number is (571)270-5631. The examiner can normally be reached M-F 9AM - 5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jay Patel, can be reached at 571-272-2988. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHADAN E HAGHANI/Examiner, Art Unit 2485