DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Objections
Claim 7 is objected to because of the following informalities: claim 7 ends in a comma. The Examiner believes this to be a typographical error and recommends that the Applicant end the claim with a period. Appropriate correction is required.
Claim 8 is objected to because it includes reference characters which are not enclosed within parentheses.
Reference characters corresponding to elements recited in the detailed description of the drawings and used in conjunction with the recitation of the same element or group of elements in the claims should be enclosed within parentheses so as to avoid confusion with other numbers or characters which may appear in the claims. See MPEP § 608.01(m).
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claim 1 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 1 partially recites "the inference encoder receiving the input signal". The Examiner was unable to find support for the inference encoder directly receiving the input signal. The specification describes the input signal as being received by the inference selector (Figures 6 and 7), not by the inference encoder.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1 recites the limitations "the inference encoder" and "the input signal" in lines 4 and 6. There is insufficient antecedent basis for these limitations in the claim.
In claim 1, the inference metadata encoder receives inference model selection parameters from an inference encoder, while the inference encoder also receives inference model selection parameters from the inference selector. The Examiner is unclear as to which component is responsible for providing the inference model selection parameters (the inference selector or the inference encoder).
Claim 4 partially recites "the encoder". The Examiner is unclear as to which encoder the Applicant is referring.
The preamble of claim 5 appears to be directed to a decoder; however, the claim mentions an encoder. The Examiner is unclear as to the purpose of the coding/encoder language in the preamble.
Claim 12 recites the limitation "the encoder" in line 1 and “the input” in line 2. There is insufficient antecedent basis for these limitations in the claim.
Claim 12 appears to describe functions of an apparatus for encoding, not decoding.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5 and 10-12 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US Patent No. 11,330,263) in view of Kang et al. (US 2024/0054686).
Regarding claim 1, Wang discloses an encoder for video coding for machine applications, the encoder comprising:
an inference selector; an inference metadata encoder coupled to the inference selector and receiving inference model selection parameters from the inference encoder and encoding said parameters into an inference metadata substream (explicit features are identified and provided for model training and inference – col.8, 52-55; the trained model parameters are outputted to C or other programming language's header files in some embodiments. The encoder then performs direct model inferencing based on the raw model parameters and known model structure on the input features – col.9, 62-67 and col.10, 1-7; packaging a video stream and possibly other bit streams in a layered stream. Within the video layer of the layered stream, as shown in FIG. 2, the compressed data is further layered. The layers pertain to the operation of the compression scheme as well as the composition of the compressed bit stream – col.6, 23-30);
an inference encoder, the inference encoder receiving the input signal and inference model selection parameters from the inference selector and routing the input signal to a selected inference model (the encoder applies the trained model to video encoding for coded size estimation through model inference – col.8, 22-25; The encoder then performs direct model inferencing based on the raw model parameters and known model structure on the input features – col.9, 62-67 and col.10, 1-7);
a feature encoder, the feature encoder being coupled to the inference encoder and generating an encoded feature substream (having determined the features, a target encoder is modified to generate a training dataset and a validation dataset for different picture types in accordance with some embodiments – col.5, 41-45);
receiving the inference metadata substream from the inference metadata encoder and the feature substream from the feature encoder and providing an encoded bitstream therefrom (the compression system packages a video stream and possibly other bit streams in a layered stream. Within the video layer of the layered stream, as shown in FIG. 2, the compressed data is further layered. The layers pertain to the operation of the compression scheme as well as the composition of the compressed bit stream – col.6, 23-30).
However, Wang fails to explicitly disclose a multiplexor.
In his disclosure, Kang teaches a multiplexor (multiplexer 140 in Figure 1).
It would have been obvious to a person with ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the multiplexer of Kang into the teachings of Wang because the use of a multiplexer to packetize streams for transmission is common practice in the art and yields expected results.
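For illustration only, the data flow recited in claim 1, as mapped above to Wang and Kang, can be sketched as follows. This is a minimal sketch, not the Applicant's or Wang's actual implementation; all function and variable names are hypothetical, and the per-unit selection rule is a toy stand-in for the claimed inference selector.

```python
def select_inference_model(unit):
    """Hypothetical inference selector: recommend a model id per input unit."""
    return 0 if sum(unit) % 2 == 0 else 1  # toy selection rule, for illustration

def encode_claimed_bitstream(input_units):
    """Sketch of claim 1's data flow: selector -> inference/feature encoders -> mux."""
    metadata_substream = []   # encoded inference model selection parameters
    feature_substream = []    # encoded features from the routed input units
    for unit in input_units:
        model_id = select_inference_model(unit)
        metadata_substream.append(model_id)
        # "routing the input signal to a selected inference model" (claim 1):
        # trivially transform the unit according to the chosen model id.
        feature_substream.append([x + model_id for x in unit])
    # Multiplexer (as taught by Kang, Fig. 1): package both substreams
    # into a single encoded bitstream.
    return {"metadata": metadata_substream, "features": feature_substream}

bitstream = encode_claimed_bitstream([[1, 2], [3, 4]])
```

The sketch only shows why the claimed multiplexer is a separate element from the two substream encoders: each encoder produces its own substream, and the multiplexer combines them afterward.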
Regarding claim 2, Wang discloses the encoder of claim 1, wherein the inference selector produces a recommendation for a best matching inference model for the input signal (to make CNN-based coded size estimation useful in real time video encoding, the exemplary CNN-based model is integrated with the encoder (e.g., the model(s) 122 as part of the encoder 112, FIG. 1A), so that the video encoder invokes model inference directly – col.9, 62-66).
Regarding claim 3, Wang discloses the encoder of claim 2, wherein the inference selector recommends an inference model for each unit of the input signal (The encoder then performs direct model inferencing based on the raw model parameters and known model structure on the input features – col.9, 62-67 and col.10, 1-7).
Regarding claim 4, Wang discloses the encoder of claim 3, wherein the encoder comprises a plurality of inference models and the inference encoder operates to route each unit of the input signal to the recommended inference model for that unit (a training module generates and maintains trained model(s) by applying one or more neural network models to learn and train the relationship between the coded size and picture characteristics for coded size estimation. Examples of such picture characteristics include picture pixels in a CNN model and/or picture features in an MLP model. Once trained, the rate controller uses the trained model(s) to estimate coded size through model inference – col.5, 3-11).
In regards to claim 5, any encoder technology present in an encoder must also necessarily be present, in substantially identical form, in a corresponding decoder. The description of decoder technologies may be abbreviated because they are the inverse of the comprehensively described encoder technologies. Therefore, claim 5 is rejected on the same basis as claim 1. It is noted that Kang discloses a demultiplexer in Figure 10.
Regarding claim 10, Wang discloses the decoder of claim 5, wherein the inference selector produces a recommendation for a best matching inference model for the input signal (to make CNN-based coded size estimation useful in real time video encoding, the exemplary CNN-based model is integrated with the encoder (e.g., the model(s) 122 as part of the encoder 112, FIG. 1A), so that the video encoder invokes model inference directly – col.9, 62-66).
Regarding claim 11, Wang discloses the decoder of claim 10, wherein the inference selector recommends an inference model for each unit of the input signal (the encoder then performs direct model inferencing based on the raw model parameters and known model structure on the input features – col.9, 62-67 and col.10, 1-7).
Regarding claim 12, Wang discloses the decoder of claim 11, wherein the encoder comprises a plurality of inference models and the inference encoder operates to route each unit of the input signal to the recommended inference model for that unit (a training module generates and maintains trained model(s) by applying one or more neural network models to learn and train the relationship between the coded size and picture characteristics for coded size estimation. Examples of such picture characteristics include picture pixels in a CNN model and/or picture features in an MLP model. Once trained, the rate controller uses the trained model(s) to estimate coded size through model inference – col.5, 3-11).
Claims 6-9 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al. (US Patent No. 11,330,263) in view of Kang et al. (US 2024/0054686), further in view of Zhang et al. (WO 2023/111384 A1).
Regarding claim 6, Wang discloses the decoder of claim 5. Wang discloses packaging a video stream and possibly other bit streams in a layered stream. Within the video layer of the layered stream, as shown in FIG. 2, the compressed data is further layered. The layers pertain to the operation of the compression scheme as well as the composition of the compressed bit stream (col.6, 23-30).
Meanwhile, Kang teaches generating a bitstream for the features, said bitstreams being multiplexed together and transmitted (paragraph 7). Kang also teaches a demultiplexer that obtains a first bitstream generated by encoding a common feature map representing a representative task that an original image implies, and the decoding method also comprises generating a base image from the common feature map by using an image restoration model based on deep learning (paragraph 12). The task feature encoder encodes a task-specific feature map based on deep learning to generate a bitstream. Hereinafter, a bitstream obtained by encoding a task-specific feature map of an individual task is referred to as a second bitstream, and a bitstream obtained by encoding a task-specific feature map of a residual task is referred to as a third bitstream (paragraph 90).
Therefore, Kang teaches a multiplexed bitstream that contains data used by the demultiplexer to extract the feature substream and the inference metadata substream from the bitstream.
It would have been obvious to a person with ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the teachings of Kang into the teachings of Wang because such incorporation ensures improved performance (paragraph 16).
However, Wang fails to explicitly disclose wherein the bitstream comprises a stream level header, the stream level header having data used to extract data.
In his disclosure, Zhang teaches wherein the bitstream comprises a stream level header, the stream level header having data used to extract data (a VCM bitstream may comprise a sequence of VCM units. A VCM unit may comprise a VCM unit header and a VCM unit payload. The VCM unit header may comprise a type syntax element that indicates the type of data contained in the VCM unit payload – p.40, 6-17).
It would have been obvious to a person with ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the teachings of Zhang into the teachings of Wang because such incorporation improves task performance (p.43, 25-26).
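For illustration only, the VCM unit structure quoted above from Zhang (a unit header whose type syntax element indicates the payload type, enabling substream extraction) can be sketched as follows. The type-to-substream mapping and all names are hypothetical assumptions, not taken from Zhang's actual syntax.

```python
# Hypothetical mapping from a unit header's type field to a substream name;
# Zhang's quoted passage says only that the header indicates the payload type.
UNIT_TYPES = {0: "inference_metadata", 1: "feature_data"}

def demux_vcm_units(units):
    """Route each (header_type, payload) unit to its substream using the header."""
    substreams = {"inference_metadata": [], "feature_data": []}
    for header_type, payload in units:
        substreams[UNIT_TYPES[header_type]].append(payload)
    return substreams

streams = demux_vcm_units([(0, b"meta0"), (1, b"feat0"), (1, b"feat1")])
```

The sketch shows the role the rejection assigns to the header: it is the data the demultiplexer consults to separate the inference metadata substream from the feature substream.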
Regarding claim 7, Wang discloses the decoder of claim 5. However, Wang fails to explicitly disclose wherein the inference metadata substream further comprises an inference metadata header and an inference metadata payload.
In his disclosure, Zhang teaches the inference metadata substream further comprising an inference metadata header and an inference metadata payload (a VCM bitstream may comprise a sequence of VCM units. A VCM unit may comprise a VCM unit header and a VCM unit payload. The VCM unit header may comprise a type syntax element that indicates the type of data contained in the VCM unit payload – p.40, 6-17).
It would have been obvious to a person with ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the teachings of Zhang into the teachings of Wang because such incorporation improves task performance (p.43, 25-26).
Regarding claim 8, Wang discloses the decoder of claim 6. However, Wang fails to explicitly disclose wherein the inference metadata header (830) is used by the inference metadata decoder (686) to extract and decode the inference metadata payload.
In his disclosure, Zhang teaches the inference metadata header being used by the inference metadata decoder to extract and decode the inference metadata payload (a VCM bitstream may comprise a sequence of VCM units. A VCM unit may comprise a VCM unit header and a VCM unit payload. The VCM unit header may comprise a type syntax element that indicates the type of data contained in the VCM unit payload – p.40, 6-17).
It would have been obvious to a person with ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the teachings of Zhang into the teachings of Wang because such incorporation improves task performance (p.43, 25-26).
Regarding claim 9, Wang teaches the decoder of claim 5. However, Wang fails to explicitly disclose wherein the feature substream comprises a feature stream header and a feature stream payload, wherein the feature stream header is used by the feature decoder to decode the feature stream payload.
In his disclosure, Kang teaches feature substreams used by feature decoders (the VCM encoder encodes the features for machine vision and the inputted images (or residual images) to generate bitstreams. The VCM encoder multiplexes the bitstreams each generated by encoding the features and video and transmits the multiplexed bitstreams together – [0007]).
It would have been obvious to a person with ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the multiplexer of Kang into the teachings of Wang because the use of a multiplexer to packetize streams for transmission is common practice in the art and yields expected results.
However, the combination fails to explicitly disclose that a substream comprises a stream header and a stream payload, wherein the stream header is used by the decoder to decode the stream payload.
In his disclosure, Zhang teaches a substream comprising a stream header and a stream payload, wherein the stream header is used by the decoder to decode the stream payload (a VCM bitstream may comprise a sequence of VCM units. A VCM unit may comprise a VCM unit header and a VCM unit payload. The VCM unit header may comprise a type syntax element that indicates the type of data contained in the VCM unit payload – p.40, 6-17).
It would have been obvious to a person with ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate the teachings of Zhang into the teachings of Wang because such incorporation improves task performance (p.43, 25-26).
Regarding claim 13, Wang discloses a bitstream for image information encoded using an inference model, comprising:
a stream level header; a feature substream comprising a feature stream header and a feature stream payload (refer to claim 9 rejection); and
an inference metadata substream comprising an inference metadata header and an inference metadata payload (refer to rejection of claim 7).
Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure: ISO/IEC, “Use cases and requirements for Video Coding for Machines”, ISO/IEC JTC 1/SC 29/WG11 MPEG2020/M53429, April 2020, 4 pages.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARIA E VAZQUEZ COLON whose telephone number is (571)270-1103. The examiner can normally be reached M-F 7:30 AM-3:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CHRISTOPHER S KELLEY can be reached at (571)272-7331. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MARIA E VAZQUEZ COLON/Examiner, Art Unit 2482