Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s response has been fully considered and entered.
Response to Arguments
Applicant’s arguments have been considered but are not deemed persuasive; the same grounds of rejection are therefore maintained.
Applicant argues that the channel capacity is not an input to the process (see arguments, pp. 6-8). Examiner respectfully disagrees. The maximum set bit rate, as an input to the encoding method, is the limit that the channel capacity places on this particular communication channel; it therefore is the channel capacity and reads on the claimed limitation. See the instant application, para. [0023]: "During encoding a set of video frames with a given QP, the actual bitrate of the encoded video stream is difficult to predict. While higher QP values generally correspond to higher bitrates, the visual and dynamic complexity of a group of video frames may influence the size of the resulting video. Thus an analytic or formulaic approach to QP selection can result in a condition where the bitrate exceeds the maximum capacity of the channel, which can result in dropped frames and other errors. Such errors are readily visible to a user, where the slight degradation in quality that results from selection of a somewhat higher QP value might not be noticeable. However, being too conservative in QP selection results in underused channel capacity." See also para. [0024]: "QP selection may instead be performed using machine learning, where video features are extracted by feature extractor 202 from the video frames. These features are then processed by prediction head 204 which is trained to generate an encoding parameter to provide bitrates close to the channel capacity without exceeding it. It is specifically contemplated that the prediction head 204 may generate a QP value, but it should be understood that any appropriate encoding parameter may be used in accordance with an encoding standard used by the encoder 104. The machine learning model of the prediction head 204 is trained to recognize the bitrate needs of the input video frames and balance that with the limits imposed by the channel."
Wang, as cited below at col. 3, ln 50-65, discloses that digital video processing consumes a large amount of storage and network capacity; as such, many computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also referred to as coding or encoding) to reduce the bitrate of digital video. Compression decreases the cost of storing and transmitting video by converting the video into a lower bitrate form. Therefore, Wang takes the channel capacity as an input and adjusts the bit rate to remain below that capacity so that the encoded stream fits the channel. Applicant should note that the channel capacity applies to the rate control and the network buffer; it is a physical-layer quantity, much like a spectrum allocation: when all of the channels are added up, the sum is the channel capacity of the network, and when there is only one channel (or only one available band), the bit rate of that channel is the channel capacity.
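For illustration only, the following is a minimal sketch of the mechanism that paras. [0023]-[0024] describe: a feature extractor feeding a prediction head that takes the channel capacity as a conditioning input and outputs an encoding parameter (a QP value). The layer sizes, module names, and the use of PyTorch are assumptions of this sketch, not the applicant's or the cited references' actual implementations.

```python
# Illustrative sketch only; all names and sizes are assumptions.
import torch
import torch.nn as nn

class QPPredictor(nn.Module):
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        # Stand-in for the feature extractor (cf. element 202): per-frame
        # convolutional features pooled to a fixed-size vector.
        self.features = nn.Sequential(
            nn.Conv2d(3, feature_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Stand-in for the prediction head (cf. element 204): maps features
        # plus the channel-capacity condition to one encoding parameter (QP).
        self.head = nn.Sequential(
            nn.Linear(feature_dim + 1, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, frames: torch.Tensor, capacity_bps: torch.Tensor) -> torch.Tensor:
        # frames: [B, 3, H, W]; capacity_bps: [B, 1] channel capacity in bits/s.
        f = self.features(frames).flatten(1)           # [B, feature_dim]
        cond = torch.log10(capacity_bps)               # compress the dynamic range
        return self.head(torch.cat([f, cond], dim=1))  # predicted QP, shape [B, 1]
```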
Applicant further argues that the claims require a bitrate below the channel capacity. Examiner respectfully disagrees. According to the instant application, para. [0024] (reproduced above), the prediction head "is trained to generate an encoding parameter to provide bitrates close to the channel capacity without exceeding it." Therefore, the claimed feature is the same as what the cited prior art teaches: the bitrate does not exceed the channel capacity. That is also the operative control principle, and Applicant's own disclosure treats "not exceeding" the capacity as being "below" it. Therefore, the cited prior art discloses this limitation.
Therefore, Wang discloses the channel capacity, i.e., the network capacity, and that capacity is used to limit the bit rate of the system by means of the rate control devices.
Applicant further argues regarding claim 4; see the instant application, para. [0030]: "The prediction head 204 may be implemented as a deep neural network that includes multiple convolutional layers, each followed by a conditional group normalization (CGN) block. The CGN blocks normalize the output from the previous layer. The prediction head may take the bitrate cap 206 as a tensor of log10(BRmax) with a size [B,1] as a conditioning factor. Each element in the tensor represents the bitrate cap for each video in the batch."
See the argument above. Ram, para. 56, discloses: "FIG. 5A is a block diagram of a neural network 502A used to implement the frame (or sub-frame) machine learning model, in accordance with some embodiments. In some embodiments, the neural network 502A is a deep neural network (DNN). In at least some embodiments, the neural network 502A receives states 505 for a number (e.g., N) features, which can be generated by the processing device 112 or the controller 122, as previously discussed. In embodiments, the states depend on the frame statistics for a frame machine learning model or sub-frame statistics for a sub-frame machine learning model. The video encoding device 102 can further update the states based on at least one of the total bits of encoding, a target bitrate, a current group of pictures (GOP) of the video content, or the like." Applicant should consider further specifying the normalization process, i.e., what is normalized and how, in order to distinguish the claims from the cited prior art.
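For illustration only, the following is a minimal sketch of a conditional group normalization block of the kind para. [0030] describes, where the bitrate cap enters as a conditioning tensor of log10(BRmax) with shape [B, 1]. The module names, group count, and the use of PyTorch are assumptions of this sketch, not the applicant's or RAM's actual implementation.

```python
# Illustrative sketch only; all names and sizes are assumptions.
import torch
import torch.nn as nn

class ConditionalGroupNorm(nn.Module):
    """Group-normalizes features, then scales and shifts them with affine
    parameters predicted from the conditioning tensor (the bitrate cap)."""

    def __init__(self, num_channels: int, num_groups: int = 8, cond_dim: int = 1):
        super().__init__()
        # num_channels must be divisible by num_groups.
        self.norm = nn.GroupNorm(num_groups, num_channels, affine=False)
        # Map the condition to per-channel scale (gamma) and shift (beta).
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: [B, C, H, W] feature maps; cond: [B, 1] = log10(BRmax) per video.
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)      # [B, C, 1, 1]
        beta = beta.unsqueeze(-1).unsqueeze(-1)        # [B, C, 1, 1]
        return self.norm(x) * (1 + gamma) + beta

# Usage: a convolutional layer followed by a CGN block (cf. claim 5).
conv = nn.Conv2d(3, 32, kernel_size=3, padding=1)
cgn = ConditionalGroupNorm(num_channels=32)
frames = torch.randn(4, 3, 64, 64)                    # batch of 4 videos
cond = torch.log10(torch.full((4, 1), 5e6))           # assumed 5 Mb/s cap each
out = cgn(conv(frames), cond)                         # [4, 32, 64, 64]
```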
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-8 and 11-18 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., US 11330263 B1 ("Wang"), in view of Ram et al., US Pub. No. 2024/0244228 ("Ram").
Regarding claim 1, WANG discloses that digital video processing consumes a large amount of storage and network capacity; as such, many computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also referred to as coding or encoding) to reduce the bitrate of digital video, and compression decreases the cost of storing and transmitting video by converting the video into a lower bitrate form (WANG, col. 3, ln 50-65). WANG further discloses maintaining the desired bitrate to transmit the compressed media content to the client device 160 via the network 150 (WANG, col. 4, ln 50 - col. 5, ln 5).
It is noted that WANG is silent about using a machine learning model that accepts the input set of video frames and the current channel capacity as inputs; encoding the input set of video frames using the encoding parameter to generate encoded video that has a bitrate below the current channel capacity; and transmitting the encoded video, as claimed.
However, RAM discloses a computer-implemented method (see RAM, claim 19) for rate control (RAM, claim 20), comprising: determining an encoding parameter value (RAM, claim 20, i.e., a QP value) to use for an input set of video frames (RAM, claim 1, i.e., "a processing device to receive video content and output encoded video of the video content for a client video device; and a controller coupled to the processing device, the controller programmed with machine instructions to: receive, from a video encoder while encoding the video content, frame statistics based on one or more encoded frames of the video content corresponding to a current frame") based on a current channel capacity (RAM, ¶¶ 62 and 64, i.e., average frame size; per WANG as cited above, frame size together with frame rate corresponds to bit rate, which in turn corresponds to channel capacity), using a machine learning model (RAM, claim 19, i.e., machine learning model) that accepts the input set of video frames (as cited above, i.e., video frame input) and the current channel capacity as inputs (as cited above, i.e., RAM, ¶ 62); encoding the input set of video frames using the encoding parameter to generate encoded video that has a bitrate below the current channel capacity (WANG discloses the channel capacity with a set bit rate, and RAM discloses a target bit rate, i.e., the current channel capacity; see RAM, ¶ 17); and transmitting the encoded video (RAM, ¶ 26).
Both WANG and RAM teach systems that apply machine learning to video compression, and those systems are comparable to that of the instant application. Because the two cited references are analogous to the instant application, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to modify the WANG disclosure to adjust the QP according to the target bit rate, as taught by RAM. Such inclusion would have increased the usefulness of the system by flexibly taking video content of any content type and/or associated with a specific quality metric and target bitrate and, in real time, employing a customized machine learning model that generates the QP values driving the rate control for encoding the video content on a per-frame and per-sub-frame basis, thereby avoiding redesign of rate-control algorithms for differing content types, quality metrics, and bitrates. This would have been consistent with the rationale of combining prior art elements according to known methods to yield predictable results to show a prima facie case of obviousness (MPEP 2143(I)(A)) under KSR International Co. v. Teleflex Inc., 127 S. Ct. 1727, 82 USPQ2d 1385, 1395-97 (2007).
Regarding claim 2, WANG/RAM, for the same motivation to combine, further discloses the method of claim 1, further comprising determining the current channel capacity based on channel quality information (RAM, ¶ 17) received from user equipment (RAM, ¶¶ 25, 27).
Regarding claim 3, WANG/RAM, for the same motivation to combine, further discloses the method of claim 1, wherein the machine learning model includes a prediction head model that is trained to generate a parameter value that, when used to encode the input set of video frames, results in the encoded video being at or below (RAM, ¶ 17) the current channel capacity (WANG, col. 12, ln 40-55, i.e., "In some embodiments, as represented by block 648, the method 600 further includes identifying a subset of units in the raw video, and applying the neural network to the subset of units to estimate a set of coded sizes for the subset of units. As such, the neural network can be trained and applied to a subset of pictures, e.g., scene change I-pictures, for coded size estimation. As shown in Table 2 and FIG. 5, the machine learning based coded size estimation reduces estimation error and has less over-estimation or under-estimation relative to the reference method").
[Image: media_image1.png (greyscale, 760 × 1150)]
Regarding claim 4, WANG/RAM, for the same motivation to combine, further discloses the method of claim 3, wherein the prediction head model is a deep neural network model (RAM, ¶ 56) that includes conditional group normalization using the current channel capacity (i.e., conditioned on the target bitrate) as a condition (RAM, ¶ 56, as reproduced above, which discloses updating the neural network states based on at least one of the total bits of encoding, a target bitrate, or a current group of pictures (GOP) of the video content).
Regarding claim 5, WANG/RAM, for the same motivation to combine, further discloses the method of claim 4, wherein the prediction head model (RAM, ¶ 56, as reproduced above) includes a plurality of convolutional layers (WANG, col. 8, ln 50-65, i.e., "FIG. 4 is an exemplary CNN-based model 400 with an embedded sub-MLP network 405 for coded size estimation in accordance with some embodiments. For MLP-based model, as described above with reference to FIGS. 2 and 3, explicit features are identified and provided for model training and inference. The prediction performance depends on available features and feature selection. In contrast, in CNN-based machine learning approaches, convolutional layers 420 are employed to extract features from inputs (e.g., the input pixels or prediction residues to input layer 1 410-1) and to extract deep features from the outputs of previous convolutional layers. In some embodiments, each convolutional layer 420 includes convolution, batch normalization, and max pooling sublayers."), each followed by a respective conditional group normalization (RAM, ¶ 56, as cited above).
[Image: media_image2.png (greyscale, 567 × 776)]
Regarding claim 6, WANG/RAM, for the same motivation to combine, further discloses the method of claim 1, wherein the encoding parameter is a quantization parameter (see RAM, ¶ 23).
Regarding claim 7, WANG/RAM, for the same motivation to combine, further discloses the method of claim 1, further comprising altering the determined encoding parameter value to decrease video quality (RAM, ¶ 26) before encoding the video (RAM, ¶ 2).
Regarding claim 8, WANG/RAM, for the same motivation to combine, further discloses the method of claim 1, wherein determining the encoding parameter value includes extracting features from the input set of video frames and processing the features with the current channel capacity in a prediction head model (RAM, ¶ 39).
Regarding claim 11, WANG/RAM, for the same motivation to combine, further discloses the method of claim 1, wherein determining the encoding parameter value includes maximization (RAM, ¶ 54) of average video quality of a live video feed (RAM, ¶ 17), subject to the current channel capacity available (RAM, ¶ 17) for the set of video frames, and minimization of a probability of packet drop (RAM, ¶ 27) and video artifacts in transmission of the encoded video (RAM, ¶ 18).
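For illustration only, the following sketch restates the claim 11 objective as a constrained search: maximize video quality subject to the encoded bitrate not exceeding the current channel capacity, which in turn minimizes the probability of packet drops. The encode() stub and its rate/quality model are hypothetical stand-ins, not functions from WANG or RAM, and they assume the common codec convention that a higher QP yields a lower bitrate and lower quality.

```python
# Illustrative sketch only; encode() is a hypothetical stand-in.
from dataclasses import dataclass

@dataclass
class Encoded:
    bitrate_bps: float  # bitrate of the encoded video, in bits per second
    psnr: float         # stand-in quality score for the encoded video

def encode(frames, qp: int) -> Encoded:
    # Hypothetical rate/quality model standing in for a real encoder:
    # higher QP -> coarser quantization -> lower bitrate and lower quality.
    return Encoded(bitrate_bps=8e6 / (1 + qp), psnr=50.0 - 0.5 * qp)

def select_qp(frames, capacity_bps: float, qp_range=range(0, 52)) -> int:
    """Return the QP giving the best quality whose bitrate fits the channel."""
    best = None
    for qp in qp_range:
        enc = encode(frames, qp)
        if enc.bitrate_bps <= capacity_bps:         # hard capacity constraint
            if best is None or enc.psnr > best[1]:  # keep the best quality
                best = (qp, enc.psnr)
    # Fall back to the highest QP (lowest bitrate) if nothing fits.
    return best[0] if best is not None else max(qp_range)
```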
Regarding claim 12, WANG/RAM, for the same motivation to combine, discloses a system for rate control (see the rejection of claim 1), comprising: a hardware processor; and a memory that stores a computer program which, when executed by the hardware processor, causes the hardware processor to: determine an encoding parameter value to use for an input set of video frames based on a current channel capacity (see the rejection of claim 1), using a machine learning model that accepts the input set of video frames and the current channel capacity as inputs (see the rejection of claim 1); encode the input set of video frames using the encoding parameter to generate encoded video that has a bitrate below the current channel capacity (see the rejection of claim 1); and transmit the encoded video (see the rejection of claim 1).
Regarding claim 13, WANG/RAM, for the same motivation to combine, further discloses the system of claim 12, wherein the computer program further causes the hardware processor to determine the current channel capacity based on channel quality information received from user equipment (see the rejection of claim 2).
Regarding claim 14, WANG/RAM, for the same motivation to combine, further discloses the system of claim 12, wherein the machine learning model includes a prediction head model that is trained to generate a parameter value that, when used to encode the input set of video frames, results in the encoded video being at or below the current channel capacity (see the rejection of claim 3).
Regarding claim 15, WANG/RAM, for the same motivation to combine, further discloses the system of claim 14, wherein the prediction head model is a deep neural network model that includes conditional group normalization using the current channel capacity as a condition (see the rejection of claim 4).
Regarding claim 16, WANG/RAM, for the same motivation to combine, further discloses the system of claim 15, wherein the prediction head model includes a plurality of convolutional layers, each followed by a respective conditional group normalization (see the rejection of claim 5).
Regarding claim 17, WANG/RAM, for the same motivation to combine, further discloses the system of claim 12, wherein the computer program further causes the hardware processor to alter the determined encoding parameter value to decrease video quality before encoding the video (see the rejection of claim 7).
Regarding claim 18, WANG/RAM, for the same motivation to combine, further discloses the system of claim 12, wherein the computer program further causes the hardware processor to extract features from the input set of video frames and to process the features with the current channel capacity in a prediction head model (see the rejection of claim 8).
Claims 9-10 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Wang et al., US 11330263 B1 ("Wang"), in view of Ram et al., US Pub. No. 2024/0244228 ("Ram"), and further in view of JONES et al., US 2016/0225192 A1 ("JONES").
Regarding claim 9, WANG/RAM, for the same motivation to combine, further discloses the method of claim 1.
It is noted that WANG/RAM is silent about wherein the encoded video is transmitted to a medical professional to aid in medical decision making as claimed.
However, JONES discloses wherein the encoded video is transmitted to a medical professional to aid in medical decision making (JONES, ¶ 51, i.e. medical professional as studio-generated content).
Both JONES and the WANG/RAM combination teach systems with remote video transmission, and those systems are comparable to that of the instant application. Because the cited references are analogous to the instant application, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains to include in the WANG/RAM disclosure transmitting the encoded video to a medical professional, as taught by JONES. Such inclusion would have increased the usefulness of the system by aiding remote medical decision making, and would have been consistent with the rationale of combining prior art elements according to known methods to yield predictable results to show a prima facie case of obviousness (MPEP 2143(I)(A)) under KSR International Co. v. Teleflex Inc., 127 S. Ct. 1727, 82 USPQ2d 1385, 1395-97 (2007).
Regarding claim 10, WANG/RAM/JONES, for the same motivation to combine, further discloses the method of claim 1, further comprising performing a treatment action (JONES, Fig. 14, ¶ 91) responsive to the encoded video, including automatically altering a patient’s treatment (see the above citation; see also JONES, ¶ 116) in response to a patient activity (JONES, ¶ 110) shown in the encoded video (JONES, Fig. 14, ¶ 3, i.e., when the patient changes the orientation of the leg, the surgical personnel would have to change the view and the operation-direction control device accordingly).
Regarding claim 19, WANG/RAM/JONES, for the same motivation to combine, further discloses the system of claim 12, wherein the encoded video is transmitted to a medical professional to aid in medical decision making (see the rejection of claim 9).
Regarding claim 20, WANG/RAM/JONES, for the same motivation to combine, further discloses the system of claim 12, wherein the computer program further causes the hardware processor to perform a treatment action responsive to the encoded video, including automatic alteration of a patient’s treatment in response to a patient activity shown in the encoded video (see the rejection of claim 10).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
US 20210120252 A1 TRANSFORM-BASED IMAGE CODING METHOD AND DEVICE
US 20200396487 A1 TRANSFORM AND LAST SIGNIFICANT COEFFICIENT POSITION SIGNALING FOR LOW-FREQUENCY NON-SEPARABLE TRANSFORM IN VIDEO CODING
US 20200244976 A1 Spatial Varying Transform for Video Coding
US 10638144 B2 Content-based transcoder
US 10609384 B2 Restriction on sub-block size derivation for affine inter prediction
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANK F HUANG whose telephone number is (571)272-0701. The examiner can normally be reached Monday-Friday, 8:30 am - 6:00 pm (Eastern Time), Federal Alternative First Friday Off.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jay Patel, can be reached at (571) 272-2988. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FRANK F HUANG/Primary Examiner, Art Unit 2485