Prosecution Insights
Last updated: April 19, 2026
Application No. 17/784,877

SYSTEMS AND METHODS FOR ENHANCED FEEDBACK FOR CASCADED FEDERATED MACHINE LEARNING

Non-Final OA: §103, §112
Filed: Jun 13, 2022
Examiner: KNIGHT, PAUL M
Art Unit: 2148
Tech Center: 2100 — Computer Architecture & Software
Assignee: Telefonaktiebolaget LM Ericsson (publ)
OA Round: 1 (Non-Final)
Grant Probability: 62% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 3y 1m
Grant Probability with Interview: 79%

Examiner Intelligence

Career Allow Rate: 62% of resolved cases (169 granted / 272 resolved; +7.1% vs TC avg)
Interview Lift: +17.0% for resolved cases with interview (strong)
Typical Timeline: 3y 1m avg prosecution; 24 currently pending
Career History: 296 total applications across all art units

Statute-Specific Performance

§101: 9.5% (-30.5% vs TC avg)
§102: 6.0% (-34.0% vs TC avg)
§103: 45.5% (+5.5% vs TC avg)
§112: 35.2% (-4.8% vs TC avg)
Tech Center averages are estimates. Based on career data from 272 resolved cases.

Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Style

In this action, unitalicized bold is used for claim language, while italicized bold is used for emphasis.

Information Disclosure Statement

All information disclosure statements were submitted prior to the first action and are in compliance with the provisions of 37 C.F.R. § 1.97. Accordingly, they have been considered. It is noted that the ISR from the parent under §371 (PCT/EP2020/086987) has been included, while the ISR from PCT/EP2019/086065 dated 12/06/2020 (to which this application claims priority) has not been filed in an IDS.

Priority Date

Applicant claims foreign priority to PCT/EP2019/086065 (hereafter '065), but this document fails to provide support for the claimed subject matter. Specifically, nothing in this document indicates that applicant was in possession of an invention including the operations of "providing to each client device of the plurality of client devices: the global ML model; and feedback information related to one of a plurality of hidden neural network layers of the neural network comprised in the network ML model for training the local ML models at the client device" together with the other claimed operations. Similar language in claims 14 and 19 is also unsupported by the '065 application. As such, all determinations relating to the state of the art at the time of filing, for claims that substantially include the language quoted above, are based on the filing date of parent application PCT/EP2020/086987, filed 18 December 2020.

Applicant Reply

"The claims may be amended by canceling particular claims, by presenting new claims, or by rewriting particular claims as indicated in 37 CFR 1.121(c). The requirements of 37 CFR 1.111(b) must be complied with by pointing out the specific distinctions believed to render the claims patentable over the references in presenting arguments in support of new claims and amendments. . . . The prompt development of a clear issue requires that the replies of the applicant meet the objections to and rejections of the claims. Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. . . . An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714." MPEP § 714.02.

Generic statements or listings of numerous paragraphs do not "specifically point out the support for" claim amendments. "With respect to newly added or amended claims, applicant should show support in the original disclosure for the new or amended claims. See, e.g., Hyatt v. Dudas, 492 F.3d 1365, 1370, n.4, 83 USPQ2d 1373, 1376, n.4 (Fed. Cir. 2007) (citing MPEP § 2163.04 which provides that a 'simple statement such as 'applicant has not pointed out where the new (or amended) claim is supported, nor does there appear to be a written description of the claim limitation '___' in the application as filed' may be sufficient where the claim is a new or amended claim, the support for the limitation is not apparent, and applicant has not pointed out where the limitation is supported.')" MPEP § 2163(II)(A).

Attempts to Contact Applicant

In the interest of compact prosecution, Examiner attempted to contact Applicant's representative Lynn Borchers on three different occasions. No reply has been received.

Allowable Subject Matter under 35 U.S.C. §§ 102 and 103

Claims 1-12, 14, and 21 are non-obvious in view of the art of record. But see other rejections below.
The following is a list of the closest prior art:

Chen (Learning Efficient Object Detection Models with Knowledge Distillation, 2017) teaches sending outputs of intermediate layers from a first model to a second model as part of training the second model. This is similar to the sending of "feedback information related to one of a plurality of hidden neural network layers of [a] neural network for training" another model. But Chen does not teach implementing this as part of a federated learning system including three separate models. Specifically, Chen does not teach a teacher network located on a server sending information related to the hidden layers to a student model on a client device, where the teacher is used in conjunction with both a local model on an end device and a separate global model updated using federated learning techniques. Examiner is unable to find any articulable reason for one of ordinary skill in the art to modify Chen such that the claimed global model would be used in conjunction with the student and teacher models of Chen.

Sun (US 2020/0364542; filed Aug. 2019, different assignee) teaches "Lines 1 through 8 describe the iterations through one or more hint learning epochs for blocks 401 and 402, where intermediate layer outputs from the teacher models are used to compute hint losses, and perturbed hint loss is used to update the training parameters of the student model using backpropagation." Sun ¶46. But nothing in the reference teaches any other aspects of federated learning, and no reason is apparent that would motivate one of ordinary skill in the art to modify the teachings of Sun to include the federated learning aspects of the claimed invention.

Kim (US 2018/0336465) teaches "Hints are not only provided by the teacher network at a final layer for a final output but also at intermediate layers with the same level of abstraction to guide the student network to a better solution. A hint may include a mean square error (MSE) or a probability (e.g., a softmax probability). The purpose is to train the student network by considering the hint. Without the hint, the student network is trained to minimize an original loss function. But with the hint, it considers a 'denoising' aspect by trying to resemble a teacher layer output." Similar to the above cited references, this teaches sending an output from an intermediate layer of a first model to a second model, and using the output for training the second model. As with the above references, no aspect of federated learning is found in the reference, and there is no clear motivation to modify the reference to include all of the claimed subject matter.

McMahon (US 2019/0340534; filed March 2019, different assignee), corresponding to WO 2018/057302, was cited in the ISR of both the parent under §371 (PCT/EP2020/086987, hereafter '987 application) and the oldest application to which this application claims priority (PCT/EP2019/086065, hereafter '065 application). The '065 application was rejected, but the claims of that application do not describe the combination found non-obvious in this application. In the Written Opinion of the ISR from the '987 application, the Examiner determined that McMahon did not teach the claimed invention under the standards of the EPO. See Written Opinion of application '987, sections 6.9-6.12. For reasons similar to those given in the Written Opinion, the claims are non-obvious under 35 U.S.C. § 103. McMahon teaches operations of a federated system and a way of mitigating bandwidth usage when updating the global and local models. See McMahon ¶¶22-24. But nothing in the reference teaches "providing to each client device of the plurality of client devices . . . feedback information related to one of a plurality of hidden neural network layers of the neural network comprised in the network ML model for training the local ML models at the client device" used in combination with the federated system including separate local and global models, as substantially recited in all claims indicated as non-obvious.

Lim (Federated Learning in Mobile Edge Networks: A Comprehensive Survey, Sept 2019) teaches various aspects of federated learning, but fails to teach information related to a hidden layer for training another network.

Thapa (SplitFed: When Federated Learning Meets Split Learning; Sept. 2020) teaches combining split learning and federated learning. Note that split learning is a technique in which only an intermediate layer is passed between servers and edge devices during training. See Thapa Sec. 2.2. While Thapa combines aspects of federated learning, the technique in the reference does not send the local models back and forth between edge devices and the server/cloud. As such, it fails to teach providing the global model to each client device iteratively. For at least this reason, Thapa does not teach the claimed invention as a whole.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-12, 14, and 19-25 are rejected under 35 U.S.C. 112(b) or 35 U.S.C.
112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for pre-AIA, the applicant) regards as the invention.

Generally: separately listed claim elements are construed as distinct components; all claim terms must be given weight; there is presumed to be a difference in meaning and scope when different words or phrases are used in separate claims; and repeated and consistent descriptions in the specification indicate the proper scope of a claimed term. "[C]laims must 'conform to the invention as set forth in the remainder of the specification and the terms and phrases used in the claims must find clear support or antecedent basis in the description so that the meaning of the terms in the claims may be ascertainable by reference to the description.' 37 C.F.R. § 1.75(d)(1)." Phillips v. AWH Corp., 415 F.3d 1303, 1316 (Fed. Cir. 2005) (as cited in MPEP § 2111). Therefore, use of two different terms in the claims that both rely on the description of a single structure in the Specification may render at least one term indefinite, because there is no way to determine which term should be construed in view of the description of the single structure.

All independent claims substantially recite "providing, to each client device of the plurality of client devices . . . feedback information related to one of a plurality of hidden neural network layers of the neural network comprised in the network ML model for training the local ML models at the client device[.]" There is no objective measure which would allow one of ordinary skill in the art to determine whether the "feedback information" is "related to" hidden layers of a neural network. Specifically, this could reasonably be understood as referring to sending the outputs of the hidden layers for training of the local ML models. Alternatively, this could refer to any data input or output from the network ML model, because all input or output data ultimately operates in conjunction with the hidden layers, and can be reasonably interpreted as "information related to" the hidden layers of the model. Since there is no objective measure of "related to" in the context of this claim, the scope of the claim is subjective. See MPEP § 2173.05(b).

Claims 1 and 14 substantially recite "providing, to each client device of the plurality of client devices: the global ML model; and feedback information related to one of a plurality of hidden neural network layers of the neural network comprised in the network ML model for training the local ML models at the client device[.]" It is not clear whether or not the global ML model or the feedback information must be transmitted to the client device. The term "providing" could be read as simply storing the model and feedback information in an unsecured format on the server, rather than actually sending the information. Alternatively, looking at the invention as a whole, one could reasonably determine that "providing" means transmitting, because this operation is required for the invention to function properly. Since there are two inconsistent but reasonable ways of interpreting the claim language, the claims are indefinite.
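The narrower of the two readings above, sending a teacher's hidden-layer output to guide the training of another model, is the "hint learning" technique described in the cited Chen, Sun, and Kim references. The following is a minimal numerical sketch of that hint loss; all array names and dimensions are hypothetical illustrations, not drawn from any cited reference.

```python
import numpy as np

rng = np.random.default_rng(0)

def hint_loss(teacher_hidden, student_guided, adapter):
    """Mean-squared 'hint' loss between a teacher hidden-layer output Z and
    the student's guided-layer output V after an adaptation layer matches
    their widths (hint learning in the style of Chen / Romero et al.)."""
    adapted = student_guided @ adapter  # adaptation layer: match layer widths
    return float(np.mean((teacher_hidden - adapted) ** 2))

# Hypothetical activations: teacher hidden layer is wider than the student's.
teacher_z = rng.normal(size=(8, 64))   # teacher hint-layer activations
student_v = rng.normal(size=(8, 32))   # student guided-layer activations
adapter_w = rng.normal(size=(32, 64))  # adaptation weights (learned in practice)

loss = hint_loss(teacher_z, student_v, adapter_w)
assert loss >= 0.0  # squared-error loss is nonnegative
```

In the technique as the references describe it, the adaptation layer is itself trained so that the student's guided-layer width can differ from the teacher's hint-layer width; here it is a fixed random matrix purely for illustration.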
Claims 1 and 14 recite "the estimated values of each of the one or more parameters output by the local ML models for the training epoch[.]" The claims include both a "first" and a "second" "one or more parameters." The language in bold omits "first" and "second." It is not clear whether the language in bold refers to the first, the second, or some other "one or more parameters."

Claim 19 recites "providing, to the server: the local ML model for the training epoch; and an estimated value of each of the one or more first parameters output by the local ML model at the client device for the training epoch." Similar to the rejections of claims 1 and 14 directly above, it is not clear whether "providing" requires some transmission of a model and a value, or merely reads on storing in an unsecured format.

Claim 19 recites "[a] computer-implemented method of operation of a client device for cascaded federated machine learning, the method comprising" various operations, none of which can fairly be characterized as "federated learning." Without any claimed operations directed to a method of federated learning, it is not clear how the language "operation of a client device for cascaded federated learning" should limit the claim's scope. The extent to which operations from the field of federated learning are required by this claim language is unclear. Note that dependent claim 21 recites operations of a federated learning system, so this rejection does not apply to claim 21.

Claim 19 recites "training a local machine learning, ML, model based on: . . . wherein local ML model is for estimating one or more first parameters at the client device, and the network ML model is for estimating one or more second parameters at the server[.]" The transition "based on: . . . wherein" is unclear because it implies that the training is based on a wherein clause. This could be read as operations within the wherein clause somehow adding to the training, but there are no operations within the wherein clause. Further, the wherein clause is ambiguous because it is not clear how, or whether, the language "is for estimating . . ." the first and second parameters would modify the local ML model and the network ML model. This entire clause could be read as an intended use and be given no weight. If that is the intent, the language may be deleted. If the language is to be given weight, it must be clear how the ML models should be further limited by the clause. Since there is no way to determine how, or whether, the language in this clause limits the scope of claim 19, the language is indefinite. This language is also indefinite because it is not clear whether "at the client device" and "at the server" refer to the location of the models, or if they relate to the parameters. For example, the language could be read as requiring estimating parameters of the client device, or could be read to require locating the local ML model at the client device.

Claim 19 recites "A computer-implemented method of operation of a client device . . . the method comprising: training a local machine learning, ML, model based on: . . . local data available at the client device[.]" The preamble indicates the method is for operation of the client device. Consistent with the preamble, the claim recites training the "local" ML model based on "local data available at the client device[.]" It is not clear whether the use of "local" requires the local ML model to be located on the client device, or if this is merely a name for the model. In other words, it is not clear whether the training of the local machine learning model must take place on a client device, or if "local" merely names the model based on its ultimate destination, with the recited training potentially occurring on the server.
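For context on the local-versus-server distinction at issue above, the federated learning framework the claims presuppose has each client train a local model on data it holds, with the server aggregating the returned parameters into a global model (the pattern of the cited McMahon and Milletari references). Below is a minimal federated-averaging sketch; the least-squares local objective, client counts, and all variable names are illustrative assumptions, not taken from the application or the cited art.

```python
import numpy as np

def local_update(global_w, local_data, lr=0.1, epochs=1):
    """Hypothetical client step: gradient descent on a least-squares
    objective using only data available at the client device."""
    w = global_w.copy()
    X, y = local_data
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg(global_w, client_datasets):
    """Server step: each client trains locally from the current global
    model; the server averages the returned parameters, weighted by
    client dataset size (federated averaging)."""
    local_models = [local_update(global_w, d) for d in client_datasets]
    sizes = np.array([len(d[1]) for d in client_datasets], dtype=float)
    weights = sizes / sizes.sum()
    return sum(a * w for a, w in zip(weights, local_models))

# Two synthetic clients whose local data share one underlying linear model.
rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
clients = [(X, X @ true_w)
           for X in (rng.normal(size=(20, 2)), rng.normal(size=(30, 2)))]

w = np.zeros(2)
for _round in range(200):  # iterative rounds: server -> clients -> server
    w = fedavg(w, clients)
assert np.allclose(w, true_w, atol=1e-2)  # global model converges
```

Each round mirrors the cycle the claims describe: the server provides the global model to each client, each client trains on local data only, and the server aggregates the returned parameter estimates.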
All dependent claims are rejected as containing the limitations of the claims from which they depend.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 19-20 and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Chen (Learning Efficient Object Detection Models with Knowledge Distillation; 2017), Jung (US 2021/0056412, filed Feb 2020 based on a provisional filed August 2019, different assignee), and Milletari (US 11,804,050, filed Oct 2019, different assignee).

19. (Currently Amended) A computer-implemented method of operation of a client device for cascaded federated machine learning, the method comprising: (This is an intended use in the preamble. As such, it is not given patentable weight. Note that the subsequently claimed operations do not constitute federated learning.)
for a training epoch: ("Our overall learning objective can be written as follows: [equation image] where N is the batch-size for RCN and M for RPN." Chen P. 4. Note that "batch size" implies multiple iterations (epochs) of a given batch are summed to ultimately train the model based on this objective function at a given iteration during backpropagation.)

training a local machine learning, ML, model ("On the other hand, seminal works on knowledge distillation show that a shallow or compressed model trained to mimic the behavior of a deeper or more complex model can recover some or all of the accuracy drop [3, 20, 34]." Chen PP. 1-2. "Conventional use of knowledge distillation has been proposed for training classification networks, where predictions of a teacher network are used to guide the training of a student model." Chen P. 4.)

based on: local data available at the client device; ("The student s is trained to optimize the following loss function: . . . where Lhard is the hard loss using ground truth labels[.]" Chen P. 4. Note that the "ground truth labels" refer to labels for training data. One of ordinary skill in the art would understand this as teaching the use of both labels and associated data to train the model. The previously cited art does not teach that the local (student) machine learning model is trained based on local data available at the client device. Jung teaches "The systems and methods in the present disclosure are applied to the refinement of a neural network model at a transfer learning process after the edge device 150 is deployed into a locally constrained physical environment. This approach assumes that the neural network learns from the input data that are captured by an edge device deployed in a constrained physical space with a known fixed number of classes. Accordingly, the edge learning methods can be deployed into numerous types of consumer electronics and appliance that are designed to utilize artificial intelligence and machine learning techniques for their services and functions." Jung ¶32. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Jung because this allows further updating of the model using data in the environment of the client device, which may result in a more accurate model, especially within that environment.)

and feedback information received from a server, the feedback information related to one of a plurality of hidden neural network layers of a neural network comprised in a network ML model trained at the server; (Chen teaches "Lsoft is the soft loss using teacher's prediction and u is the parameter to balance the hard and soft losses. It is known that a deep teacher can better fit to the training data and perform better in test scenarios. The soft labels contain information about the relationship between different classes as discovered by teacher. By learning from soft labels, the student network inherits such hidden information." Chen P. 4. Chen also teaches "Distillation transfers knowledge using only the final output. In [34], Romero et al. demonstrate that using the intermediate representation of the teacher as hint can help the training process and improve the final performance of the student. They use the L2 distance between feature vectors V and Z: [equation 5] where Z represent the intermediate layer we selected as hint in the teacher network and V represent the output of the guided layer in the student network. . . . While applying hint learning, it is required that the number of neurons (channels, width and height) should be the same between corresponding layers in the teacher and student. In order to match the number of channels in the hint and guided layers, we add an adaptation after the guided layer whose output size is the same as the hint layer." Chen P. 5. See also Chen Fig. 1, showing outputs from various intermediate layers of the teacher being sent to corresponding layers of the student for comparison (i.e., for calculating the loss function). This teaches feedback information being sent from a teacher network to a student network, but fails to teach a configuration where the sending (teacher) network is located on a server while the receiving (student) network is located on a client device. As shown above, Chen teaches sending information from hidden layers of a better trained model (teacher) to a lesser trained model (student). The reference does not teach that the better trained (teacher) model is located on a server while the lesser trained (student) model is located on a client device. Milletari teaches "In at least one embodiment, training nodes 102 and training aggregator 104 may be used to collaboratively train Machine Learning Model (MLM) 108." Milletari col. 3 ll. 28-30. "By training machine learning model(s) 106A, 106B, and 106C (also referred to as machine learning models 106), training nodes 102 may learn values of one or more parameters of machine learning models 106, which may in turn be used by training aggregator 104 to determine a value(s) of one or more corresponding parameters of machine learning model(s) 108." Milletari col. 3 ll. 44-50. "In at least one embodiment, aggregation server(s) 504 include one or more portions of training aggregator 104." Milletari col. 19 ll. 20-21. "FIG. 5 illustrates a network environment 500 for collaborative training, in at least one embodiment. In at least one embodiment, network environment 500 may include one or more client devices 502A and 502B through 502N (also referred to as 'client devices 502') and aggregation server(s) 504. In at least one embodiment, each client device 502A may include one or more training nodes 102." Milletari col. 19 ll. 10-16. "Parameter reviewer(s) 1102 and/or review aggregator(s) 1110 may be used to facilitate collaborative learning, such as federated learning. In at least one embodiment, parameter reviewer(s) 1102 and/or review aggregator(s) 1110 may be used to facilitate transfer learning, such as between training nodes 102 of FIG. 1A (e.g., of parameters of machine learning models 106 and/or 108)." Milletari col. 21 ll. 62-67. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Milletari with respect to this limitation because this allows the improvements of Chen (i.e., improving a student network using knowledge distillation) to be applied to the models located on the client devices, while minimizing transferring of models to and from client devices.)

wherein local ML model is for estimating one or more first parameters at the client device, and the network ML model is for estimating one or more second parameters at the server; (This is written as an intended use and a non-limiting wherein clause. Intended use language is explained in MPEP §§ 2103 and 2111.02. "Claim scope is not limited by claim language that suggests or makes optional but does not require steps to be performed, or by claim language that does not limit a claim to a particular structure." MPEP § 2111.04.)

and providing, to the server: the local ML model for the training epoch; and an estimated value of each of the one or more first parameters output by the local ML model at the client device for the training epoch. (The claimed "providing to the server" is not interpreted to require transmission of the local model or the local model outputs. In the interest of compact prosecution, a reference is cited for the teaching of transmitting a local ML model and an estimated value of parameters output by the local ML model.
Milletari teaches “In at least one embodiment, training node 102A may use model trainer 110A and model manager 112A to train machine learning model(s) 106A, training node 102B may use model trainer 110B and model manager 112B to train machine learning model(s) 106B, and training node 102N may use model trainer 110N and model manager 112N to train machine learning model(s) 106N. By training machine learning model(s) 106A, 106B, and 106C (also referred to as machine learning models 106), training nodes 102 may learn values of one or more parameters of machine learning models 106, which may in turn be used by training aggregator 104 to determine a value(s) of one or more corresponding parameters of machine learning model(s) 108.” Milletari col. 3 ll. 38-50. “In at least one embodiment, machine learning model(s) 108 may include a federated machine learning model collaboratively trained using training nodes 102 and machine learning models 106. In at least one embodiment, machine learning model(s) 108 may include a central model and each machine learning model 106 may include a local model used to train a central model of machine learning model(s) 108. . . . In at least one embodiment, interface manager 120 of training aggregator 104 may receive information from and/or provide information to training nodes 102, such as to facilitate collaborative training of machine learning model(s) 108. An example of such information includes values of parameters of machine learning models 106 (e.g., values of all parameters or a subset of parameters), as indicated in FIG. 1A.” Milletari col. 4 ll. 18-30. See also Milletari Fig. 1A. 20. (Currently Amended) The method of claim 19 further comprising receiving, from the server, the feedback information related to the one of the plurality of hidden neural network layers of the neural network comprised in the network ML model. (This is obvious over the combination of Chen and Milletari. 
Chen teaches sending information related to hidden layers from a more accurate (teacher) model to a less accurate (student) model. “Lsoft is the soft loss using teacher’s prediction and u is the parameter to balance the hard and soft losses. It is known that a deep teacher can better fit to the training data and perform better in test scenarios. The soft labels contain information about the relationship between different classes as discovered by teacher. By learning from soft labels, the student network inherits such hidden information.” Chen P. 4. “Distillation transfers knowledge using only the final output. In [34], Romero et al. demonstrate that using the intermediate representation of the teacher as hint can help the training process and improve the final performance of the student.” Chen P. 5. See also Chen Fig. 1 showing outputs from various intermediate layers of the teacher being sent to corresponding layers of the student for comparison (i.e. for calculating the loss function.) Milletari teaches “In at least one embodiment, training nodes 102 and training aggregator 104 may be used to collaboratively train Machine Learning Model (MLM) 108.” Milletari col. 3 ll. 28-30. “By training machine learning model(s) 106A, 106B, and 106C (also referred to as machine learning models 106), training nodes 102 may learn values of one or more parameters of machine learning models 106, which may in turn be used by training aggregator 104 to determine a value(s) of one or more corresponding parameters of machine learning model(s) 108.” Milletari col. 3 ll. 44-50. “In at least one embodiment, aggregation server(s) 504 include one or more portions of training aggregator 104.” Milletari col. 19 ll. 20-21. “FIG. 5 illustrates a network environment 500 for collaborative training, in at least one embodiment. 
In at least one embodiment, network environment 500 may include one or more client devices 502A and 502B through 502N (also referred to as “client devices 502”) and aggregation server(s) 504. In at least one embodiment, each client device 502A may include one or more training nodes 102.” Milletari col. 19 ll. 10-16. “Parameter reviewer(s) 1102 and/or review aggregator(s) 1110 may be used to facilitate collaborative learning, such as federated learning. In at least one embodiment, parameter reviewer(s) 1102 and/or review aggregator(s) 1110 may be used to facilitate transfer learning, such between training nodes 102 of FIG. 1A (e.g., of parameters of machine learning models 106 and/or 108).” Milletari col. 21 ll. 62-67. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Milletari with respect to this limitation because this allows the improvements of Chen (i.e. improving a student network using knowledge distillation) to be applied to the models located on the client devices, while minimizing transferring of models to and from client devices.) 22. (Currently Amended) The method of claim 19 further comprising repeating the method for one or more additional training epochs. (“Our overall learning objective can be written as follows: PNG media_image1.png 200 400 media_image1.png Greyscale where N is the batch-size for RCN and M for RPN.” Chen P. 4. Note that “batch size” implies multiple iterations of a given batch are summed to ultimately train the model based on this objective function at a given iteration during backpropagation.) 23. (Currently Amended) The method of claim 19 wherein the one or more first parameters are the same as the one or more second parameters. (See Chen Fig. 1 showing the classification and regression soft labels of the teacher network being compared the corresponding classification and regression labels of the student network. 
Here, the comparison implies that the parameters are of “the same” type (i.e. images).) 24. (Currently Amended) The method of claim 19 wherein the one or more first parameters are different than the one or more second parameters. (See Chen Fig. 1 showing the classification and regression soft labels of the teacher network being compared to the corresponding classification and regression labels of the student network. Here, the comparison implies that the values of the parameters are different, which results in the error that is backpropagated.) Claims 19-20 and 22-24 are rejected under 35 U.S.C. 103 as being unpatentable over Chen (Learning Efficient Object Detection Models with Knowledge Distillation; 2017), Jung (US 2021/0056412, filed Feb 2020 based on provisional filed August 2019, different assignee), Milletari (US 11,804,050, filed Oct 2019, different assignee), and Qiu (Throughput Maximization for Polar Coded IR-HARQ Using Deep Reinforcement Learning; Oct 2020). 25. (Currently Amended) The method of claim 19 wherein the plurality of client devices are User Equipments, UEs, in a cellular communications system, and the one or more first parameters comprise Hybrid Automatic Repeat Request, HARQ, throughput of the UEs. (The previously cited art does not discuss HARQ. Qiu teaches “To ensure the reliability of data transmission, hybrid automatic repeat request (HARQ) techniques are widely used to improve the data throughput of wireless communication systems. This paper develops a polar coded incremental redundancy HARQ (IR-HARQ) scheme based on deep reinforcement learning (DRL) to combat the unexpected channel fluctuations in practice.” Qiu P. 1 (abstract). “Specifically, the transmitter is taken as the agent, and the channel together with the receiver are jointly regarded as the environment. 
At each transmission step t, the transmitter observes a state st (i.e., the current channel signal-to-noise ratio (SNR) and the coded bits length), takes an action at (the number of IR bits), and then receives an immediate reward rt from the environment and transits to the next state st+1.” Qiu P. 1. “Simulation results verified the effectiveness of the proposed DDPG-based algorithm and showed that significant throughput gain can be achieved over conventional schemes” Qiu PP. 1-2. “As shown in Fig. 1, the basic idea of IR-HARQ is as follows: The transmitter first sends a codeword with an aggressive configuration (high code rate) into the channel for the initial transmission; If the codeword is not correctly received, additional parity bits are incrementally provided by the transmitter and sent to the receiver[.]” Qiu P. 2. “In this work, we aim to optimize the number of IR bits to maximize the throughput of the abovementioned adaptive IR-HARQ scheme.” Qiu P. 3. In the interest of compact prosecution, note that one of ordinary skill would understand Qiu as teaching implementing the techniques in the reference on any transmitter within the system (e.g. on a cell phone or on the tower) because the reference explicitly names the transmitter as the agent in a DRL training scheme. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Qiu because applying the machine learning scheme of the reference can improve throughput.) Conclusion Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL M KNIGHT whose telephone number is (571) 272-8646. The examiner can normally be reached Monday - Friday 9-5 ET. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. 
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold, can be reached on (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /PAUL M KNIGHT/ Examiner, Art Unit 2148
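For readers less familiar with the knowledge-distillation mechanics the rejection quotes from Chen, the balancing of a hard loss (against ground-truth labels) and a soft loss (against the teacher's softened prediction) can be sketched as below. This is a minimal illustration, not Chen's exact detection objective: the temperature T, the balance parameter u, and all function names are assumptions made for the sketch.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; T > 1 softens the distribution,
    # exposing the inter-class relationships carried by soft labels.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()            # stabilize against overflow
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum_i p_i * log(q_i)
    return float(-(np.asarray(p, dtype=float) * np.log(np.asarray(q) + eps)).sum())

def distillation_loss(student_logits, teacher_logits, hard_label, u=0.5, T=2.0):
    """Weighted sum of the hard loss and the soft (teacher) loss,
    with u balancing the two terms as in the quoted passage."""
    n_classes = len(student_logits)
    one_hot = np.eye(n_classes)[hard_label]
    hard = cross_entropy(one_hot, softmax(student_logits))
    soft = cross_entropy(softmax(teacher_logits, T), softmax(student_logits, T))
    return u * hard + (1.0 - u) * soft
```

With u = 1 the student ignores the teacher entirely; intermediate values let the student inherit the "hidden information" in the teacher's soft labels that the rejection relies on.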
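The aggregator role the rejection attributes to Milletari, combining parameters learned at client-side training nodes into a global model, is commonly realized by size-weighted federated averaging. The sketch below is a generic FedAvg-style aggregation under that assumption, not Milletari's claimed implementation; the function name and weighting scheme are illustrative choices.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Aggregate per-client parameter vectors into global parameters,
    weighting each client by the size of its local dataset
    (FedAvg-style aggregation; an assumed, generic scheme)."""
    total = float(sum(client_sizes))
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    weights = np.asarray(client_sizes, dtype=float)[:, None] / total
    return (stacked * weights).sum(axis=0)
```

For example, two equally sized clients contribute equally, while a client holding three times the data pulls the global parameters three times as hard toward its local solution.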
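Qiu's agent/environment framing of IR-HARQ (state = channel SNR plus coded-bit length, action = number of IR bits, reward = achieved throughput) can be illustrated with a toy episode loop. The decode rule below is a stand-in capacity proxy, not Qiu's polar-code simulator or DDPG agent; every name and threshold here is an assumption made for illustration.

```python
def harq_episode(policy, channel_snr, k_info_bits=128, max_rounds=4):
    """One IR-HARQ episode as an RL interaction: the transmitter (agent)
    observes (SNR, total coded bits sent), picks how many
    incremental-redundancy bits to send (action), and is rewarded with
    the achieved throughput (rate) on successful decoding."""
    sent = k_info_bits  # initial aggressive (high-rate) transmission
    for t in range(max_rounds):
        # Toy decode rule: success once the effective code rate drops
        # below a capacity proxy derived from the SNR (illustrative only).
        if k_info_bits / sent <= min(1.0, channel_snr / 4.0):
            return k_info_bits / sent   # reward: throughput on success
        ir_bits = policy((channel_snr, sent))  # action: IR bits for next round
        sent += max(1, ir_bits)
    return 0.0  # decoding never succeeded; zero reward
```

A DRL agent in Qiu's scheme would learn the `policy` mapping so as to maximize this reward; a fixed policy such as `lambda state: 64` serves as a baseline for comparison.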

Prosecution Timeline

Jun 13, 2022
Application Filed
Feb 27, 2026
Non-Final Rejection — §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12530592
NON-LINEAR LATENT FILTER TECHNIQUES FOR IMAGE EDITING
2y 5m to grant Granted Jan 20, 2026
Patent 12530612
METHODS FOR ALLOCATING LOGICAL QUBITS OF A QUANTUM ALGORITHM IN A QUANTUM PROCESSOR
2y 5m to grant Granted Jan 20, 2026
Patent 12499348
READ THRESHOLD PREDICTION IN MEMORY DEVICES USING DEEP NEURAL NETWORKS
2y 5m to grant Granted Dec 16, 2025
Patent 12462201
DYNAMICALLY OPTIMIZING DECISION TREE INFERENCES
2y 5m to grant Granted Nov 04, 2025
Patent 12456057
METHODS FOR BUILDING A DEEP LATENT FEATURE EXTRACTOR FOR INDUSTRIAL SENSOR DATA
2y 5m to grant Granted Oct 28, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
62%
Grant Probability
79%
With Interview (+17.0%)
3y 1m
Median Time to Grant
Low
PTA Risk
Based on 272 resolved cases by this examiner. Grant probability derived from career allow rate.
