Prosecution Insights
Last updated: April 19, 2026
Application No. 18/893,780

DEVICE AND METHOD FOR JOINT LOCAL AND REMOTE INFERENCE

Non-Final OA — §102, §112
Filed: Sep 23, 2024
Examiner: RASHID, ISHRAT
Art Unit: 2459
Tech Center: 2400 — Computer Networks
Assignee: Huawei Technologies Co., Ltd.
OA Round: 1 (Non-Final)
Grant Probability: 58% (Moderate); 78% with interview
Expected OA Rounds: 1-2
Time to Grant: 3y 2m

Examiner Intelligence

Career Allow Rate: 58% (115 granted / 198 resolved) — at TC average
Interview Lift: +19.9% in resolved cases with interview (strong lift; 78% with vs 58% without)
Typical Timeline: 3y 2m average prosecution; 22 currently pending
Career History: 220 total applications across all art units

Statute-Specific Performance

§101: 7.0% (-33.0% vs TC avg)
§103: 53.5% (+13.5% vs TC avg)
§102: 15.5% (-24.5% vs TC avg)
§112: 17.8% (-22.2% vs TC avg)

Tech Center averages are estimates; based on career data from 198 resolved cases.

Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This communication is in response to Application 18/893,780 filed on 7 March, 2025. This application is a continuation of International Application No. PCT/EP2022/057740, filed on 24 March, 2022. Claims 1-17 are pending.

Claim Rejections - 35 USC § 112

Claims 5-6 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

The claims are generally narrative and indefinite, failing to conform with current U.S. practice. They appear to be a literal translation into English from a foreign document and are replete with grammatical and idiomatic errors. Claim 5 recites "wherein the first model comprises multiple parts and one intermediate output of the one or more intermediate outputs comprises an output of a first part of the multiple parts of the first model". Examiner respectfully requests clarification, with pertinent support from the original disclosure as filed, for this limitation so that the limitation can be understood and prior art applied, if applicable. Dependent claim 6 does not cure the deficiency of parent claim 5 and therefore inherits the rejection.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1-4 and 7-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by "Distributed Deep Neural Networks over the Cloud, the Edge and End Devices" by Teerapittayanon et al., dated 2017, hereinafter NPL.

Regarding claim 1, NPL teaches a device (NPL section III.A: "by performing a portion of the DNN inference computation on the device rather than sending the raw input to the cloud") for processing a data sample to form a predicted output (NPL section III.A: "The configurations presented show how DDNN can scale the inference computation across different physical devices"; inference meaning that a prediction is generated based on the data input), the device comprising a processor, wherein the device is configured to: receive the data sample (NPL section I: "that capture a large quantity of input data in a streaming fashion", implying that data must be received for inference); input the data sample and/or one or more of any intermediate outputs derived from the data sample to a learnable control function (NPL section III.D: "We use a normalized entropy threshold as the confidence criteria", which uses intermediate outputs; "C is the set of all possible labels and x is a probability vector"); and in dependence on an output of the learnable control function (NPL section III.D: "This normalized entropy n has values between 0 and 1 which allows easier interpretation and searching of its corresponding threshold T") perform one of the following: (i) process the data sample to form the predicted output using a first model stored locally at the device (NPL section III.A: "Using an exit point after device inference, we may classify those samples which the local network is confident about, without sending any information to the cloud", wherein the model used is stored in the device); and/or (ii) send the data sample and/or the one or more of any intermediate outputs derived from the data sample to a remote location for input to a second model stored at the remote location to form the predicted output (NPL section III.A: "For more difficult cases, the intermediate DNN output (up to the local exit) is sent to the cloud, where further inference is performed using additional NN layers and a final classification decision is made").

Regarding claim 2, the device of claim 1, wherein processing the data sample to form the predicted output using the first model stored locally at the device is in response to the output of the learnable control function exceeding a threshold (NPL section III.D: "At each exit point, n is computed and compared against T in order to determine if the sample should exit at that point", meaning that the exit point prediction is used; "At a given exit point, if the predictor is not confident in the result (i.e., n > T), the system falls back to a higher exit point in the hierarchy until the last exit is reached which always performs classification").

Regarding claim 3, the device of claim 1, wherein sending the data sample and/or the one or more of the any intermediate outputs derived from the data sample to the remote location for the input to the second model stored at the remote location to form the predicted output is in response to the output of the learnable control function not exceeding a threshold (NPL sections D, F).
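The normalized-entropy confidence check that the rejection repeatedly cites (NPL section III.D) can be sketched in a few lines. This is an illustrative reading of the quoted passages, not code from the NPL; the function names are my own:

```python
import math

def normalized_entropy(probs):
    """Entropy of a probability vector, scaled by log|C| to lie in [0, 1]:
    0 means a fully confident prediction, 1 means a uniform one."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def exit_locally(probs, threshold):
    """Per NPL III.D: exit at this point when the predictor is confident,
    i.e. when n <= T; otherwise fall back to a higher exit in the hierarchy."""
    return normalized_entropy(probs) <= threshold
```

For example, `exit_locally([0.98, 0.01, 0.01], 0.5)` is True (n ≈ 0.10), while a near-uniform probability vector fails the check and would be forwarded to the remote model.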
Regarding claim 4, the device of claim 1, wherein the one or more of the any intermediate outputs derived from the data sample are one or more intermediate outputs of the first model stored at the device (NPL section III.D: "We use a normalized entropy threshold as the confidence criteria", which reflects the uncertainty of the early-exit prediction, and intermediate outputs are sent as an input to a remote location if necessary; section III.A: "For more difficult cases, the intermediate DNN output (up to the local exit) is sent to the cloud, where further inference is performed using additional NN layers and a final classification decision is made").

Regarding claim 7, the device of claim 1, wherein the first model has lower computational requirements and/or a lower storage size requirement than the second model (NPL section I: "end devices such as embedded sensor nodes often have limited memory and battery budgets", which "makes it an issue to fit models on the devices that meet the required accuracy and energy constraints", wherein it can be reasonably interpreted that the first model stored in the edge device has lower computational requirements and/or a lower storage size requirement than the second model).

Regarding claim 8, the device of claim 1, wherein the first model comprises fewer convolutional layers than the second model (NPL section I: "An example of one such distributed approach is to combine a small NN model (less number of parameters) on end devices and a larger NN model (more number of parameters) in the cloud", which can be interpreted as either fewer layers or shallower layers; Figures 3 and 4).

Regarding claim 9, the device of claim 1, wherein the learnable control function is configured to form the output based on features extracted from the data sample (NPL section I: "a system could train a single end-to-end model, such as a DNN, and partition it between end devices and the cloud").
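The fallback behavior the examiner quotes for claims 2-4 (exit where n ≤ T, otherwise escalate until the final exit, which always classifies) amounts to a simple dispatch loop. A minimal sketch, where the `exit_points` callables are hypothetical stand-ins for the per-tier models and each returns a probability vector plus the intermediate features to forward:

```python
import math

def normalized_entropy(probs):
    """Entropy scaled to [0, 1]; lower means more confident."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def hierarchical_inference(sample, exit_points, threshold):
    """Try each exit (device -> edge -> cloud) in order; per NPL III.D,
    a sample exits where n <= T, and the last exit always classifies."""
    features = sample
    for predict in exit_points[:-1]:
        probs, features = predict(features)
        if normalized_entropy(probs) <= threshold:
            return probs  # confident: answer locally, send nothing upstream
    probs, _ = exit_points[-1](features)  # final exit always classifies
    return probs
```

Note that only the intermediate features (not the raw sample) travel upstream on fallback, which is the point the rejection draws from NPL section III.A.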
Regarding claim 10, the device of claim 1, wherein the learnable control function is configured to be optimized in dependence on a series of data samples and their respective true outputs associated with the series of data samples (NPL section I: "A joint training method that minimizes communication and resource usage for devices and maximizes usefulness of extracted features which are utilized in the cloud, while allowing low-latency classification via early exit for a high percentage of input samples").

Regarding claim 11, the device of claim 10, wherein the first model and the second model are learnable models, each of the first and second models being configured to be optimized in dependence on the series of data samples and their respective true outputs associated with the series of data samples (NPL section III.A: "DDNN relies on a jointly trained DNN framework at all parts in the neural network, for both training and inference"; section III.C: "Let y be a one-hot ground-truth label vector, x be an input sample, and C be the set of all possible labels. For each exit, the softmax cross entropy objective function can be written as (...)").

Regarding claim 12, the device of claim 1, wherein the remote location is a cloud server (NPL section III.A: "For more difficult cases, the intermediate DNN output (up to the local exit) is sent to the cloud").

Regarding claim 13, the device of claim 1, wherein the learnable control function is a neural network comprising one or more convolutional layers (NPL section I: "An example of one such distributed approach is to combine a small NN model (less number of parameters) on end devices and a larger NN model (more number of parameters) in the cloud", which can be interpreted as either fewer layers or shallower layers; Figures 3 and 4).
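The joint training objective cited against claims 10-11 (NPL section III.C) is a softmax cross-entropy term per exit point, combined across exits so that device and cloud models are optimized on the same labeled samples. A minimal sketch of that reading; the equal default exit weights are my assumption, not something the quoted passage fixes:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def joint_loss(per_exit_logits, true_index, weights=None):
    """Weighted sum of softmax cross-entropy terms, one per exit point,
    so device, edge, and cloud exits train jointly on the same ground truth."""
    weights = weights or [1.0] * len(per_exit_logits)
    loss = 0.0
    for logits, w in zip(per_exit_logits, weights):
        loss += w * -math.log(softmax(logits)[true_index])
    return loss
```

With uniform logits at a single 3-class exit the loss is log 3; a confident correct exit contributes close to zero, so early exits are rewarded for being decisive.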
Regarding claim 14, the device of claim 1, wherein the input data sample is an image or a time series of data (NPL section IV.B: "This dataset consists of images acquired at the same time from six cameras placed at different locations facing the same general area").

Regarding claim 15, the device of claim 1, wherein the device is a network node or an edge device in a communications network (NPL section II.A: "must be processed locally at the devices or at the edge, for otherwise the total amount of sensor data for a centralized cloud would overwhelm the communication network bandwidth").

Regarding claim 16, this claim contains limitations found within those of claim 1, and the same rationale of rejection applies, where applicable.

Regarding claim 17, this claim contains limitations found within those of claim 1, and the same rationale of rejection applies, where applicable.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: McAfoose et al., US 2023/0031052.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ISHRAT RASHID, whose telephone number is (571) 272-5372. The examiner can normally be reached 10AM-6PM EST, M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Tonia L. Dollinger, can be reached at 571-272-4170. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/I.R/
Examiner, Art Unit 2459

/SCHQUITA D GOODWIN/
Primary Examiner, Art Unit 2459

Prosecution Timeline

Sep 23, 2024
Application Filed
Jan 15, 2026
Non-Final Rejection — §102, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603930 — CONTENT DELIVERY (2y 5m to grant; granted Apr 14, 2026)
Patent 12598109 — NETWORK PERFORMANCE EVALUATION USING AI-BASED NETWORK CLONING (2y 5m to grant; granted Apr 07, 2026)
Patent 12587586 — REDUCING LATENCY AND OPTIMIZING PROXY NETWORKS (2y 5m to grant; granted Mar 24, 2026)
Patent 12587593 — DATA TRANSMISSION METHOD AND APPARATUS, DEVICE, AND STORAGE MEDIUM (2y 5m to grant; granted Mar 24, 2026)
Patent 12562993 — PACKET FRAGMENTATION PREVENTION IN AN SDWAN ROUTER (2y 5m to grant; granted Feb 24, 2026)
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 58%
With Interview: 78% (+19.9%)
Median Time to Grant: 3y 2m
PTA Risk: Low

Based on 198 resolved cases by this examiner. Grant probability is derived from the career allow rate.
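The headline percentages follow directly from the reported counts; a quick check (treating the interview lift as an additive percentage-point adjustment is my assumption about how the dashboard combines the two figures):

```python
granted, resolved = 115, 198      # from the examiner's career history above
allow_rate = granted / resolved   # 0.5808... -> reported as 58%
interview_lift = 0.199            # +19.9 percentage points with interview
with_interview = allow_rate + interview_lift

assert round(allow_rate * 100) == 58
assert round(with_interview * 100) == 78
```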
