Prosecution Insights
Last updated: April 19, 2026
Application No. 18/463,108

EFFICIENT COMMUNICATION AND COMPUTATION FOR SPLIT-TRAINING OF MACHINE LEARNING MODELS

Non-Final OA — §102, §112
Filed
Sep 07, 2023
Examiner
SHIFERAW, ELENI A
Art Unit
2497
Tech Center
2400 — Computer Networks
Assignee
Qualcomm Incorporated
OA Round
1 (Non-Final)
37%
Grant Probability
At Risk
1-2
OA Rounds
5y 1m
To Grant
73%
With Interview

Examiner Intelligence

Grants only 37% of cases
37%
Career Allow Rate
49 granted / 132 resolved
-20.9% vs TC avg
Strong +36% interview lift
+35.5%
Interview Lift (with vs. without interview)
Based on resolved cases with interview
Typical timeline
5y 1m
Avg Prosecution
10 currently pending
Career history
142
Total Applications
across all art units

Statute-Specific Performance

§101
14.5%
-25.5% vs TC avg
§103
49.7%
+9.7% vs TC avg
§102
18.1%
-21.9% vs TC avg
§112
9.5%
-30.5% vs TC avg
Black line = Tech Center average estimate • Based on career data from 132 resolved cases

Office Action

§102 §112
Notice of Pre-AIA or AIA Status The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-30 are pending. Claim Interpretation The following is a quotation of 35 U.S.C. 112(f): (f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph: An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked. As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph: (A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; (B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and (C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. 
Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. This application includes one or more claim limitations that use the word “means” or “step” but are nonetheless not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph because the claim limitation(s) recite(s) sufficient structure, materials, or acts to entirely perform the recited function. Such claim limitation(s) is/are: “means for accessing”, “means for generating”, “means for transmitting”, “means for determining”, and “means for performing” in claim 30. Because this/these claim limitation(s) is/are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are not being interpreted to cover only the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof. If applicant intends to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to remove the structure, materials, or acts that perform the claimed function; or (2) present a sufficient showing that the claim limitation(s) does/do not recite sufficient structure, materials, or acts to perform the claimed function. The specification provides corresponding adequate structure as follows: “means for accessing an element of input data”: Figure 8: Processing system 800 with memory 824; paragraph [0150]: “At block 705, a first element of input data for a first portion of a neural network is accessed.”; and paragraph [0181]: Memory 824 includes training data. “means for generating an element of output data”: Figure 8: Training component 824D and training circuit 829; paragraph [0187]: “The training component 824D and/or the training circuit 829 may generate features (e.g., feature tensor 210 of FIGS. 2A and 2B) by processing input data”; paragraph [0151]: “At block 710, a first element of output data is generated based on processing the first element of input data using the first portion of the neural network.”; and Figures 2A-2D: Workflows showing feature generation. “means for transmitting the element of output data”: Figure 8: Communication component 824C and communication circuit 828; paragraph [0186]: “The communication component 824C and/or the communication circuit 828 may be used to communicate relevant data during training”; paragraph [0187]: “The communication component 824C and/or the communication circuit 828 may be used to exchange management data with the other participating system(s)”; paragraph [0177]: Wireless connectivity component 812 with antennas 814. “means for determining that communication criteria are not satisfied”: Figure 8: State component 824A and state circuit 826; paragraph [0184]: “The state component 824A and/or the state circuit 826 may be used to determine or select communication or training states”; paragraphs [0103]-[0107]: Method 400 describing evaluation of communication criteria. Claim Rejections - 35 USC § 112 The following is a quotation of 35 U.S.C. 112(b): (b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention. The following is a quotation of 35 U.S.C. 
112 (pre-AIA), second paragraph: The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention. Claim 30 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claim limitation “means for performing reduced communication training” invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. While the specification provides some structural components (state component 824A, compression component 824B, communication component 824C, training component 824D per paragraph [0182]; state circuit 826, compression circuit 827, communication circuit 828, training circuit 829 per paragraph [0183]), these components are described at a high level of generality without sufficient technical detail to clearly correspond to the claimed functions. Specifically: Generic component labels: The specification identifies components such as “state component 824A,” “compression component 824B,” and “communication component 824C,” but these are generic labels without specific technical detail regarding what specific algorithms or logic these components implement, what specific hardware configurations these components comprise, or how these components interact to perform the claimed functions. Lack of specific algorithm description: The specification does not provide pseudocode or flowcharts showing the specific steps for “performing reduced communication training,” mathematical formulas or algorithms for determining which technique to use, or specific decision logic for selecting between asynchronous training and dynamic compression. Insufficient detail for final limitation: The specification particularly lacks sufficient structural detail for the final limitation “means for performing reduced communication training.” The specification describes multiple different techniques (asynchronous training, dynamic compression, combinations) but does not provide sufficient detail regarding how the system selects which technique to use, what specific structure implements each technique, how the system transitions between techniques, or what specific hardware or software performs the reduction of transmitted data. Ambiguity in component functionality: The specification does not clearly explain whether “state component 824A” and “state circuit 826” perform the function of selecting which technique to use, whether “compression component 824B” and “compression circuit 827” perform the function of reducing data through compression, whether “communication component 824C” and “communication circuit 828” perform the function of reducing data through asynchronous transmission, or how these components work together to perform “reduced communication training.” Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. (For prior art purposes and claim interpretation, please see the §102 rejection.) Applicant may: (a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 
112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph; (b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or (c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)). If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: (a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or (b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181. Claim 30 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claim 30 is rejected under 35 U.S.C. § 112(b) as indefinite. Specifically, the final limitation—“ means for, in response to the determining, performing reduced communication training of the neural network, the performing comprising reducing an amount of data transmitted by the first processing system for one or more rounds of training ”—is indefinite because the phrase “ reducing an amount of data transmitted ” is vague and does not clearly define the scope of the claimed invention. The phrase “reducing an amount of data transmitted” is ambiguous and could encompass multiple different techniques without clear boundaries. Specifically: Ambiguity regarding technique : The phrase “reducing an amount of data transmitted” could mean: Reducing the frequency of data transmission (asynchronous training: omitting transmissions for one or more rounds) Reducing the size of individual data transmissions (dynamic compression: compressing data before transmission) Reducing both frequency and size (combination of asynchronous training and compression) Any other technique that results in less total data transmitted Scope uncertainty : The specification describes asynchronous training (FIGS. 2B-2D, paragraphs [0058]-[0085]) and dynamic compression (FIGS. 3A-3B, paragraphs [0087]-[0101]), but the claim language does not clearly indicate whether both techniques are encompassed, only one is encompassed, or other unspecified techniques are encompassed. 
Potential for broader scope than intended : The phrase “reducing an amount of data transmitted” could potentially encompass techniques not described in the specification, such as: Quantization of data Pruning of model parameters Sampling of training data Differential privacy techniques Other data reduction methods Vagueness of “one or more rounds of training” : The phrase “one or more rounds of training” is indefinite because it does not clearly define: Whether the rounds must be consecutive or can be non-consecutive Whether the reduction applies to all subsequent rounds or only some rounds What constitutes a “round” of training (one iteration, one epoch, or other definition) A person skilled in the art would not understand the metes and bounds of the claim because the claim language is too broad and vague to clearly define what techniques are encompassed by “reducing an amount of data transmitted” and what the scope of the claimed invention is. The following is a quotation of the first paragraph of 35 U.S.C. 112(a): (a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention. The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112: The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention. Claim 30 is rejected under 35 U.S.C. § 112(a) for failure to comply with the written description requirement. The specification does not provide a written description of the invention that enables a person skilled in the art to understand the scope of the claimed invention, particularly with respect to the final limitation “ means for, in response to the determining, performing reduced communication training of the neural network, the performing comprising reducing an amount of data transmitted by the first processing system for one or more rounds of training. ” Under 35 U.S.C. § 112(a), the specification must contain a written description of the invention that is sufficient to convey to a person skilled in the art the scope of the claimed invention. Enablement v. Genentech, Inc., 111 F.3d 1461 (Fed. Cir. 1997). The written description must: Clearly identify what the claimed invention is Provide sufficient detail so a person skilled in the art understands what is claimed Provide adequate support for the scope of the claims Deficiency 1: Lack of clear identification of “reduced communication training” The specification describes multiple different techniques without clearly identifying what “reduced communication training” is: Asynchronous training (FIGS. 2B-2D, paragraphs [0052]-[0085]): The specification describes three different asynchronous states (reduced downlink, reduced uplink, bidirectional reduced) where data transmission is omitted or reduced in frequency. Dynamic compression (FIGS. 
3A-3B, paragraphs [0087]-[0101]): The specification describes selecting compression operations with different compression rates and decompressing data. Combinations (paragraph [0023]): The specification states that “reduced communication training can be performed using asynchronous training, dynamic compression, or a combination of both asynchronous training and dynamic compression.” The specification does not clearly define what “reduced communication training” is or which techniques are encompassed by the term. A person skilled in the art would not understand whether the claim covers all techniques described in the specification or only some techniques. Deficiency 2: Lack of sufficient detail regarding how “reduced communication training” is performed The specification does not provide sufficient detail regarding: Specific algorithms : The specification does not provide pseudocode, flowcharts, or mathematical formulas describing how “reduced communication training” is performed. Specific hardware structures : The specification does not provide specific circuit diagrams, processor configurations, or memory layouts describing what hardware performs “reduced communication training.” Specific decision logic : The specification does not clearly explain: How the system decides whether to use asynchronous training or dynamic compression What criteria trigger the selection of one technique over another How the system transitions between techniques What happens when the system is in a particular state Specific implementation details : The specification does not provide sufficient detail regarding: How features are omitted in the reduced uplink state (FIG. 2C) How gradients are omitted in the reduced downlink state (FIG. 2B) How compression operations are selected (paragraph [0092], [0099]) How decompression is performed (paragraphs [0093], [0100]) Deficiency 3: Inadequate support for claim scope The claim recites “reducing an amount of data transmitted,” which is a broad functional term that could encompass many different techniques. The specification provides examples of asynchronous training and dynamic compression, but does not clearly define the boundaries of what techniques are encompassed by “reducing an amount of data transmitted.” For example, the specification does not address whether the following techniques would be encompassed by the claim: Quantization of intermediate layer outputs Pruning of model parameters before transmission Sampling of training data Differential privacy techniques Other data reduction methods not described in the specification Without a clear written description of what “reduced communication training” is and which techniques are encompassed by the term, a person skilled in the art would not understand the scope of the claimed invention. Deficiency 4: Lack of clarity regarding “one or more rounds of training” The specification does not clearly define what “one or more rounds of training” means or what the scope of this limitation is. The specification uses terms such as “iterations,” “epochs,” and “rounds,” but does not clearly define whether these terms are synonymous or have different meanings. Additionally, the specification does not clearly explain: Whether “one or more rounds” means consecutive rounds or non-consecutive rounds Whether the reduction applies to all subsequent rounds or only some rounds What constitutes a “round” of training Without clear definition of these terms, a person skilled in the art would not understand the scope of the claim. 
Claim Rejections - 35 USC § 102 The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action: A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention. Claim(s) 1-30 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chen et al., “Communication and Computation Reduction for Split Learning using Asynchronous Training” (IEEE 2021) (hereinafter Chen). Regarding claims 1 and 29, Chen teaches a first processing system and one or more non-transitory computer-readable media comprising: one or more memories comprising processor-executable instructions; and one or more processors configured to execute the processor-executable instructions (see Sec. I, Fig. 1(a): a split learning system with a client (edge device) and server, each with processing and memory resources. The client corresponds to the “first processing system”; Sec. IV-A describing client hardware as an Intel i7 CPU) and cause the first processing system to: access a first element of input data for a first portion of a neural network (Algorithm 1, lines 4–9; client accesses its local dataset (x, y) and processes it through the client-side model; Sec. I, Fig. 1(b), “forward propagation till cut layer”); generate, by the first processing system, a first element of output data, wherein, to generate the first element of output data, the one or more processors are configured to execute the processor-executable instructions to cause the first processing system to process the first element of input data using the first portion of the neural network (client computes the activation of the cut layer (output data) by processing input through WC(·); Algorithm 1, split_forward, lines 21–22; Sec. I, Fig. 1(b)); transmit, from the first processing system and at a first point in time, the first element of output data to a second processing system for the second processing system to update one or more parameters of a second portion of the neural network based on the first element of output data (client sends the activation to the server; the server continues forward/backprop to update server-side model WS(·) (Algorithm 1, lines 22–23, 30–31; Sec. I, Fig. 1(b))); determine, by the first processing system at a second point in time subsequent to the first point in time, that one or more communication criteria are not satisfied (loss-based asynchronous training compares the loss drop Δloss to threshold l_thred to decide whether to update the client-side model or skip communication (Algorithm 1, update_state, lines 38–50; Sec. III-A, Fig. 2). This is a communication criterion.); and in response to the determining, perform reduced communication training of the neural network, wherein, to perform the reduced communication training of the neural network, the one or more processors are configured to execute the processor-executable instructions to cause the first processing system to reduce an amount of data transmitted by the first processing system for one or more rounds of training (if Δloss ≤ threshold, the state changes to B or C, reducing transmission of activations/gradients (Fig. 2; Algorithm 1, lines 46–50). Quantization further reduces data size (Sec. III-B, Algorithm 2)). 
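For readers tracing the anticipation mapping, the control flow the examiner attributes to Chen's Algorithm 1 and Figure 2 can be sketched roughly as below. This is an illustrative reconstruction from the citations above, not Chen's actual code: the A/B/C states, the loss threshold l_thred, and the update_state step follow the Office Action's characterization, while the client/server interfaces (client_model, server.forward_backward, etc.) are hypothetical placeholders.

```python
# Illustrative sketch (not Chen's code) of loss-based asynchronous split learning
# as characterized above: state A = full communication, B = activations sent but
# no gradients returned, C = no communication with the server for the round.

def update_state(state, loss, loss_at_last_update, l_thred):
    """Communication criterion: compare the loss drop since the last
    client-side update against the threshold l_thred (Sec. III-A, Fig. 2)."""
    delta_loss = loss_at_last_update - loss
    if delta_loss > l_thred:
        return "A"                            # loss still improving: keep full communication
    return "B" if state == "A" else "C"       # otherwise step down toward reduced communication


def client_round(client_model, server, x, y, state):
    """One training round from the client's ('first processing system') perspective."""
    act = client_model.forward(x)             # forward propagation till the cut layer
    if state == "C":
        return None                           # reduced communication: nothing transmitted
    loss, grad = server.forward_backward(act, y)   # server updates its portion W_S
    if state == "A":
        client_model.backward(grad)           # gradient is returned only in state A
        client_model.update()                 # update the client-side portion W_C
    return loss
```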
Regarding claims 15, Chen teaches a processor-implemented method comprising: accessing a first element of input data for a first portion of a neural network (Algorithm 1, lines 4–9 ; Client accesses its local dataset (x, y) and processes through client-side model Sec. I, Fig. 1(b), “forward propagation till cut layer” ) ; generating, by a first computing system, a first element of output data based on processing the first element of input data using the first portion of the neural network (Client computes activation of the cut layer (output data) by processing input through WC(·) (Algorithm 1, split_forward, lines 21–22 ; Sec. I, Fig. 1(b)) ; transmitting, from the first computing system and at a first point in time, the first element of output data to a second computing system for the second computing system to update one or more parameters of a second portion of the neural network based on the first element of output data ( Client sends activation to server; server continues forward/backprop to update server-side model WS(·) (Algorithm 1, lines 22–23, 30–31; Sec. I, Fig. 1(b) ) ; determining, by the first computing system at a second point in time subsequent to the first point in time, that one or more communication criteria are not satisfied (Loss-based asynchronous training compares loss drop Δloss to threshold l_thred to decide whether to update client-side model or skip communication (Algorithm 1, update_state, lines 38–50; Sec. III-A, Fig. 2). This is a communication criterion. ; and in response to the determining, performing reduced communication training of the neural network, the performing comprising reducing an amount of data transmitted by the first computing system for one or more rounds of training (If Δloss ≤ threshold, state changes to B or C, reducing transmission of activations/gradients (Fig. 2; Algorithm 1, lines 46–50). Quantization further reduces data size (Sec. III-B, Algorithm 2) . Regarding claim 30 , Chen teaches a first processing system, comprising: means for accessing an element of input data for a first portion of a neural network (Algorithm 1, lines 4–9 ; Client accesses its local dataset (x, y) and processes through client-side model Sec. I, Fig. 1(b), “forward propagation till cut layer” ) ; means for generating, by the first processing system, an element of output data based on processing the element of input data using the first portion of the neural network (Client computes activation of the cut layer (output data) by processing input through WC(·) (Algorithm 1, split_forward, lines 21–22 ; Sec. I, Fig. 1(b)) ; means for transmitting, from the first processing system and at a first point in time, the element of output data to a second processing system for the second processing system to update one or more parameters of a second portion of the neural network based on the element of output data ( Client sends activation to server; server continues forward/backprop to update server-side model WS(·) (Algorithm 1, lines 22–23, 30–31; Sec. I, Fig. 1(b) ) ; means for determining, by the first processing system at a second point in time subsequent to the first point in time, that one or more communication criteria are not satisfied (Loss-based asynchronous training compares loss drop Δloss to threshold l_thred to decide whether to update client-side model or skip communication (Algorithm 1, update_state, lines 38–50; Sec. III-A, Fig. 2). This is a communication criterion. 
) ; and means for, in response to the determining, performing reduced communication training of the neural network, the performing comprising reducing an amount of data transmitted by the first processing system for one or more rounds of training (If Δloss ≤ threshold, state changes to B or C, reducing transmission of activations/gradients (Fig. 2; Algorithm 1, lines 46–50). Quantization further reduces data size (Sec. III-B, Algorithm 2) . Regarding claim 2, Chen teaches the first processing system of claim 1, wherein the one or more communication criteria comprise at least one of : a state of one or more communication links between the first and second processing systems, resource availability on at least one of the first processing system or the second processing system, one or more characteristics of training data used to train the neural network, or one or more indications of training progress for the neural network ( Criterion is training progress via loss drop threshold (Sec. III-A, lines “loss difference with last update > threshold”); also notes communication cost scaling with data size and cut layer (Sec. II) … Chen explicitly uses a training progress criterion (change in loss) to control client updates and thereby communication (loss based asynchronous training; compare Δloss to l_thred). See Sec. III A (“Loss based Asynchronous training”), Fig. 2 (state diagram), and Algorithm 1 (update_state; lines that compute ∆loss and compare to lthred ) . Regarding claim 3, Chen teaches the first processing system of claim 1, wherein: the first portion of the neural network comprises one or more initial layers of the neural network ( Sec. I, Fig. 1 : Client-side model = initial layers ) , the first element of output data comprises a feature tensor output by an intermediate layer of the neural network ( Sec. I, Fig. 1 : output = activation tensor at cut layer (feature tensor) ) , and the second portion of the neural network comprises one or more final layers of the neural network ( Sec. I, Fig. 1 : server-side model = remaining layers “cut layer” …. Chen splits the model into a client side portion (initial layers) and a server side portion (remaining/final layers); the client computes forward propagation up to the cut layer and produces the activation (feature tensor) which it sends to the server. See Sec. I (introduction of split learning), Fig. 1(a)–(b) (split diagram), and Algorithm 1 split_forward (client computes act ← WC(x); Send act, y to server). Sec. III A also describes the client producing and sending cut layer activation to server. ) . Regarding claim 4, Chen teaches the first processing system of claim 3, wherein the one or more processors are configured to further execute the processor-executable instructions to cause the first processing system to update one or more parameters of the first portion of the neural network based on a set of gradients received at the first processing system from the second processing system ( In state A, server sends gradients to client, which backpropagates to update WC(·) (Algorithm 1, lines 32–36) … Algorithm 1 (split_backward) shows the server computing gradients and—when state = A—sending grad to clients and the client performing client backward(grad) and Update WC (Algorithm 1, split_backward lines that perform Send grad to clients; client backward(grad); Update WC). Sec. III A describes that in state = A, client receives gradient and updates client side model. Thus Chen teaches receiving gradients and updating the client side parameters ) . 
Regarding claim 5, Chen teaches the first processing system of claim 1, wherein: the first portion of the neural network comprises one or more final layers of the neural network (split learning where the cut can be anywhere; gradients from server to client (Sec. I, Fig. 1(b)) … splitting a neural network into two portions (client-side and server-side) at a partition (cut) layer, where either portion may contain any number of layers depending on the chosen cut point. Chen explains that the model is partitioned into an initial portion and a final portion and that the partition point may be selected to balance objectives (e.g., latency, computation). See Chen, Sec. I (overview of split learning; description of portions 125A and 125B and partition point), Fig. 1 (showing client-side and server-side model portions)), the first element of output data comprises a gradient tensor (in state A, the server sends gradients to the client, which backpropagates to update WC(·) (Algorithm 1, lines 32–36) … ; split learning forward/backward flow: the party that holds the final layers (server-side) computes the forward pass, computes loss, performs backward propagation on its portion, and thereby produces gradient tensors at the cut layer that are then transmitted to the other party for use in updating its portion. Chen’s pseudocode explicitly shows the server computing gradients and sending them to the client as grad. See Algorithm 1, split_backward … ; … “grad ← server backward(loss); Update WS; if state = A then Send grad to clients; client backward(grad); Update WC;” (Algorithm 1, split_backward); … Algorithm 1 (split_backward), Sec. III-A text describing forward/backward and transmission of gradients … quantization/compression of activations/gradients prior to transmission (Sec. III-B), indicating that gradients are treated as transmitted “output data” in the split learning flow. See Chen, Sec. III-B (“the activations/gradients are quantized using 8-bit floating point prior to transmission”)), and the second portion of the neural network comprises one or more initial layers of the neural network (split learning where the cut can be anywhere; gradients from server to client (Sec. I, Fig. 1(b)); … the complementary portion is the client-side model (initial layers), which receives gradient tensors from the server and uses them to update its parameters. See Fig. 1 (client/server split) and Algorithm 1 (split_forward: client computes act ← WC(x); split_backward: client backward(grad); Update WC). … Chen, Algorithm 1 (split_forward and split_backward)). Regarding claim 6, Chen teaches the first processing system of claim 5, wherein the one or more processors are configured to further execute the processor-executable instructions to cause the first processing system to: generate a second element of output data based on processing the first element of input data using the first portion of the neural network (when the client-side model updates, it processes new data to update weights (Algorithm 1, lines 33–36); … after receiving the activation (the “first element of input data” for the server/final portion), the server processes it using its portion WS(·) to produce model output and a loss value: Algorithm 1, split_forward: “y′ ← WS(act);” and “loss ← f(y, y′);” (lines 25–27); … Sec. III-A (Loss-based Asynchronous training) describes the same forward computation on the server side following receipt of the activation from the client. 
;… The “second element of output data” in the claim reads on Chen’s forward pass result produced by the first portion (server/final layers), namely the model output y′ and/or the computed loss used to drive updates. ) ; and update one or more parameters of the first portion of the neural network based on the second element of output data ( When client-side model updates, it processes new data to update weights (Algorithm 1, lines 33–36) ; …. the server (holding the final layers, i.e., the first portion in claim 5) updates its own parameters using the loss computed from its forward output: Algorithm 1, split_backward: “grad ← server backward(loss); Update WS;” (lines 30–31). Sec. I and Sec. III A explain that the server continues forward propagation and then backpropagation through its portion before sending gradients back to the client. Thus, the server updates parameters of its own (final) portion “based on the second element of output data” (i.e., using the loss derived from y ) ) . Regarding claim 7, Chen teaches the first processing system of claim 1, wherein, to perform the reduced communication training of the neural network, the one or more processors are configured to execute the processor-executable instructions to cause the first processing system to refrain from transmitting ( three communication states (A, B, C) that control whether activations and/or gradients are transmitted between client and server. In particular, Chen defines state = C as a no communication state: “state = C has no communication with server.” (Sec. III A, Fig. 2; Algorithm 1, split_forward / split_backward logic) ) at least a second element of output data from the first processing system to the second processing system ( State C = no activation sent, no gradient sent (Fig. 2; Algorithm 1, lines 46–50) ; … epoch example: “If the client-side model still does not update in epoch n+2, then the activation in epoch n+2 is exactly identical to that of epoch n+1, so the activation is not sent to the server, and the communication due to activation is also saved (state = C) Sec. III‑A ; … state = C (and in certain transitions to B), the client refrains from sending activation tensors and/or the server refrains from sending gradients. ) . Regarding claim 8, Chen teaches the first processing system of claim 7, wherein, to perform the reduced communication training of the neural network, the one or more processors are configured to execute the processor-executable instructions to further cause the first processing system to: receive a second element of input data from the second processing system ( State B = activation sent, no gradient sent; server continues training with stored activations (Sec. III-A, Fig. 2; Algorithm 1, lines 18–24 ; … in split learning, the server computes gradients and transmits them to the client (the client is the “first processing system” in the claim). Algorithm 1 (split_backward) shows the server performing backpropagation and sending gradients to clients when communication state = A ) ; and update one or more parameters of the first portion of the neural network based on the second element of input data ( State B = activation sent, no gradient sent; server continues training with stored activations (Sec. III-A, Fig. 2; Algorithm 1, lines 18–24 ; … the client using the received gradient to perform the backward pass and update the client side model parameters ; … “client backward(grad); Update WC;” (Algorithm 1, split_backward). 
This corresponds directly to receiving a gradient tensor (the “second element of input data”) and updating the first portion (client side) parameters based on that gradient ) . Regarding claim 9, Chen teaches the first processing system of claim 8, wherein, to update one or more parameters of the first portion of the neural network, the one or more processors are configured to execute the processor-executable instructions to cause the first processing system to: generate decompressed data based on decompressing the second element of input data from the second processing system ( Quantization uses 8-bit floating point; dequantization/decompression occurs before use (Sec. III-B, Algorithm 2) ; … activations/gradients are compressed by quantization before transmission; the corollary is that the receiver de quantizes (i.e., decompresses) the received tensor before using it in training Abstract; … ) ; and update one or more parameters of the first portion of the neural network based on the decompressed data ( Quantization uses 8-bit floating point; dequantization/decompression occurs before use (Sec. III-B, Algorithm 2) ; … fter receipt, the tensor is used to update parameters of the recipient’s portion: Client updates client side parameters based on received gradients: “if state = A then Send grad to clients; client backward(grad); Update WC;” (Algorithm 1, split_backward, lines 32–36). Symmetrically, when the server receives activations from the client, it uses them to compute loss and update WS (its portion) before optionally returning gradients (Algorithm 1, split_forward lines 25–27; split_backward lines 30–31: “grad ← server backward(loss); Update WS;”). ) . Regarding claim 10, Chen teaches the first processing system of claim 1, wherein: to perform the reduced communication training of the neural network, the one or more processors are configured to execute the processor-executable instructions to cause the first processing system to transmit at least a second element of output data from the first processing system to the second processing system ( client sends the cut layer activation to the server in state B (Algorithm 1 split_forward; Sec. III A ; … If Δloss ≤ threshold, state changes to B or C, reducing transmission of activations/gradients (Fig. 2; Algorithm 1, lines 46–50). Quantization further reduces data size (Sec. III-B, Algorithm 2 ) , and the first processing system does not receive a second element of input data from the second processing system while performing the reduced communication training of the neural network ( server is configured to send gradients to clients only when state = A; in state = B the server does not send gradients, so the client does not receive gradients while still sending activations (Algorithm 1 split_backward; Sec. III A) ; …. State B (send activation, no gradient back) matches this (Fig. 2, state B) ) . 
Regarding claim 11, Chen teaches the first processing system of claim 10, wherein, to transmit the second element of output data from the first processing system to the second processing system, the one or more processors are configured to execute the processor-executable instructions to cause the first processing system to: generate compressed output data, wherein, to generate the compressed output data, the one or more processors are configured to execute the processor-executable instructions to cause the first processing system to use one or more compression operations to compress the second element of output data (compressing (quantizing) the transmitted tensors (activations/gradients) prior to transmission to reduce communication; … Abstract: “the activations/gradients are quantized using 8-bit floating point prior to transmission.”; … Sec. III-B (Search-based Quantization): “The activation/gradients are quantized using 8 bit floating point instead of the original 32 bits before being sent to server/clients, to further reduce the communication.” Quantization uses 8-bit floating point; dequantization/decompression occurs before use (Sec. III-B, Algorithm 2)); and transmit the compressed output data from the first processing system to the second processing system (Algorithm 1, split_forward: the client computes the cut-layer activation and “Send act, y to server” when state ≠ C (which includes state B). Sec. III-B states that these activations are quantized prior to transmission, i.e., the transmitted act is the compressed (quantized) data; … transmitting the compressed output (quantized activation) from client to server during reduced-communication training (state B); quantization of activation/gradients before transmission (Sec. III-B, Algorithm 2, lines 1–14)). Regarding claim 12, Chen teaches the first processing system of claim 1, wherein, to perform the reduced communication training of the neural network, the one or more processors are configured to execute the processor-executable instructions to cause the first processing system to: generate compressed output data, wherein, to generate the compressed output data, the one or more processors are configured to execute the processor-executable instructions to cause the first processing system to use one or more compression operations to compress at least a second element of output data (quantizing activations and gradients from 32-bit to 8-bit floating point “prior to transmission” to reduce communication overhead (Abstract; Sec. III-B, opening paragraph; plain statement: “the activations/gradients are quantized using 8-bit floating point prior to transmission”) … quantization to 8-bit floating point (Sec. III-B) and the search procedure that chooses representation parameters (Algorithm 2)); and transmit the compressed output data from the first processing system to the second processing system (Algorithm 1’s split_forward / split_backward behavior, where activations/gradients (which Chen quantizes per Sec. III-B) are transmitted between client and server). 
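The quantization cited for claims 11 and 12 (8-bit floating point applied to activations/gradients before transmission, with Algorithm 2 searching for representation parameters, elaborated for claim 13 below) can be illustrated with a rough sketch like the following. This is an assumption-laden reconstruction, not Chen's Algorithm 2: the exponent-bits/bias search and the roughly 1% clipping budget follow the Office Action's characterization, while the helper names and the exact search order are invented for illustration.

```python
import numpy as np

# Illustrative sketch (not Chen's Algorithm 2) of search-based 8-bit floating-point
# quantization as characterized above: pick exponent bits and bias so that the
# proportion of values that would overflow/underflow (clip) stays under ~1%.

def representable_range(exp_bits, bias, total_bits=8):
    """Smallest/largest normal magnitudes of a sign + exponent + mantissa format."""
    mant_bits = total_bits - 1 - exp_bits
    max_exp = (2 ** exp_bits - 1) - bias
    min_exp = -bias
    largest = (2 - 2 ** -mant_bits) * 2.0 ** max_exp
    smallest = 2.0 ** min_exp
    return smallest, largest

def search_quant_config(x, clip_budget=0.01):
    """Return (exp_bits, bias) whose clipping proportion on tensor x is within budget."""
    mags = np.abs(x[x != 0])
    for exp_bits in range(2, 7):                  # candidate exponent widths
        for bias in range(0, 2 ** exp_bits):      # candidate biases
            lo, hi = representable_range(exp_bits, bias)
            clipped = np.mean((mags < lo) | (mags > hi))
            if clipped < clip_budget:
                return exp_bits, bias
    return None                                   # no candidate met the budget
```

On the receiving side, the passages cited for claims 9 and 14 describe the mirror-image step: dequantizing the received tensor back to full precision before it is used in backpropagation and the parameter update.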
Regarding claim 13, Chen teaches the first processing system of claim 12, wherein, to generate the compressed output data, the one or more processors are configured to execute the processor-executable instructions to cause the first processing system to: select a first compression operation of the one or more compression operations based on the communication criteria (selection of exponent bits and bias to meet clipping limits and reduce communication, i.e., choosing the quantization configuration responsive to the current epoch’s dynamic range / clipping budget. Chen uses runtime data (median(|X|), computed overflow/underflow proportion) as selection criteria in Algorithm 2 … search-based quantization selects exponent bits/bias to keep clipping <1% (Algorithm 2, lines 2–11)); and compress the at least the second element of output data, wherein, to compress the at least the second element of output data, the one or more processors are configured to execute the processor-executable instructions to cause the first processing system to use the first compression operation to generate the compressed output data (Chen applies the selected quantization to produce compressed output for transmission (Sec. III-B; Algorithm 2 selection followed by using that representation for quantization)). Regarding claim 14, Chen teaches the first processing system of claim 1, wherein, to perform the reduced communication training of the neural network further, the one or more processors are configured to execute the processor-executable instructions to cause the first processing system to: generate a second element of input data, wherein, to generate the second element of input data, the one or more processors are configured to execute the processor-executable instructions to cause the first processing system to decompress received data from the second processing system (receiving quantized gradients/activations, dequantizing, then using in backprop/update (Sec. III-B)); and update one or more parameters of the first portion of the neural network based on the second element of input data (receiving quantized gradients/activations, dequantizing, then using in backprop/update (Sec. III-B) … Algorithm 1: when the gradient is received, the client performs client backward(grad) and Update WC). Regarding the method claims 16-28, claims 16-28 recite similar limitations as claims 2-14 and are rejected based on the same rationale as claims 2-14. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 20220004875 A1: Automated Construction Of Neural Network Architecture With Bayesian Graph Exploration. Any inquiry concerning this communication or earlier communications from the examiner should be directed to ELENI A SHIFERAW whose telephone number is (571)272-3867. The examiner can normally be reached 7-3:30 M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. 
Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /ELENI A SHIFERAW/ Supervisory Patent Examiner, Art Unit 2497

Prosecution Timeline

Sep 07, 2023
Application Filed
Mar 12, 2026
Non-Final Rejection — §102, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 7983414
PROTECTED CRYPTOGRAPHIC CALCULATION
2y 5m to grant Granted Jul 19, 2011
Patent 7984512
INTEGRATING SECURITY BY OBSCURITY WITH ACCESS CONTROL LISTS
2y 5m to grant Granted Jul 19, 2011
Patent 7965844
SYSTEM AND METHOD FOR PROCESSING USER DATA IN AN ENCRYPTION PIPELINE
2y 5m to grant Granted Jun 21, 2011
Patent 7954164
METHOD OF COPY DETECTION AND PROTECTION USING NON-STANDARD TOC ENTRIES
2y 5m to grant Granted May 31, 2011
Patent 7954156
METHOD TO ENHANCE PLATFORM FIRMWARE SECURITY FOR LOGICAL PARTITION DATA PROCESSING SYSTEMS BY DYNAMIC RESTRICTION OF AVAILABLE EXTERNAL INTERFACES
2y 5m to grant Granted May 31, 2011
Study what changed to get past this examiner. Based on 5 most recent grants.

AI Strategy Recommendation

Get an AI-powered prosecution strategy using examiner precedents, rejection analysis, and claim mapping.

Prosecution Projections

1-2
Expected OA Rounds
37%
Grant Probability
73%
With Interview (+35.5%)
5y 1m
Median Time to Grant
Low
PTA Risk
Based on 132 resolved cases by this examiner. Grant probability derived from career allow rate.
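The figures above appear to combine additively; a minimal reconstruction of the arithmetic, assuming the "With Interview" number is simply the base grant probability plus the interview lift (the tool's exact model is not documented here):

```python
# Assumed additive combination of the dashboard's own numbers (not a documented formula).
career_allow_rate = 49 / 132        # ≈ 0.371, shown as the 37% grant probability
interview_lift = 0.355              # +35.5% interview lift shown above

with_interview = career_allow_rate + interview_lift   # ≈ 0.726, shown as 73%
```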
