Last updated: May 29, 2026
Application No. 18/447,886
INFORMATION PROCESSING SYSTEM

Non-Final OA: §103, §112
Filed: Aug 10, 2023
Priority: Aug 12, 2022 (JP 2022-128577)
Examiner: MAIDO, MAGGIE T
Art Unit: 2129
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Canon Kabushiki Kaisha
OA Round: 1 (Non-Final)
Office Action

DETAILED ACTION

This action is responsive to claims filed on 10 August 2023.
Claims 1-12 are pending for examination.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  

Such claim limitation(s) is/are: 
“second information processing apparatus configured to communicate” in claim 1.
“inference target data acquisition unit configured to acquire” in claim 1.
“inference unit configured to perform” in claim 1.
“second information processing apparatus configured to communicate” in claim 5.
“training data acquisition unit configured to acquire training data” in claim 5.
“a training unit configured to learn” in claim 5.
“second information processing apparatus configured to communicate” in claim 9.

Claims 2-4, 6-8, and 10-12, which depend directly or indirectly on claims 1, 5, and 9, are similarly interpreted.
The Examiner notes for the record that the generic placeholders listed above correspond to the configurations detailed in Fig. 1 and Fig. 2 of the Specification: “Fig. 2 illustrates an example of a specific configuration of the first information processing apparatus 2. In the example, the first information processing apparatus 2 includes a central processing unit (CPU) 200, a graphics processing unit (GPU) 201, a random access memory (RAM) 202, a read only memory (ROM) 203, and a storage device 204, which are connected by a system bus 205. A display device 206 and an input device 207, such as a mouse and a keyboard, are connected to the first information processing apparatus 2. The second information processing apparatus 3 may be configured in the same manner as the first information processing apparatus 2, or may be configured to include a part of the configuration of the first information processing apparatus 2.” (Specification [0024]-[0025]).

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.



Claims 2 and 12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 2 recites the limitation “the inference target data input” in line 2. There is insufficient antecedent basis for this limitation in the claim. For examination purposes, the term “the inference target data input” has been construed to be “an inference target data input”.
Claim 12 recites the limitation “the same information processing apparatus” in lines 4-5. This limitation is indefinite because it is unclear whether “the same information processing apparatus” in lines 4-5 refers to the same apparatus as “the same information processing apparatus” in lines 2-3 of claim 12. For examination purposes, “the same information processing apparatus” in lines 4-5 has been construed to be “another the same information processing apparatus”.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-12 are rejected under 35 U.S.C. 103 as being unpatentable over Gharibi et al. (U.S. Pre-Grant Publication No. 20230252277, hereinafter ‘Gharibi'), in view of Vepakomma et al. (NPL: "Split learning for health: Distributed deep learning without sharing raw patient data", hereinafter 'Vepakomma'). 

	Regarding claim 1, Gharibi teaches an information processing system comprising ([0165] Additionally, the methods disclosed herein may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program including a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.):
	a first information processing apparatus; a second information processing apparatus configured to communicate with the first information processing apparatus via a network ([0040] FIG. 2 illustrates a split learning centralized approach. A model (neural network) 204 is split into two parts: one part (206A, 208A, 210A) resides on the [a first information processing apparatus] respective client side 206, 208, 210 and includes the input layer to the model and optionally other layers up to a cut layer, and the other part (B) resides on the [a second information processing apparatus configured to communicate with the first information processing apparatus via a network] server side 202 and often includes the output layer. Split layer (S) refers to the layer (the cut layer) where A and B are split. In FIG. 2, SA represents a split layer or data sent from A to B and SB represents a split layer sent from B to A.);
	an inference unit configured to perform predetermined inference processing on the inference target data by using a first partial model and a second partial model ([0059] A global model in federated-split learning can be aggregated as follows. After the training is done, the system uses on the following approach to aggregate a [an inference unit configured to perform predetermined inference processing on the inference target data] global model, which will be used for the inference task. In a first approach, the server selects one of the models, Ai, to be aggregated with its model, B, to form the global model. The selection of Ai could be achieved using one of the following ways. For example, random selection could be used where the server selects a model (Ai) of any client 406, 408, 410 randomly. This random selection might be influenced by other factors, such as the currently available clients online, the types of data each client processes (text data, image data, temporal data) or based on the transmission speed or network delay between the two entities. The server then stacks both [by using a first partial model] parts Ai and [and a second partial model] B to generate the global model.),
	wherein the information processing system performs predetermined inference processing by using an inference model based on a neural network including a first input layer, a group of intermediate layers, a first output layer, and a second output layer, the first output layer and the second output layer being provided in different information processing apparatuses, wherein the first information processing apparatus includes the first partial model configured to include the first input layer, a first group of intermediate layers including at least one of the intermediate layers in the group of intermediate layers, and the first output layer, and wherein the second information processing apparatus includes the second partial model configured to include a second group of intermediate layers including an intermediate layer different from the intermediate layers included in the first group of intermediate layers in the group of intermediate layers, and the second output layer ([0050] The [wherein the information processing system performs predetermined inference processing by using an inference model based on a neural network including a first input layer, a group of intermediate layers, a first output layer, and a second output layer, the first output layer and the second output layer being provided in different information processing apparatuses] server 402 splits the network at the “split layer” which is a user parameter inserted into the network definition codes. The [wherein the second information processing apparatus includes the second partial model configured to include a second group of intermediate layers including an intermediate layer different from the intermediate layers included in the first group of intermediate layers in the group of intermediate layers, and the second output layer] “top portion” of the network is kept at the server 402 the [wherein the first information processing apparatus includes the first partial model configured to include the first input layer, a first group of intermediate layers including at least one of the intermediate layers in the group of intermediate layers, and the first output layer] “bottom portion” is sent to the respective data providers or clients 406, 408, 410 (the terms clients and data providers are used interchangeably here). The training starts at the very lowest network layer which is the layer closest to the data. Each layer reads either the data (from the first layer) or the output of the previous layer (all other layers).; [0051] The layers can calculate their output (these are termed “activations” because they come from an activation function) based on any valid network architecture command (convolutions, dropouts, batch normalization, flatten layers, etc.) and activation function (relu, tanh, etc.). When the last layer on the data side 406, 408, 410 has calculated its appropriate activations (i.e., output) those outputs are sent to the first layer on “the other side of the split”—the first layer on the server side 402.).
	Gharibi fails to teach an inference target data acquisition unit configured to acquire inference target data.
	However, Vepakomma teaches an inference target data acquisition unit configured to acquire inference target data ([2 SplitNN configurations for health, pg. 3] Figure 2: [an inference target data acquisition unit configured to acquire inference target data] Split learning configurations for health shows raw data is not transferred between the client and server health entities for training and inference of distributed deep learning models with SplitNN.).
	Gharibi and Vepakomma are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Gharibi, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Vepakomma to Gharibi before the effective filing date of the claimed invention in order to collaboratively train deep learning models without sharing sensitive raw data (cf. Vepakomma, [Abstract, pg. 1] Can health entities collaboratively train deep learning models without sharing sensitive raw data? This paper proposes several configurations of a distributed deep learning method called SplitNN to facilitate such collaborations. SplitNN does not share raw data or model details with collaborating institutions. The proposed configurations of splitNN cater to practical settings of i) entities holding different modalities of patient data, ii) centralized and local health entities collaborating on multiple tasks and iii) learning without sharing labels. We compare performance and resource efficiency trade-offs of splitNN and other distributed deep learning methods like federated learning, large batch synchronous stochastic gradient descent and show highly encouraging results for splitNN.).
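
As a rough illustration of the split arrangement quoted above from Gharibi [0040] and [0059], the following minimal Python sketch places one partial model on a client and another on a server, passes only the client-side activations across the boundary, and then stacks the two parts into a single global model. The class `PartialModel`, the helpers `dense` and `make_layer`, the layer sizes, and the ReLU activation are hypothetical choices made for illustration only; none of them is taken from Gharibi, Vepakomma, or the application.

```python
import random


def dense(weights, bias, inputs):
    """Apply one fully connected layer followed by a ReLU activation."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


def make_layer(n_in, n_out, rng):
    """Create a randomly initialised layer (hypothetical sizes)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


class PartialModel:
    """A contiguous stack of layers held by one party (assumed structure)."""

    def __init__(self, layers):
        self.layers = list(layers)

    def forward(self, x):
        for weights, bias in self.layers:
            x = dense(weights, bias, x)
        return x


rng = random.Random(0)

# Part A: input layer up to the cut layer, kept on the client side.
client_part = PartialModel([make_layer(4, 3, rng)])
# Part B: layers after the cut, including the output layer, kept on the server side.
server_part = PartialModel([make_layer(3, 2, rng)])

raw = [0.5, -1.0, 2.0, 0.1]              # raw data never leaves the client
activations = client_part.forward(raw)   # only these values cross the network
prediction = server_part.forward(activations)

# Stacking parts A and B gives one global model with the same behaviour.
global_model = PartialModel(client_part.layers + server_part.layers)
```

In this sketch the global model is simply the concatenation of the two layer lists, so running it on the raw input reproduces the result of the two-step client and server computation.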

Regarding claim 2, Gharibi, as modified by Vepakomma, teaches the information processing system of claim 1.
	Gharibi teaches wherein the inference unit performs the predetermined inference processing on the inference target data input to the first input layer by using the group of intermediate layers, the first output layer, and the second output layer ([0052] The following approach involves splitting the model up as before. A model is split into two parts: (A) on the client side and includes the input layer, and (B) on the server side and often includes the output layer. (S) is the split layer. The clients or data providers 406, 408, 410 run independently and send back the answer if they have it. [inference unit performs the predetermined inference processing on the inference target data input to the first input layer] The code on the server 402 processes the data and sends back its output equally to all the clients as SB (406C, 408C, 410C).; [0054] The clients 406, 408, 410 each run their [by using the group of intermediate layers] portion A (406A, 408A, 410A) of the neural network and generate a respective [the first output layer] output of A (i.e., SA (406B, 408B, 410B)) and send the output to the server 402. The server 402 receives 3 different ‘versions’ of the activations (one from each of SA1, SA2, SA3). At this point, the server 402 processes those activations “appropriately”, which can mean that the server 402 does different operations depending on the case. For example, the server 402 [and the second output layer] calculates the loss value for each client 406, 408, 410 and the server 402 calculates the average loss across all clients.).
Gharibi and Vepakomma are combinable for the same rationale as set forth above with respect to claim 1.
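
The server-side step quoted from Gharibi [0054], in which a loss value is computed for each client and then averaged across all clients, can be sketched as follows. This is only an illustrative sketch: the function names, the squared-error loss, the identity server model, and the assumption that the server has access to the targets are all hypothetical and are not stated in the cited passage.

```python
def squared_error(prediction, target):
    """Mean squared error between two equal-length vectors (assumed loss)."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(prediction)


def server_average_loss(client_activations, targets, server_forward):
    """Compute one loss per client, then the average loss across clients."""
    losses = [
        squared_error(server_forward(activation), target)
        for activation, target in zip(client_activations, targets)
    ]
    return losses, sum(losses) / len(losses)


def identity(x):
    """Stand-in for the server-side layers; a real model would transform x."""
    return x


# One activation vector per client (hypothetical values for SA1, SA2, SA3).
activations = [[1.0, 2.0], [0.0, 0.0], [3.0, 1.0]]
targets = [[1.0, 2.0], [1.0, 1.0], [1.0, 1.0]]

per_client, average = server_average_loss(activations, targets, identity)
print(per_client)  # [0.0, 1.0, 2.0]
print(average)     # 1.0
```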

Regarding claim 3, Gharibi, as modified by Vepakomma, teaches the information processing system of claim 1.
	Gharibi teaches wherein the neural network further includes a second input layer, and wherein the second partial model further includes the second input layer ([0050] The blind learning approach does not perform the round robin processing described above. The server 402 splits the network at the “split layer” which is a user parameter inserted into the network definition codes. The “top portion” of the network is kept at the server 402 the “bottom portion” is sent to the respective data providers or clients 406, 408, 410 (the terms clients and data providers are used interchangeably here). The training starts at the very lowest network layer which is the layer closest to the data. Each layer reads either the data (from the first layer) or the output of the previous layer (all other layers).; [0051] The layers can calculate their output (these are termed “activations” because they come from an activation function) based on any valid network architecture command (convolutions, dropouts, batch normalization, flatten layers, etc.) and activation function (relu, tanh, etc.). When the last layer on the data side 406, 408, 410 has calculated its appropriate activations (i.e., output) those outputs are sent to the [wherein the neural network further includes a second input layer, and wherein the second partial model further includes the second input layer] first layer on “the other side of the split”—the first layer on the server side 402.).
Gharibi and Vepakomma are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 4, Gharibi, as modified by Vepakomma, teaches the information processing system of claim 3.
	Gharibi teaches wherein, in a case where the inference target data is input to the first input layer, the inference unit performs the predetermined inference processing on the inference target data by using at least the first input layer, the first group of intermediate layers, the second group of intermediate layers, and the first output layer, and wherein, in a case where the inference target data is input to the second input layer, the inference unit performs the predetermined inference processing on the inference target data by using at least the second input layer, the first group of intermediate layers, the second group of intermediate layers, and the second output layer ([0050] The server 402 splits the network at the “split layer” which is a user parameter inserted into the network definition codes. The “top portion” of the network is kept at the server 402 the “bottom portion” is sent to the respective data providers or clients 406, 408, 410 (the terms clients and data providers are used interchangeably here). The training starts at the very lowest network layer which is the layer closest to the data. [wherein, in a case where the inference target data is input to the first input layer] Each layer reads either the data ([inference unit performs the predetermined inference processing on the inference target data by using at least the first input layer] from the first layer) or the output of the previous layer (all other layers).; [0051] The [the first group of intermediate layers, the second group of intermediate layers] layers can calculate their output (these are termed “activations” because they come from an activation function) based on any valid network architecture command (convolutions, dropouts, batch normalization, flatten layers, etc.) and activation function (relu, tanh, etc.). When the last layer on the data side 406, 408, 410 has calculated its appropriate activations (i.e., output) those [and the first output layer] outputs are sent to the first layer on “the other side of the split”—the first layer on the server side 402.; [0054] The clients 406, 408, 410 each run their portion A (406A, 408A, 410A) of the neural network and generate a respective output of A (i.e., SA (406B, 408B, 410B)) and send the output to the server 402. The server 402 [wherein, in a case where the inference target data is input to the second input layer] receives 3 different ‘versions’ of the activations (one from each of SA1, SA2, SA3). At this point, the server 402 processes those activations “appropriately”, which can mean that the server 402 does different operations depending on the case. For example, the server 402 calculates the loss value for each client 406, 408, 410 and the server 402 calculates the average loss across all clients.; [0055] In other words, the [inference unit performs the predetermined inference processing on the inference target data by using at least the second input layer, the first group of intermediate layers, the second group of intermediate layers, and the second output layer] training on the server side 402 proceeds much like is described above. Once the first layer on the server side 402 is “complete” (either through averaging or aggregating what is received from the data providers 406, 408, 410) forward propagation occurs until the “top” of the network is reached.).
Gharibi and Vepakomma are combinable for the same rationale as set forth above with respect to claim 1.
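
Gharibi [0055], as quoted above, describes the first server-side layer being “complete” once the values received from the data providers have been averaged or aggregated, after which forward propagation continues to the top of the network. A minimal sketch of the averaging variant is shown below. The element-wise mean, the single linear server layer, and all names and values are assumptions for illustration; the cited passage does not specify them.

```python
def average_activations(client_activations):
    """Element-wise mean of the activation vectors received from the clients."""
    count = len(client_activations)
    return [sum(column) / count for column in zip(*client_activations)]


def linear(weights, bias, inputs):
    """One linear layer without an activation function (hypothetical)."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


# Activations sent by three clients at the split layer (hypothetical values).
received = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# The server completes its first layer input by averaging what it received.
server_input = average_activations(received)

# Forward propagation then continues through the server-side layers.
server_layers = [([[1.0, 1.0]], [0.0])]
output = server_input
for weights, bias in server_layers:
    output = linear(weights, bias, output)

print(server_input)  # [3.0, 4.0]
print(output)        # [7.0]
```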

	Regarding claim 5, Gharibi teaches an information processing system comprising ([0165] Additionally, the methods disclosed herein may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program including a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.):
	a first information processing apparatus; a second information processing apparatus configured to communicate with the first information processing apparatus via a network ([0040] FIG. 2 illustrates a split learning centralized approach. A model (neural network) 204 is split into two parts: one part (206A, 208A, 210A) resides on the [a first information processing apparatus] respective client side 206, 208, 210 and includes the input layer to the model and optionally other layers up to a cut layer, and the other part (B) resides on the [a second information processing apparatus configured to communicate with the first information processing apparatus via a network] server side 202 and often includes the output layer. Split layer (S) refers to the layer (the cut layer) where A and B are split. In FIG. 2, SA represents a split layer or data sent from A to B and SB represents a split layer sent from B to A.);
	a training unit configured to learn a first partial model and a second partial model by using the training data ([0052] The following approach involves splitting the model up as before. A model is split into two parts: [a first partial model] (A) on the client side and includes the input layer, and [and a second partial model] (B) on the server side and often includes the output layer. (S) is the split layer. The clients or data providers 406, 408, 410 run independently and send back the answer if they have it. The code on the server 402 [a training unit configured to learn by using the training data] processes the data and sends back its output equally to all the clients as SB (406C, 408C, 410C).; [0054] The clients 406, 408, 410 each run their portion A (406A, 408A, 410A) of the neural network and generate a respective output of A (i.e., SA (406B, 408B, 410B)) and send the output to the server 402. The server 402 receives 3 different ‘versions’ of the activations (one from each of SA1, SA2, SA3). At this point, the server 402 processes those activations “appropriately”, which can mean that the server 402 does different operations depending on the case. For example, the server 402 calculates the loss value for each client 406, 408, 410 and the server 402 calculates the average loss across all clients.),
	wherein the information processing system performs training processing of training an inference model based on a neural network including a first input layer, a group of intermediate layers, a first output layer, and a second output layer, the first output layer and the second output layer being provided in different information processing apparatuses, wherein the first information processing apparatus includes the first partial model configured to include the first input layer, a first group of intermediate layers including at least one of the intermediate layers in the group of intermediate layers, and the first output layer, and wherein the second information processing apparatus includes the second partial model configured to include a second group of intermediate layers including an intermediate layer different from the intermediate layers included in the first group of intermediate layers in the group of intermediate layers, and the second output layer ([0050] The [wherein the information processing system performs training processing of training an inference model based on a neural network including a first input layer, a group of intermediate layers, a first output layer, and a second output layer, the first output layer and the second output layer being provided in different information processing apparatuses] server 402 splits the network at the “split layer” which is a user parameter inserted into the network definition codes. The [wherein the second information processing apparatus includes the second partial model configured to include a second group of intermediate layers including an intermediate layer different from the intermediate layers included in the first group of intermediate layers in the group of intermediate layers, and the second output layer] “top portion” of the network is kept at the server 402 the [wherein the first information processing apparatus includes the first partial model configured to include the first input layer, a first group of intermediate layers including at least one of the intermediate layers in the group of intermediate layers, and the first output layer] “bottom portion” is sent to the respective data providers or clients 406, 408, 410 (the terms clients and data providers are used interchangeably here). The training starts at the very lowest network layer which is the layer closest to the data. Each layer reads either the data (from the first layer) or the output of the previous layer (all other layers).; [0051] The layers can calculate their output (these are termed “activations” because they come from an activation function) based on any valid network architecture command (convolutions, dropouts, batch normalization, flatten layers, etc.) and activation function (relu, tanh, etc.). When the last layer on the data side 406, 408, 410 has calculated its appropriate activations (i.e., output) those outputs are sent to the first layer on “the other side of the split”—the first layer on the server side 402.).
	Gharibi fails to teach a training data acquisition unit configured to acquire training data.
	Vepakomma teaches a training data acquisition unit configured to acquire training data ([2 SplitNN configurations for health, pg. 3] Figure 2: a training data acquisition unit configured to acquire training data Split learning configurations for health shows raw data is not transferred between the client and server health entities for training and inference of distributed deep learning models with SplitNN.).
Gharibi and Vepakomma are combinable for the same rationale as set forth above with respect to claim 1.
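
For orientation only, the split described in the cited Gharibi [0050] passage (a user-specified "split layer" dividing the network into a "bottom portion" sent to the clients and a "top portion" kept at the server) can be pictured with the minimal sketch below. It is not code from Gharibi, Vepakomma, or the application; the function `split_network`, the layer names, and the index convention are hypothetical and chosen purely for illustration.

```python
# Illustrative only: split_network() and the layer names are hypothetical,
# not taken from Gharibi, Vepakomma, or the application.

def split_network(layers, split_index):
    """Split an ordered layer list at a user-chosen split layer.

    Layers before split_index form the "bottom portion" (closest to the
    data, sent to a client); the remaining layers form the "top portion"
    (kept at the server).
    """
    if not 0 < split_index < len(layers):
        raise ValueError("split_index must leave at least one layer on each side")
    bottom_portion = layers[:split_index]  # runs on the client / data side
    top_portion = layers[split_index:]     # runs on the server side
    return bottom_portion, top_portion


layers = ["input", "conv1", "conv2", "flatten", "dense", "output"]
bottom, top = split_network(layers, split_index=3)
print(bottom)  # ['input', 'conv1', 'conv2']
print(top)     # ['flatten', 'dense', 'output']
```

In this sketch the input layer always lands on the client side and the output layer on the server side, which is the arrangement the quoted passage describes.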

Regarding claim 6, Gharibi, as modified by Vepakomma, teaches The information processing system of claim 5.
	Gharibi teaches wherein the training unit inputs training data constituting the training data to the first input layer of the inference model, and trains the group of intermediate layers and the first output layer ([0050] The server 402 splits the network at the “split layer” which is a user parameter inserted into the network definition codes. The “top portion” of the network is kept at the server 402 the “bottom portion” is sent to the respective data providers or clients 406, 408, 410 (the terms clients and data providers are used interchangeably here). The wherein the training unit inputs training data constituting the training data to the first input layer of the inference model training starts at the very lowest network layer which is the layer closest to the data. trains the group of intermediate layers and the first output layer Each layer reads either the data (from the first layer) or the output of the previous layer (all other layers).; [0051] The layers can calculate their output (these are termed “activations” because they come from an activation function) based on any valid network architecture command (convolutions, dropouts, batch normalization, flatten layers, etc.) and activation function (relu, tanh, etc.). When the last layer on the data side 406, 408, 410 has calculated its appropriate activations (i.e., output) those outputs are sent to the first layer on “the other side of the split”—the first layer on the server side 402.), and
	wherein the training unit trains the inference model by training the group of intermediate layers and the first output layer by backpropagation using loss information calculated using ground truth data constituting the training data and the first output layer, and by using a parameter relating to the first output layer obtained by the training as a parameter relating to the second output layer ([0041] In one example, the neural network between B 204 and client 1 (206) is the B portion 204 plus the A1 portion (206A) with the communication of data SB1 (206C) and SA1 (206B) to complete the entire neural network. The wherein the training unit trains the inference model by training the group of intermediate layers and the first output layer training process is as follows in this model. The server 202 creates A and B and sends a respective model A (206A, 208A, 210A) to the respective client 206, 208, 210. For every client, the operations include repeating the following in a linear, iterative fashion across the group of clients until some conditions occurs. The respective client 206, 208, 210 on their turn downloads the most recent model A from the server 202 (Note that this step is different between the approach shown in FIG. 2 and FIG. 3 ). The clients 206, 208, 210 in their respective turn do a forward step on the model A and sends the calculated using ground truth data constituting the training data and the first output layer output of A (i.e., activations at S only or SA1 (206B), SA2 (208B), SAN 210B)) to the server 202 in addition to the required labels. The server 202 does a forward step on B using the SAs received from the respective client 206, 208, 210. The server 202 by backpropagation using loss information calculates the loss function and the server 202 does backpropagation and calculates by using a parameter relating to the first output layer obtained by the training gradients at the S layer. 
The server 202 sends the as a parameter relating to the second output layer gradients of S only (i.e., SB1 (206C), SB2 (208C), SBN (210C)) to the respective client 206, 208, 210. This is process is performed linearly across the different clients such that the operations occur first for client 206, followed by client 208, and then client 210. The client 206, 208, 210 does backpropagation using the SB gradients received from the server 202 and the client 206, 208, 210 shares their updated A (SA1 (206B), SA2 (208B), SAN (210B)) with the server 202.).
Gharibi and Vepakomma are combinable for the same rationale as set forth above with respect to claim 1.
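
The exchange quoted from Gharibi [0041] (client forward step, activations and labels sent to the server, server loss and backpropagation to the split layer S, gradients at S returned to the client, client backpropagation) can be sketched as follows. This is a toy illustration under loud assumptions: each side is reduced to a single scalar weight, the loss is a squared error, and every name (`client_forward`, `server_step`, `client_backward`, the learning rate) is hypothetical. It is not an implementation drawn from either reference or from the application.

```python
# Toy split-learning round with one scalar weight per side.
# All names and numeric values are hypothetical illustrations.

def client_forward(w_a, x):
    """Client side: forward step on model A, producing the activation at S."""
    return w_a * x


def server_step(w_b, activation, label, lr):
    """Server side: forward on B, loss, backpropagation down to S."""
    prediction = w_b * activation
    error = prediction - label
    loss = 0.5 * error ** 2
    grad_w_b = error * activation   # gradient for the server-side weight
    grad_at_split = error * w_b     # gradient at S, sent back to the client
    new_w_b = w_b - lr * grad_w_b
    return new_w_b, grad_at_split, loss


def client_backward(w_a, x, grad_at_split, lr):
    """Client side: finish backpropagation using the gradient received at S."""
    grad_w_a = grad_at_split * x
    return w_a - lr * grad_w_a


w_a, w_b = 0.5, 0.5
x, label = 2.0, 3.0
losses = []
for _ in range(20):
    activation = client_forward(w_a, x)                        # client -> server
    w_b, grad_s, loss = server_step(w_b, activation, label, lr=0.05)
    w_a = client_backward(w_a, x, grad_s, lr=0.05)             # server -> client
    losses.append(loss)

print(losses[0], losses[-1])
```

Only the activation at S, the label, and the gradient at S cross between the two sides; the raw input `x` never leaves the client functions.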

Regarding claim 7, Gharibi, as modified by Vepakomma, teaches The information processing system of claim 5.
	Gharibi teaches wherein the neural network further includes a second input layer, and
wherein the second partial model further includes the second input layer ([0050] The blind learning approach does not perform the round robin processing described above. The server 402 splits the network at the “split layer” which is a user parameter inserted into the network definition codes. The “top portion” of the network is kept at the server 402 the “bottom portion” is sent to the respective data providers or clients 406, 408, 410 (the terms clients and data providers are used interchangeably here). The training starts at the very lowest network layer which is the layer closest to the data. Each layer reads either the data (from the first layer) or the output of the previous layer (all other layers).; [0051] The layers can calculate their output (these are termed “activations” because they come from an activation function) based on any valid network architecture command (convolutions, dropouts, batch normalization, flatten layers, etc.) and activation function (relu, tanh, etc.). When the last layer on the data side 406, 408, 410 has calculated its appropriate activations (i.e., output) those outputs are sent to the wherein the neural network further includes a second input layer, and wherein the second partial model further includes the second input layer first layer on “the other side of the split”—the first layer on the server side 402.).
Gharibi and Vepakomma are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 8, Gharibi, as modified by Vepakomma, teaches The information processing system of claim 7.
	Gharibi teaches wherein, in a case where training data constituting the training data is input to the first input layer, the training unit trains the inference model by training the group of intermediate layers and the first output layer by backpropagation using loss information calculated using ground truth data constituting the training data and the first output layer, 
and by using a parameter relating to the first output layer obtained by the training as a parameter relating to the second output layer, and wherein, in a case where training data constituting the training data is input to the second input layer, the training unit trains the inference model by training the group of intermediate layers and the second output layer by backpropagation using loss information calculated using ground truth data constituting the training data and the second output layer, and by using a parameter relating to the second output layer obtained by the training as a parameter relating to the first output layer ([0087] In this case rather than an “automatic” split of the network architecture this variation on the idea allows the network architect (i.e. the data scientist developing the algorithm) to specify the specific network components desired for each data type. Each data type will need network architecture layers relevant to its data type (i.e. convolutional layers for images, Recurrent layers/Long Short Term Memory layers for speech, feed forward layers for tabular data, etc.).; [0088] Split learning is a collaborative deep learning technique, where a deep learning network or neural network (NN) can be split into two portions, a client-side network A and a server-side network B, as discussed above. The NN includes weights, bias, and hyperparameters. In FIG. 7 , the clients 702, 704, 706, where the data reside, commit only to the client-side portion of the network, and the server 710 commits only to the server-side portion of the network 710A. The client-side and server-side portions collectively form the full network NN.; [0089] The training of the network is done by a sequence of distributed training processes. The forward propagation and the back-propagation can take place as follows. 
With the raw data, a client (say client 702) trains the client-side network 702A up to a certain layer of the network, which can be called the cut layer or the split layer, and sends the activations of the cut layer to the server 710. The server 710 trains the remaining layers of the NN with the activations that it received from the client 702. This completes a single forward propagation step. A similar process occurs in parallel for the second client 704 and its client side network 704A and its data and generated activations which are transmitted to the server 710. A further similar process occurs in parallel for the third client 706 and its client side network 706A and its data and by using a parameter relating to the second output layer obtained by the training as a parameter relating to the first output layer generated activations which are transmitted to the server 710.; [0090] Next, the wherein, in a case where training data constituting the training data is input to the second input layer, the training unit trains the inference model by training the group of intermediate layers and the second output layer by backpropagation using loss information calculated using ground truth data constituting the training data and the second output layer server 710 carries out the back-propagation up to the cut layer and sends the gradients of the activations to the respective clients 702, 704, 706. 
by using a parameter relating to the first output layer obtained by the training as a parameter relating to the second output layer With the gradients, wherein, in a case where training data constituting the training data is input to the first input layer, the training unit trains the inference model by training the group of intermediate layers and the first output layer by backpropagation using loss information calculated using ground truth data constituting the training data and the first output layer each respective client 702, 704, 706 performs back-propagation on the remaining network 702A, 704A, 706A. This completes a single pass of the back-propagation between a client 702, 704, 706 and the server 710.).
Gharibi and Vepakomma are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 9, Gharibi teaches An information processing system comprising ([0165] Additionally, the methods disclosed herein may be performed under the control of one or more computer systems configured with executable instructions and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. As noted above, the code may be stored on a computer-readable or machine-readable storage medium, for example, in the form of a computer program including a plurality of instructions executable by one or more processors. The computer-readable or machine-readable storage medium may be non-transitory.):
	a first information processing apparatus; and a second information processing apparatus configured to communicate with the first information processing apparatus via a network ([0040] FIG. 2 illustrates a split learning centralized approach. A model (neural network) 204 is split into two parts: one part (206A, 208A, 210A) resides on the a first information processing apparatus respective client side 206, 208, 210 and includes the input layer to the model and optionally other layers up to a cut layer, and the other part (B) resides on the a second information processing apparatus configured to communicate with the first information processing apparatus via a network server side 202 and often includes the output layer. Split layer (S) refers to the layer (the cut layer) where A and B are split. In FIG. 2, SA represents a split layer or data sent from A to B and SB represents a split layer sent from B to A.),
	wherein the information processing system performs at least one of inference processing and training processing by using an inference model based on a neural network including a first input layer, a second input layer, a group of intermediate layers, a first output layer, and a second output layer, the first input layer and the second input layer being provided in different information processing apparatuses ([0083] The MMAI platform 600 shown in FIG. 6 introduces a new generation cryptography toolset to improve the training and protection of private data. The MMAI platform 600 provides the model with more data than is typically used to train AI/ML models and expands on the data. The approach adds a significant amount of data by combining different data types—i.e. images and tabular data, for instance.; [0087] In this case rather than an “automatic” split of the network architecture this variation on the idea allows the network architect (i.e. the data scientist developing the algorithm) to wherein the information processing system performs at least one of inference processing and training processing by using an inference model based on a neural network specify the specific network components desired for each data type. Each data type will need network architecture layers relevant to its data type (i.e. convolutional layers for images, Recurrent layers/Long Short Term Memory layers for speech, feed forward layers for tabular data, etc.). These disparate layers, a first input layer, a second input layer, a group of intermediate layers, a first output layer, and a second output layer, the first input layer and the second input layer being provided in different information processing apparatuses each specific to the data type in question, will be specified such that they run on the “data server” side (almost like independent networks in and of themselves). 
The last layer of each “independent network” (per data type) will send it's activations “across the split” to the “server side”.), and
	Gharibi fails to teach wherein the information processing system is configured to perform at least one of the inference processing and the training processing on the inference model using a path corresponding to the input layer to which target data is input.
	Vepakomma teaches wherein the information processing system is configured to perform at least one of the inference processing and the training processing on the inference model using a path corresponding to the input layer to which target data is input ([2 SplitNN configurations for health, pg. 3] Figure 2: wherein the information processing system is configured to perform at least one of the inference processing and the training processing on the inference model Split learning configurations for health shows raw data is not transferred between the client and server health entities for training and inference of distributed deep learning models with SplitNN.; [Simple vanilla configuration for split learning:, pg. 2] This is the simplest of splitNN configurations as shown in Fig 2a. In this setting each client, (for example, radiology center) using a path corresponding to the input layer to which target data is input trains a partial deep network up to a specific layer known as the cut layer. The outputs at the cut layer are sent to a server which completes the rest of the training without looking at raw data (radiology images) from clients. This completes a round of forward propagation without sharing raw data. The gradients are now back propagated at the server from its last layer until the cut layer. The gradients at the cut layer (and only these gradients) are sent back to radiology client centers. The rest of back propagation is now completed at the radiology client centers. This process is continued until the distributed split learning network is trained without looking at each others raw data.).
	Gharibi and Vepakomma are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 10, Gharibi, as modified by Vepakomma, teaches The information processing system of claim 9.
	Gharibi teaches wherein the path includes a first path using the first input layer, the group of intermediate layers, and the first output layer, and a second path using the second input layer, the group of intermediate layers, and the second output layer ([0092] As noted above, a concept introduced in this disclosure relates to the clients 702, 704, 706 each providing a different type of data but also where the different types of data have a common association. Thus, the selection of the machine learning model can be based on the types of data that are being processed on the client side, and the process of finding the cut layer can also depend on what types of data or the disparity in the different types of data. For example, for widely disparate data types across the clients 702, 704, 706, the cut layer may be chosen to have more or less layers on the client-side networks 702A, 704A, 706A. In another aspect, the number of layers before the cut layer or split layer may vary across clients. wherein the path includes a first path using the first input layer, the group of intermediate layers, and the first output layer Client 702 may be processing images and require 8 layers before the cut layer, while a second path using the second input layer, the group of intermediate layers, and the second output layer client 704 may process text and only need 4 layers before the cut layer. In this regard, as long as the vectors, activations or activation layer at the cut layer is consistent across the different clients 702, 704, 706 having different types of data, there is no requirement that the number of layers at the client-side networks 702A, 704A, 706A be the same.).
Gharibi and Vepakomma are combinable for the same rationale as set forth above with respect to claim 1.
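
The point quoted from Gharibi [0092], that clients may use different numbers of layers before the cut layer (8 for images, 4 for text) as long as the activations at the cut layer are consistent, can be illustrated with the sketch below. It is a hypothetical toy, not code from the reference: the layer widths, the fixed weight pattern, `CUT_WIDTH`, and the helper names are all assumptions made for illustration.

```python
# Two client-side networks of different depth that agree on the width of
# the activation vector at the cut layer. All names and sizes are hypothetical.

CUT_WIDTH = 3  # assumed width of the activation vector at the cut layer


def dense(weights, inputs):
    """One fully connected layer with ReLU; weights is a list of rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, inputs))) for row in weights]


def make_client_network(input_width, num_layers):
    """Build num_layers dense layers; the last one emits CUT_WIDTH values."""
    widths = [input_width] + [4] * (num_layers - 1) + [CUT_WIDTH]
    return [
        [[0.1 * (r + c + 1) for c in range(widths[i])] for r in range(widths[i + 1])]
        for i in range(num_layers)
    ]


def client_forward(network, inputs):
    """Run the client-side layers in order and return the cut-layer activations."""
    for weights in network:
        inputs = dense(weights, inputs)
    return inputs


image_client = make_client_network(input_width=6, num_layers=8)  # e.g. images
text_client = make_client_network(input_width=2, num_layers=4)   # e.g. text

image_activations = client_forward(image_client, [1.0] * 6)
text_activations = client_forward(text_client, [1.0, 0.5])
print(len(image_activations), len(text_activations))  # 3 3
```

Because both activation vectors have the same length, a single server-side network could accept either one, regardless of how many layers each client ran.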

Regarding claim 11, Gharibi, as modified by Vepakomma, teaches The information processing system of claim 9.
	Gharibi teaches wherein the first output layer and the second output layer are provided in different information processing apparatuses ([0050] The server 402 splits the network at the “split layer” which is a user parameter inserted into the network definition codes. The and the second output layer are provided in different information processing apparatuses “top portion” of the network is kept at the server 402 the wherein the first output layer “bottom portion” is sent to the respective data providers or clients 406, 408, 410 (the terms clients and data providers are used interchangeably here). The training starts at the very lowest network layer which is the layer closest to the data. Each layer reads either the data (from the first layer) or the output of the previous layer (all other layers).; [0051] The layers can calculate their output (these are termed “activations” because they come from an activation function) based on any valid network architecture command (convolutions, dropouts, batch normalization, flatten layers, etc.) and activation function (relu, tanh, etc.). When the last layer on the data side 406, 408, 410 has calculated its appropriate activations (i.e., output) those outputs are sent to the first layer on “the other side of the split”—the first layer on the server side 402.; [0054] The clients 406, 408, 410 each run their portion A (406A, 408A, 410A) of the neural network and generate a respective output of A (i.e., SA (406B, 408B, 410B) and send the output to the server 402. The server 402 receives 3 different ‘versions’ of the activations (one from each of SA1, SA2, SA3). At this point, the server 402 processes those activations “appropriately”, which can mean that the server 402 does different operations depending on the case. For example, the server 402 calculates the loss value for each client 406, 408, 410 and the server 402 calculates the average loss across all clients. 
The server 402 performs backpropagation using the average loss and calculates gradients at S. The server 402 sends gradients at S (i.e., SB (406C, 408C, 410C)) to all the clients 406, 408, 410.).
Gharibi and Vepakomma are combinable for the same rationale as set forth above with respect to claim 1.

Regarding claim 12, Gharibi, as modified by Vepakomma, teaches The information processing system of claim 11.
	Gharibi teaches wherein the first input layer and the first output layer are provided in the same information processing apparatus, and wherein the second input layer and the second output layer are provided in the same information processing apparatus ([0050] The server 402 splits the network at the “split layer” which is a user parameter inserted into the network definition codes. The “top portion” of the network is kept at the server 402 the “bottom portion” is sent to the respective data providers or clients 406, 408, 410 (the terms clients and data providers are used interchangeably here). The training starts at the very lowest network layer which is the layer closest to the data. Each layer reads either the data (from the first layer) or the output of the previous layer (all other layers).; [0051] The layers can calculate their output (these are termed “activations” because they come from an activation function) based on any valid network architecture command (convolutions, dropouts, batch normalization, flatten layers, etc.) and activation function (relu, tanh, etc.). When the last layer on the data side 406, 408, 410 has calculated its appropriate activations (i.e., output) those outputs are sent to the first layer on “the other side of the split”—the first layer on the server side 402.; [0054] The wherein the first input layer and the first output layer are provided in the same information processing apparatus clients 406, 408, 410 each run their portion A (406A, 408A, 410A) of the neural network and generate a respective output of A (i.e., SA (406B, 408B, 410B) and send the output to the server 402. The server 402 wherein the second input layer and the second output layer are provided in the same information processing apparatus receives 3 different ‘versions’ of the activations (one from each of SA1, SA2, SA3). 
At this point, the server 402 processes those activations “appropriately”, which can mean that the server 402 does different operations depending on the case. For example, the server 402 calculates the loss value for each client 406, 408, 410 and the server 402 calculates the average loss across all clients. The server 402 performs backpropagation using the average loss and calculates gradients at S. The server 402 sends gradients at S (i.e., SB (406C, 408C, 410C)) to all the clients 406, 408, 410.).
Gharibi and Vepakomma are combinable for the same rationale as set forth above with respect to claim 1.
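
The server-side handling quoted from Gharibi [0054] (a loss value per client, an average loss across all clients, backpropagation from the average loss, and gradients at S sent to every client) can be sketched as below. This is an illustrative toy under stated assumptions: the server-side network is a single scalar weight, the loss is a squared error, and `server_round` and all numeric values are hypothetical rather than taken from the reference.

```python
# Server-side round that averages the loss over several clients and derives
# the gradient at the split layer S for each of them.
# The scalar model and all names are hypothetical illustrations.

def server_round(w_b, activations, labels):
    """Return (average loss, gradient for w_b, per-client gradients at S)."""
    n = len(activations)
    errors = [w_b * a - y for a, y in zip(activations, labels)]
    average_loss = sum(0.5 * e ** 2 for e in errors) / n
    # Gradients of the average loss, hence the 1/n factor on every term.
    grad_w_b = sum(e * a for e, a in zip(errors, activations)) / n
    grads_at_split = [e * w_b / n for e in errors]
    return average_loss, grad_w_b, grads_at_split


average_loss, grad_w_b, grads_at_split = server_round(
    w_b=2.0,
    activations=[1.0, 2.0, 3.0],  # one cut-layer activation per client
    labels=[1.0, 4.0, 9.0],
)
print(average_loss, grad_w_b, grads_at_split)
```

Each entry of `grads_at_split` is what the corresponding client would receive in order to continue backpropagation through its own portion of the network.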

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Poirot et al. (NPL: “Split Learning for collaborative deep learning in healthcare”) teaches a split learning based approach in the medical field to compare performance against (1) centrally hosted and (2) non collaborative configurations for a range of participants.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAGGIE MAIDO whose telephone number is (703) 756-1953. The examiner can normally be reached M-Th: 6am - 4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MM/Examiner, Art Unit 2129  
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129
Prosecution Timeline

Aug 10, 2023: Application Filed
Mar 27, 2026: Non-Final Rejection mailed (§103, §112; current)
Precedent Cases

Applications granted by this same examiner with similar technology (based on 5 most recent grants)

17/196,689, Patent 12639595: INFORMATION PROCESSING DEVICE, INFORMATION COMPUTING METHOD, AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM (granted May 26, 2026; 5y 2m to grant)
17/330,099, Patent 12602603: MULTI-AGENT INFERENCE (granted Apr 14, 2026; 4y 10m to grant)
17/392,319, Patent 12596933: CONTEXT-AWARE ENTITY LINKING FOR KNOWLEDGE GRAPHS TO SUPPORT DECISION MAKING (granted Apr 07, 2026; 4y 8m to grant)
17/062,058, Patent 12579463: GENERATIVE REASONING FOR SYMBOLIC DISCOVERY (granted Mar 17, 2026; 5y 5m to grant)
17/659,028, Patent 12579452: EVALUATION SCORE DETERMINATION MACHINE LEARNING MODELS WITH DIFFERENTIAL PERIODIC TIERS (granted Mar 17, 2026; 3y 11m to grant)
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 62%
With Interview (+27.6%): 89%
Median Time to Grant: 4y 1m (~1y 3m remaining)
PTA Risk: Low
Based on 39 resolved cases by this examiner. Grant probability derived from career allowance rate.