Last updated: May 29, 2026
Application No. 18/054,452
COLLECTIVE COMMUNICATION PHASES AT MIXTURE-OF-EXPERTS LAYER

Final Rejection: §101, §102, §103
Filed: Nov 10, 2022
Examiner: GODO, MORIAM MOSUNMOLA
Art Unit: 2148
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Microsoft Technology Licensing, LLC
OA Round: 2 (Final)
Interview: Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 44% grant rate with a +33.7% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 69 resolved cases, 2023–2026
Examiner Intelligence

GODO, MORIAM MOSUNMOLA
Career Allowance Rate: grants 44% of resolved cases (30 granted / 69 resolved), -11.5% vs TC avg
Interview Lift: +33.7% for resolved cases with an interview versus without
Typical timeline: 4y 7m avg prosecution, 27 currently pending
Career history: 118 total applications across all art units
Statute-Specific Performance

§101: 1.4% (-38.6% vs TC avg)
§103: 91.8% (+51.8% vs TC avg)
§102: 0.6% (-39.4% vs TC avg)
§112: 5.4% (-34.6% vs TC avg)
Based on career data from 69 resolved cases
Office Action

DETAILED ACTION
This Office action is in response to Application No. 18054452 filed on 02/06/2026. Claims 2 and 13 have been cancelled; claims 1, 3-12, and 14-20 are presented for examination and are currently pending. Applicant’s arguments have been carefully and respectfully considered.

Response to Arguments
On page 12 of the remarks, the Applicant argued that “Applicant respectfully submits that instead of being directed to a judicial exception, claim 1 is directed to an improvement in the functioning of the computing system itself. Claim 1 recites specific architectural features implemented by the plurality of processing devices in order to execute the MoE layer. Rather than reciting a linking of a judicial exception to a particular technological environment, the limitation "a plurality of processing devices configured to execute a Mixture-of-Experts (MoE) layer included in an MoE model at least in part by" recites a structural feature of the machine learning model architecture implemented at the plurality of processing devices according to the subsequent limitations of claim 1”.
On page 12 of the remarks, the Applicant argued that “As disclosed, for example, in Paras. [0091], [00113], and [00205] of the subject application, the configuration of claim 1 allows the computing system to efficiently utilize communication bandwidth and execute the MoE layer with low latency even when the workloads of the expert sub-models vary. The configuration of claim 1 also avoids token dropping. These advantages are achieved through the specific tensor layouts and communication patterns recited in claim 1. Thus, claim 1 of the subject application is directed to eligible subject matter for reasons analogous to those provided in Enfish, LLC v. Microsoft Corp., 822 F.3d 1327, 1336-37, 118 USPQ2d 1684, 1689-90 (Fed. Cir. 2016). In the Enfish decision, the claims at issue were found to be directed to eligible subject matter due to improving the functioning of a computing device itself. Applicant respectfully submits that claim 1 is therefore directed to eligible subject matter at Step 2A Prong 2 of the Alice/Mayo subject matter eligibility test, since any alleged judicial exceptions are integrated into a practical application”.
	The above argument is not persuasive because the limitation “plurality of processing devices configured to execute a Mixture-of-Experts (MoE) layer included in an MoE model at least in part by” is an additional element, as analyzed in the 101 rejection. The Applicant needs to argue how the claim as a whole, including the additional elements and the abstract ideas highlighted in the 101 rejection, improves the functioning of the computer as alleged by the Applicant.
	Also, it is important to note that the judicial exception alone cannot provide the improvement. The improvement can be provided by one or more additional elements.
	Furthermore, the Applicant needs to explain how the limitations of the claim, i.e., the abstract ideas and the additional elements highlighted in the 101 rejection, lead to the alleged improvement. For instance, how do the specific tensor layouts and communication patterns allow the computing system to efficiently utilize communication bandwidth and execute the MoE layer with low latency?

	Applicant also argued on page 13 of the remarks that “As disclosed, for example, in Para. [0078], the steps of claim 1 dynamically adjust the tensor layouts at the MoE layer. By explicitly reciting the gating function and that the first dimension is the expert number dimension, claim 1 as currently amended clarifies how the steps performed during the first and second collective communication phases achieve the advantages discussed above by adapting to changes in the number of expert sub-models selected at the gating function. The current amendment to claim 1 therefore clarifies how the features of claim 1 are integrated into a practical application”.
	The above argument is not persuasive because the steps by which the claimed invention dynamically adjusts the tensor layouts are not reflected in the claims. The Applicant needs to argue how the gating function, which is highlighted as an abstract idea in the 101 analysis, together with the additional element that the first dimension is the expert number dimension, leads to the alleged improvement claimed by the Applicant.
	On pages 13-14 of the remarks, the Applicant argued that “Applicant further encourages the Examiner to consider the August 4, 2025 memorandum from Deputy Commissioner Charles Kim, which cautions Examiners against categorizing AI limitations that "cannot practically be performed in the human mind" as mental processes, and also emphasizes analyzing claims "as a whole" rather than evaluating elements "in a vacuum, completely separate from the recited judicial exception." On September 26, 2025, the Director of the USPTO vacated the Patent Trial and Appeal Board's (PTAB) sua sponte §101 rejection in Desjardins [Appeal 2024-000567, Application 16/319,040 (PTAB Apps. Rev. Sept. 26, 2025) ("Desjardins")] for violating the principles of the August 4, 2025 memorandum by (1) failing to meet the preponderance standard for ineligibility; (2) improperly categorizing complex machine learning operations as abstract algorithms; and (3) analyzing claim elements in isolation rather than considering their integrated technological contribution. Applicant respectfully submits that, when viewed in accordance with the August 4, 2025 memorandum and its interpretation in Desjardins, claim 1 of the subject application is not directed to an abstract idea”.
	Also, the Applicant argued on pages 13-14 that “Applicant has amended claims 12 and 20 similarly to claim 1. For the reasons provided above with reference to claim 1, Applicant respectfully submits that claims 12 and 20 as currently amended are directed to eligible subject matter. Since all features of claims 2 and 13 have been incorporated into claims 1 and 12, respectively, Applicant has canceled claims 2 and 13 without prejudice and amended claim 3 to depend from claim 1. Applicant respectfully requests the withdrawal of the rejection under 35 U.S.C. 101”.
	It is noted that the 101 analysis in this Office Action is in line with the August 4, 2025 memorandum.
	Furthermore, the newly added limitations in the amended claims have not overcome the 101 rejection. As a result, the 101 rejection is maintained and adjusted to reflect the newly added limitations.

	The Examiner is withdrawing the rejections in the previous Office action because Applicant’s amendment necessitated the new grounds of rejection presented in this Office action. Applicant’s arguments are moot because Riquelme, a new primary reference, has been applied to the independent claims.
However, some of the references in the previous Office action have been applied to the dependent claims.
In addition, Rhodes et al. (US20220301097 filed 06/03/2022) is still applied to independent claim 20 because it teaches many recited limitations of the claim. As a result, independent claim 20 is obvious over the newly applied Riquelme as primary reference in view of Rhodes et al.
	On page 14 of the remarks, the Applicant argued that “Rhodes also does not disclose or suggest the first collective communication phase and the second collective communication phase. The Office action cites communication between processor cores, as disclosed in general terms in Fig. 10 and Paras. [0100]-[0101] of Rhodes, as disclosing both the first collective communication phase and the second collective communication phase. However, the cited figure and paragraphs of Rhodes do not disclose collective communication phases that involve all the processing devices, nor do these portions of Rhodes disclose the specific communication steps recited in claim 1, including distinct first and second collective communication phases. The cited paragraphs of Rhodes describe communication between processor cores without making any mention of expert sub-models”.
	As noted above, Riquelme has now been applied as a new primary reference to independent claims 1 and 12.
However, the above argument is not persuasive because Rhodes teaches the first collective communication phase (first collective communication phase is between CORE 1 and CORE 2, in Fig. 10) and the second collective communication phase (second collective communication phase is between CORE 3 and CORE 4, in Fig. 10). The broadest reasonable interpretation of Rhodes reads on the limitation “first collective communication phase and the second collective communication phase”. 
The argument that Rhodes does not mention expert sub-models while describing communication between processor cores is not persuasive because core 1 of Rhodes, like each core in Fig. 10, represents processor circuitry 912, and processor circuitry 912 includes positional encoding circuitry 108, interpreted as expert sub-models, in Fig. 9. Furthermore, positional encoding circuitry 108 as an expert sub-model in Fig. 9 reads on the same local expert number of the plurality of expert sub-models 40 in Fig. 12 of the instant specification.
As a result, Rhodes is a very relevant secondary reference that has been applied to independent claim 20 and some of the dependent claims.

On page 15 of the remarks, the Applicant argued that “Applicant further submits that the cited references do not disclose or suggest the features of claims 4 and 14. In the rejections of claims 4 and 14, the Office action cites the same paragraph of Rhodes (Para. [0039]) as disclosing both the first dimension and the second dimension, even though the tensors are concatenated and split along opposite dimensions in the first and second collective communication phases. Rhodes therefore does not disclose or suggest the tensor concatenation and splitting features recited in claims 4 and 14”.
The above argument that “the Office action cites the same paragraph of Rhodes (Para. [0039]) as disclosing both the first dimension and the second dimension” is not persuasive because the Office Action clearly states that the 3D dimension is the first dimension and that the flattened 2D patches correspond to the second dimension.
Rhodes teaches that spatially partitioned tensor crops 202 are split along a 3D dimension, which is a first dimension [0056], and Rhodes also teaches that the embedded tensor representation 206 is illustrated in Equation 5 … In Equation 5, the embedding representation z0 includes a sequence of flattened 2D patches [0039-0040]. Furthermore, paragraph [0039] was cited to teach the claimed limitation of concatenation, which Rhodes teaches.

On page 15 of the remarks, the Applicant argued that “Applicant further submits that the cited references do not disclose or suggest the features of claims 7 and 16. In the rejections of claims 7 and 16, the Office incorrectly interprets control unit circuitry and AL circuitry included in a processor core as corresponding to the nodes. As shown in FIG. 1 of the subject application, the nodes are instead server devices that include multiple processing devices. Accordingly, the interpretation of the term "node" in the rejections of claims 7 and 16 does not correspond to the broadest reasonable interpretation of the term in light of the written description. The Office action also cites the control unit circuitry and AL circuitry, along with the generic description of communication between processor cores provided in Para. [0102] of Rhodes, as disclosing both inter- and intra-node collective communication. However, the cited paragraph of Rhodes does not disclose or suggest the specific communication patterns between the nodes recited in claims 7 and 16”.
The above argument is not persuasive because Rhodes teaches intra-node collective communications performed between the two or more processing devices included in each of the plurality of nodes (control unit circuitry 1014 includes semiconductor-based circuits ... The AL circuitry 1016 includes semiconductor-based circuits [0102]. The Examiner notes that the circuits in 1014 and 1016 perform intra-node collective communication); and 
inter-node collective communications performed between the plurality of nodes (Each core 1002 includes control unit circuitry 1014, arithmetic and logic (AL) circuitry (sometimes referred to as an ALU) 1016 … and an example bus 1022 [0102]. The Examiner notes that control unit circuitry 1014 and AL circuitry 1016 are communicatively connected via a bus). The citation of Rhodes above reads broadly on the claimed limitations “intra-node collective communications ...” and “inter-node collective communications ...”. The Applicant has not argued how the “node” of the claimed invention differs from the interpretation given in the Office Action. As a result, the citation of Rhodes above reads on the limitations of claims 7 and 16.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
 

3.	Claims 1, 3-12, and 14-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed towards an abstract idea without significantly more.

Step 1 
Independent claim 1 is directed to a system, and falls into one of the four statutory categories. 

Step 2A, Prong 1
Claim 1 recites the following abstract ideas:
executing a gating function to select a plurality of expert sub-models included in the MoE layer, wherein a number of the expert sub-models is dynamically selected (Mental process directed to the selecting of a plurality of expert sub-models. This can be done by observing the sub-models and making a judgement on selecting the sub-models);
splitting each of a plurality of first input tensors along a first dimension to obtain a plurality of first output tensors (Mental process directed to dividing each of the plurality of input tensors, which can be carried out by a human using a pen and paper as a physical aid); and
concatenating the plurality of second input tensors along the first dimension to obtain a plurality of second output tensors (Mental process directed to joining each of the plurality of input tensors, which can be carried out by a human using a pen and paper as a physical aid).

Step 2A, Prong 2
a plurality of processing devices configured to execute a Mixture-of-Experts (MoE) layer included in an MoE model at least in part by (this limitation is directed to linking the use of a judicial exception to a particular technological environment or field of use. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(h)): 
 during a first collective communication phase between the plurality of processing devices (this limitation is directed to insignificant extra-solution activity of data transmission between plurality of processing devices. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(g)), 
wherein the first dimension is an expert number dimension that indicates the number of the expert sub-models to which input tokens included in the first input tensor are transmitted in the first collective communication phase (This limitation is directed to a particular type or source of data, which is field of use and it does not integrate the abstract idea into a practical application);
processing the first output tensors at respective expert sub-models of the plurality of expert sub-models to obtain a plurality of second input tensors (this limitation is directed to insignificant extra-solution activity of data gathering between a plurality of input tensors. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(g)); 
during a second collective communication phase between the plurality of processing devices (this limitation is directed to insignificant extra-solution activity of data transmission between plurality of processing devices. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(g)):
 receiving the plurality of second input tensors from the plurality of expert sub-models (this limitation is directed to insignificant extra-solution activity of data transmission between input tensors from the plurality of expert sub-models. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(g)); and
outputting the second output tensors to an additional computing process as output of the MoE layer (this limitation is directed to insignificant extra-solution activity of data transfer of outputting tensors. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(g)).  

Step 2B
a plurality of processing devices configured to execute a Mixture-of-Experts (MoE) layer included in an MoE model at least in part by (this limitation is directed to linking the use of a judicial exception to a particular technological environment or field of use. This does not amount to significantly more than judicial exception. See MPEP 2106.05(h)): 
 during a first collective communication phase between the plurality of processing devices (this limitation is directed to insignificant extra-solution activity of data transmission between plurality of processing devices and it is well understood routine and conventional. This does not amount to significantly more than judicial exception. See MPEP 2106.05(d)(II), example i), 
wherein the first dimension is an expert number dimension that indicates the number of the expert sub-models to which input tokens included in the first input tensor are transmitted in the first collective communication phase (This limitation is directed to a particular type or source of data, which is a field of use, and it does not amount to significantly more than the judicial exception);
processing the first output tensors at respective expert sub-models of the plurality of expert sub-models to obtain a plurality of second input tensors (this limitation is directed to insignificant extra-solution activity of data gathering between a plurality of input tensors and it is well understood routine and conventional. This does not amount to significantly more than judicial exception. See MPEP 2106.05(d)(II), example i); 
during a second collective communication phase between the plurality of processing devices (this limitation is directed to insignificant extra-solution activity of data transmission between plurality of processing devices and it is well understood routine and conventional. This does not amount to significantly more than judicial exception. See MPEP 2106.05(d)(II), example i):
 receiving the plurality of second input tensors from the plurality of expert sub-models (this limitation is directed to insignificant extra-solution activity of data transmission between input tensors from the plurality of expert sub-models and it is well understood routine and conventional. This does not amount to significantly more than judicial exception. See MPEP 2106.05(d)(II), example i); and
outputting the second output tensors to an additional computing process as output of the MoE layer (this limitation is directed to insignificant extra-solution activity of data transfer of outputting tensors and it is well understood routine and conventional. This does not amount to significantly more than judicial exception. See MPEP 2106.05(d)(II), example i).  
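
For illustration only, the data flow recited in claim 1 (split along the expert number dimension, process at the expert sub-models, concatenate along the same dimension) can be sketched on a single device as follows. This is a minimal sketch under stated assumptions, not the Applicant's implementation: tensors are modeled as nested Python lists indexed [expert][token], and the helper names `split_along_expert_dim`, `concat_along_expert_dim`, and `toy_expert` are hypothetical.

```python
def split_along_expert_dim(tensor, num_chunks):
    """Split an [expert][token] nested list into equal chunks along dim 0."""
    rows = len(tensor)
    if rows % num_chunks != 0:
        raise ValueError("expert dimension must divide evenly into chunks")
    step = rows // num_chunks
    return [tensor[i:i + step] for i in range(0, rows, step)]


def concat_along_expert_dim(chunks):
    """Concatenate chunks back together along dim 0 (the expert dimension)."""
    out = []
    for chunk in chunks:
        out.extend(chunk)
    return out


def toy_expert(scale):
    """Hypothetical stand-in for an expert sub-model: scales every token."""
    def apply(chunk):
        return [[scale * token for token in row] for row in chunk]
    return apply


# Assumed toy shapes: 4 rows in the expert dimension, 2 tokens per row.
first_input = [[1, 2], [3, 4], [5, 6], [7, 8]]

# "First phase": split along the expert dimension into one chunk per expert.
first_outputs = split_along_expert_dim(first_input, num_chunks=2)

# Each chunk is processed by its own (toy) expert sub-model.
experts = [toy_expert(10), toy_expert(100)]
second_inputs = [expert(chunk) for expert, chunk in zip(experts, first_outputs)]

# "Second phase": concatenate along the same expert dimension.
second_output = concat_along_expert_dim(second_inputs)
print(second_output)
```

The sketch omits the actual inter-device transmission; it only shows how the tensor shapes change along the first dimension.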

4.	Dependent claim 3 is directed to a system, and falls into one of the four statutory categories.
Claim 3 recites the following abstract ideas:
the plurality of first output tensors each have a size in the expert number dimension equal to the local expert number (Mental process directed to observing the first output tensors size and making a judgement whether the output tensor size in the expert number dimension is equal to the local expert number).  

Claim 3 recites the following additional elements (Step 2A, Prong 2):
wherein: a same local expert number of the plurality of expert sub-models are executed at each of the plurality of processing devices configured to execute the expert sub-models (this limitation is directed to mere instruction to apply a judicial exception. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(f)); and 

Claim 3 recites the following additional elements (Step 2B):
wherein: a same local expert number of the plurality of expert sub-models are executed at each of the plurality of processing devices configured to execute the expert sub-models (this limitation is directed to mere instruction to apply a judicial exception. This does not amount to significantly more than judicial exception. See MPEP 2106.05(f)); and 

5.	Dependent claim 4 is directed to a system, and falls into one of the four statutory categories.
Claim 4 recites the following abstract ideas:
concatenating the plurality of first input tensors along a second dimension when computing the plurality of first output tensors during the first collective communication phase (Mental process directed to joining each of the plurality of input tensors, which can be carried out by a human using a pen and paper as a physical aid); and
 splitting each of the plurality of second input tensors along the second dimension when computing the plurality of second output tensors during the second collective communication phase (Mental process directed to dividing each of the plurality of input tensors, which can be carried out by a human using a pen and paper as a physical aid).  

Claim 4 recites the following additional elements (Step 2A, Prong 2):
wherein the plurality of processing devices are further configured to execute the MoE layer at least in part by (this limitation is directed to mere instruction to apply a judicial exception. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(f)): 

Claim 4 recites the following additional elements (Step 2B):
wherein the plurality of processing devices are further configured to execute the MoE layer at least in part by (this limitation is directed to mere instruction to apply a judicial exception. This does not amount to significantly more than judicial exception. See MPEP 2106.05(f)): 
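
As a hedged illustration of how the limitations of claims 1 and 4 fit together, the two collective communication phases can be simulated as a pair of inverse all-to-all exchanges. The sketch below is an assumption-laden model, not the Applicant's implementation: every device holds an [expert][token] nested list, the experts are divided evenly across devices, and the function names `all_to_all_dispatch` and `all_to_all_combine` are hypothetical.

```python
def all_to_all_dispatch(per_device):
    """First phase: split each tensor along the expert dimension and
    concatenate what each destination receives along the token dimension."""
    world = len(per_device)
    local = len(per_device[0]) // world  # experts hosted per device
    received = []
    for dst in range(world):
        rows = [[] for _ in range(local)]
        for src in range(world):
            chunk = per_device[src][dst * local:(dst + 1) * local]
            for r, row in enumerate(chunk):
                rows[r].extend(row)  # concatenate along the token dimension
        received.append(rows)
    return received


def all_to_all_combine(per_device):
    """Second phase: split each tensor along the token dimension and
    concatenate what each destination receives along the expert dimension."""
    world = len(per_device)
    cap = len(per_device[0][0]) // world  # tokens contributed per source
    received = []
    for dst in range(world):
        rows = []
        for src in range(world):
            for row in per_device[src]:
                rows.append(row[dst * cap:(dst + 1) * cap])
        received.append(rows)
    return received


# Assumed toy setup: 2 devices, 2 experts in total, 2 tokens per expert row.
inputs = [
    [[1, 2], [3, 4]],  # device 0: rows for expert 0 and expert 1
    [[5, 6], [7, 8]],  # device 1: rows for expert 0 and expert 1
]
dispatched = all_to_all_dispatch(inputs)   # each device now holds one expert
combined = all_to_all_combine(dispatched)  # original layout is restored
print(dispatched)
print(combined)
```

With identity experts, the second phase restores the original layout, which is why the split and concatenation dimensions are swapped between the two phases.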

6.	Dependent claim 5 is directed to a system, and falls into one of the four statutory categories.
Claim 5 recites the following abstract ideas:
wherein the second dimension is a token number dimension (Mental process directed to observing the second dimension and making a judgement that the second dimension is a token number dimension).  
Claim 5 does not recite any additional elements.

7.	Dependent claim 6 is directed to a system, and falls into one of the four statutory categories.
Claim 6 recites the following abstract ideas:
the plurality of first output tensors each have a size in the token number dimension equal to the per-processing-device token number (Mental process directed to observing the size of the plurality of first output tensors and making a judgement that each size of output tensors in the token number dimension is equal to the processing device token number).

Claim 6 recites the following additional elements (Step 2A, Prong 2):
wherein: a same per-processing-device token number of tokens are processed at each of the plurality of processing devices configured to execute the expert sub-models (this limitation is directed to mere instruction to apply a judicial exception. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(f)); and  

Claim 6 recites the following additional elements (Step 2B):
wherein: a same per-processing-device token number of tokens are processed at each of the plurality of processing devices configured to execute the expert sub-models (this limitation is directed to mere instruction to apply a judicial exception. This does not amount to significantly more than judicial exception. See MPEP 2106.05(f)); and

8.	Dependent claim 7 is directed to a system, and falls into one of the four statutory categories.
Claim 7 does not recite any abstract ideas.
Claim 7 recites the following additional elements (Step 2A, Prong 2):
wherein: the plurality of processing devices are provided at least in part in a plurality of nodes that each include two or more of the plurality of processing devices (this limitation is directed to linking the use of a judicial exception to a particular technological environment or field of use. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(h));
 and the first collective communication phase and the second collective communication phase each include (this limitation is directed to linking the use of a judicial exception to a particular technological environment or field of use. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(h)): 
intra-node collective communications performed between the two or more processing devices included in each of the plurality of nodes (this limitation is directed to insignificant extra-solution activity of data transmission. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(g)); and 
inter-node collective communications performed between the plurality of nodes (this limitation is directed to insignificant extra-solution activity of data transmission. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(g)).  

Claim 7 recites the following additional elements (Step 2B):
wherein: the plurality of processing devices are provided at least in part in a plurality of nodes that each include two or more of the plurality of processing devices (this limitation is directed to linking the use of a judicial exception to a particular technological environment or field of use. This does not amount to significantly more than judicial exception. See MPEP 2106.05(h));
 and the first collective communication phase and the second collective communication phase each include (this limitation is directed to linking the use of a judicial exception to a particular technological environment or field of use. This does not amount to significantly more than judicial exception. See MPEP 2106.05(h)): 
intra-node collective communications performed between the two or more processing devices included in each of the plurality of nodes (this limitation is directed to insignificant extra-solution activity of data transmission and it is well understood routine and conventional. This does not amount to significantly more than judicial exception. See MPEP 2106.05(d)(II), example i); and 
inter-node collective communications performed between the plurality of nodes (this limitation is directed to insignificant extra-solution activity of data transmission and it is well understood routine and conventional. This does not amount to significantly more than judicial exception. See MPEP 2106.05(d)(II), example i).  

9.	Dependent claim 8 is directed to a system, and falls into one of the four statutory categories.
Claim 8 does not recite any abstract ideas.
Claim 8 recites the following additional elements (Step 2A, Prong 2):
wherein, prior to the intra-node collective communications, the plurality of processing devices are further configured to reorganize a first plurality of memory regions of respective memory devices associated with the plurality of processing devices at least in part by performing a first plurality of strided memory copy operations on the first plurality of memory regions (this limitation is directed to linking the use of a judicial exception to a particular technological environment or field of use. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(h)).  

Claim 8 recites the following additional elements (Step 2B):
wherein, prior to the intra-node collective communications, the plurality of processing devices are further configured to reorganize a first plurality of memory regions of respective memory devices associated with the plurality of processing devices at least in part by performing a first plurality of strided memory copy operations on the first plurality of memory regions (this limitation is directed to linking the use of a judicial exception to a particular technological environment or field of use. This does not amount to significantly more than judicial exception. See MPEP 2106.05(h)).  

10.	Dependent claim 9 is directed to a system, and falls into one of the four statutory categories.
Claim 9 does not recite any abstract ideas.
Claim 9 recites the following additional elements (Step 2A, Prong 2):
wherein, subsequently to performing the intra-node collective communications, the plurality of processing devices are further configured to further reorganize a second plurality of memory regions at least in part by performing a second plurality of strided memory copy operations on the second plurality of memory regions (this limitation is directed to linking the use of a judicial exception to a particular technological environment or field of use. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(h)). 

Claim 9 recites the following additional elements (Step 2B):
wherein, subsequently to performing the intra-node collective communications, the plurality of processing devices are further configured to further reorganize a second plurality of memory regions at least in part by performing a second plurality of strided memory copy operations on the second plurality of memory regions (this limitation is directed to linking the use of a judicial exception to a particular technological environment or field of use. This does not amount to significantly more than judicial exception. See MPEP 2106.05(h)). 

 
11.	Dependent claim 10 is directed to a system, and falls into one of the four statutory categories.
Claim 10 recites the following abstract ideas:
	wherein: when reorganizing the first plurality of memory regions, the plurality of processing devices are further configured to aggregate a plurality of first memory chunks that have a same destination processing device to which the first memory chunks are configured to be transmitted during the intra-node collective communications (Mental process directed to organizing memory regions and aggregating memory chunks which can be done with a pen and paper); and
 when reorganizing the second plurality of memory regions, the plurality of processing devices are further configured to aggregate a plurality of second memory chunks that have a same destination processing device to which the second memory chunks are configured to be transmitted during the inter-node collective communications (Mental process directed to organizing memory regions and aggregating memory chunks which can be done with a pen and paper).  
	Claim 10 does not recite any additional elements.

12.	Dependent claim 11 is directed to a system, and falls into one of the four statutory categories.
Claim 11 recites the following abstract ideas:
respective first input tensors received in the plurality of iterations each have a same size in the second dimension across the plurality of iterations (Mental process directed to first input tensors which have the same size in the second dimension which can be done by observing the input tensors and making a judgement for the input tensors to have the same size in a second dimension); and
the respective first output tensors computed in each iteration have differing respective sizes in the second dimension (Mental process directed to first input tensors computed to have different sizes in the second dimension which can be done by observing the input tensors and making a judgement for the input tensors to have different sizes in the second dimension).

Claim 11 recites the following additional elements:
wherein: the first collective communication phase and the second collective communication phase are performed in each of a plurality of iterations (this limitation is directed to insignificant extra-solution activity of data transmission. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(g)); 

Claim 11 recites the following additional elements:
wherein: the first collective communication phase and the second collective communication phase are performed in each of a plurality of iterations (this limitation is directed to insignificant extra-solution activity of data transmission and it is well understood routine and conventional. This does not amount to significantly more than judicial exception. See MPEP 2106.05(d)(II), example i); 

13.	Independent claim 12 is directed to a method, and falls into one of the four statutory categories.
	Claim 12 is substantially similar to claim 1 and is rejected in the same manner, with the same reasoning applying.

14.	Dependent claim 14 is directed to a method, and falls into one of the four statutory categories.
	Claim 14 is substantially similar to claim 4 and is rejected in the same manner, with the same reasoning applying.

15.	Dependent claim 15 is directed to a method, and falls into one of the four statutory categories.
	Claim 15 is substantially similar to claim 5 and is rejected in the same manner, with the same reasoning applying.

16.	Dependent claim 16 is directed to a method, and falls into one of the four statutory categories.
	Claim 16 is substantially similar to claim 7 and is rejected in the same manner, with the same reasoning applying.

17.	Dependent claim 17 is directed to a method, and falls into one of the four statutory categories.
	Claim 17 is substantially similar to claims 8 and 9 and is rejected in the same manner, with the same reasoning applying.

18.	Dependent claim 18 is directed to a method, and falls into one of the four statutory categories.
	Claim 18 is substantially similar to claim 10 and is rejected in the same manner, with the same reasoning applying.

19.	Dependent claim 19 is directed to a method, and falls into one of the four statutory categories.
	Claim 19 is substantially similar to claim 11 and is rejected in the same manner, with the same reasoning applying.

20.	Dependent claim 20 is directed to a method, and falls into one of the four statutory categories.
	Claim 20 is substantially similar to claim 1 and is rejected in the same manner, with the same reasoning applying.
	Further, claim 20 recites additional elements “intra-node collective communication performed between the two or more processing devices included in each of the plurality of nodes; and inter-node collective communication performed between the plurality of nodes” (this limitation is directed to insignificant extra-solution activity of data transmission. This does not integrate the abstract idea into a practical application. See MPEP 2106.05(g)).

	Claim 20 recites additional elements “intra-node collective communication performed between the two or more processing devices included in each of the plurality of nodes; and inter-node collective communication performed between the plurality of nodes” (this limitation is directed to insignificant extra-solution activity of data transmission and it is well understood, routine and conventional. This does not amount to significantly more than the judicial exception. See MPEP 2106.05(d)(II), example i).
	
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

21.	Claims 1 and 12 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Riquelme et al. ("Scaling vision with sparse mixture of experts." Advances in Neural Information Processing Systems 34 (2021): 8583-8595).

Regarding claim 1, Riquelme teaches a computing system (Figure 1: Overview of the architecture., pg. 2, Fig. 1; Figure 2a shows the quality of different V-MoE and ViT variants with respect to total training compute and time,  pg. 4, section 3.3) 
comprising: a plurality of processing devices (Device 1 ... Device E, bottom Fig. 1) 
configured to execute a Mixture-of-Experts (MoE) layer (Sparse MoE, bottom Fig. 1) included in an MoE model (we experimented with using fewer MoE layers, pg. 4, section 3.1; We show 4 MoE layers of a V-MoE-H/14. The x-axis corresponds to the 32 experts in a layer, pg. 8, Fig. 7; We .... present for deep learning a mixture of experts layer with E experts as [equation image media_image1.png] the function computed by expert i, pg. 2, last para.) at least in part by: executing a gating function (Our approach is ... proposed a top-k gating in LSTMs, pg. 9, section 6) to select a plurality of expert sub-models included in the MoE layer (Sparse MoE, bottom Fig. 1), 
wherein a number of the expert sub-models is dynamically selected (We define the buffer capacity of an expert (Be) as a function of the number of ... the number of selected experts per token (k), pg. 3, section 2.4); 
during a first collective communication phase between the plurality of processing devices (Similar conclusions hold for training time, which includes communication overhead of dispatching data across devices, pg. 5, first para.), 
splitting each of a plurality of first input tensors along a first dimension (With the notation from Section 2, the routing function g is applied row-wise to a batch of inputs X ∈ RN⋅P×D. A batch contains N images composed of P tokens each; each row of X corresponds to the D-dimensional representation of a particular token of an image, pg. 6, last para to pg. 7, first para.) to obtain a plurality of first output tensors (2 rows x 8 column matrix which is output from routers in bottom Fig. 1),
 wherein the first dimension (each row of X corresponds to the D-dimensional representation of a particular token of an image, pg. 6, last para to pg. 7, first para.) is an expert number dimension that indicates the number of the expert sub-models to which input tokens included in the first input tensor are transmitted in the first collective communication phase (The communication of these tokens between devices is shown in this example, which depicts the case when k = 1 expert is selected per token, pg. 2, Fig. 1); 
processing the first output tensors at respective expert sub-models of the plurality of expert sub-models (2 rows x 8 column matrix which is output from routers in bottom Fig. 1) to obtain a plurality of second input tensors (2 rows x 8 column matrices as inputs are to be received by MLP1 ... MLPE, bottom Fig. 1); 
during a second collective communication phase (Each MLP (the expert) is stored on a separate device, pg. 2, Fig. 1) between the plurality of processing devices (Device 1 ... Device E, bottom Fig. 1): 
receiving the plurality of second input tensors from the plurality of expert sub-models (MLP1 ... MLPE receives 2 rows x 8 columns matrices as inputs); and 
concatenating the plurality of second input tensors along the first dimension (2 rows x 8 column matrix as inputs each from MLP1 ... MLPE are concatenated, bottom Fig. 1) to obtain a plurality of second output tensors (plurality of 3 rows x 4 columns matrices as second output tensors, bottom Fig.1); and
 outputting the second output tensors to an additional computing process as output of the MoE layer (Sparse MoE in top Fig. 1 outputs the second output tensors).
Regarding claim 12, claim 12 is similar to claim 1 and is rejected in the same manner, with the same reasoning applying.
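
For illustration only, the claim 1 limitations of splitting along a first dimension during a first collective communication phase and concatenating along that same dimension during a second phase can be sketched as below. This is a minimal sketch under stated assumptions and is not drawn from the application's specification or from Riquelme: the gating function is omitted, each device is assumed to host exactly one expert, tensors are plain nested lists of shape [expert, token], and every function name (`split_along_first_dim`, `all_to_all`, `moe_layer`) is hypothetical.

```python
def split_along_first_dim(tensor, num_parts):
    """Split a nested-list tensor into equal parts along dimension 0."""
    if len(tensor) % num_parts != 0:
        raise ValueError("first dimension must divide evenly")
    step = len(tensor) // num_parts
    return [tensor[i * step:(i + 1) * step] for i in range(num_parts)]


def concat_along_first_dim(parts):
    """Concatenate nested-list tensors along dimension 0."""
    out = []
    for part in parts:
        out.extend(part)
    return out


def all_to_all(send):
    """Simulated all-to-all: send[src][dst] becomes recv[dst][src]."""
    n = len(send)
    return [[send[src][dst] for src in range(n)] for dst in range(n)]


def moe_layer(inputs_per_device, experts):
    """Assumed layout: one expert per device, input shape [expert, token]."""
    n = len(inputs_per_device)
    # First phase: split each input along the expert dimension, then exchange.
    send = [split_along_first_dim(x, n) for x in inputs_per_device]
    recv = all_to_all(send)
    # Each device applies its local expert to every slice it received.
    processed = [
        [[[experts[dev](v) for v in row] for row in part] for part in recv[dev]]
        for dev in range(n)
    ]
    # Second phase: exchange back, then concatenate along the same dimension.
    back = all_to_all(processed)
    return [concat_along_first_dim(parts) for parts in back]
```

Under these assumptions, row e of each device's output is the part of that device's input that was processed by expert e, so the second phase restores the layout that the first phase split apart.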

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



22.	Claims 3, 4, 7, 14, 16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Riquelme et al. ("Scaling vision with sparse mixture of experts." Advances in Neural Information Processing Systems 34 (2021): 8583-8595) in view of Rhodes et al. (US20220301097 filed 06/03/2022) 

Regarding claim 3, Riquelme teaches the computing system of claim 1. Riquelme does not explicitly teach the limitations of claim 3.
 Rhodes teaches wherein: a same local expert number of the plurality of expert sub-models (The Examiner notes this indicates 1 core or each core in Fig. 10 represents processor circuitry 912, and 912 includes positional encoding circuitry 108 as expert sub-models in Fig. 9; Furthermore, a positional encoding circuitry 108 as expert sub-model in Fig. 9 reads on the same local expert number of the plurality of expert sub-model 40 in Fig. 12 of instant specification) are executed at each of the plurality of processing devices configured to execute the expert sub-models (Although it may include any number of example cores 1002 (e.g., 1 core) [0100] The Examiner notes this indicates 1 core or each core in Fig. 10 represents processor circuitry 912, and 912 includes positional encoding circuitry 108 as expert sub-models in Fig. 9); and 
the plurality of first output tensors each have a size in the expert number dimension equal to the local expert number (For example, the number of generated tensor representations may equal the number of crops or patches of pixels that an image (e.g., represented by the image data) is divided into [0078]).  
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Riquelme to incorporate the teachings of Rhodes for the benefit of increasing execution speed (Rhodes [0108]).

Regarding claim 4, Riquelme teaches the computing system of claim 1, Rhodes teaches wherein the plurality of processing devices are further configured to execute the MoE layer at least in part by: concatenating the plurality of first input tensors (positional tensor representations 204 receives spatially partitioned tensor crops 202, Fig. 2B) along a second dimension (The embedded tensor representation 206 is illustrated in Equation 5 … In Equation 5, the embedding representation z0 includes a sequence of flattened 2D patches [0039-0040]) when computing the plurality of first output tensors (In the illustrated example of FIG. 2B, embedded tensor generation circuitry 110 accesses the series of positional tensor representations 204 and produces an embedded tensor representation 206, which is a tensor (e.g., vector) value representing the series of positional tensor representations concatenated together [0039]) during the first collective communication phase (The cores 1002 of the microprocessor 1000 … may cooperate to execute machine readable instructions [0100], Fig. 10; The Examiner notes the first collective communication phase is between CORE 1 and CORE 2, in Fig. 10); and 
splitting each of the plurality of second input tensors along the second dimension when computing the plurality of second output tensors (… comprising instructions that, when executed, cause processor circuitry to at least partition an input tensor into a plurality of tensor crops [0125]) during the second collective communication phase (second collective communication phase is between CORE 3 and CORE 4, in Fig. 10).  
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Riquelme to incorporate the teachings of Rhodes for the benefit of increasing execution speed (Rhodes [0108]).

Regarding claim 7, Riquelme teaches the computing system of claim 1, Rhodes teaches wherein: the plurality of processing devices (Each CORE in Fig. 10) are provided at least in part in a plurality of nodes (Nodes 1014 and 1016 in Fig. 10) that each include two or more of the plurality of processing devices (Control unit circuitry and AL Circuitry, Fig. 10; The control unit circuitry 1014 includes semiconductor-based circuits structured to control (e.g., coordinate) data movement within the corresponding core 1002. The AL circuitry 1016 includes semiconductor-based circuits structured to perform one or more mathematic and/or logic operations on the data within the corresponding core 1002 [0102]. The Examiner notes control unit circuitry 1014 and AL circuitry 1016 are communicatively connected via a bus); and 
the first collective communication phase (first collective communication phase is between CORE 1 and CORE 2, in Fig. 10)  and the second collective communication phase (second collective communication phase is between CORE 3 and CORE 4, in Fig. 10) each include: 
intra-node collective communications performed between the two or more processing devices included in each of the plurality of nodes (control unit circuitry 1014 includes semiconductor-based circuits ... The AL circuitry 1016 includes semiconductor-based circuits [0102]. The Examiner notes that the circuits in 1014 and 1016 perform intra-node collective communication); and 
inter-node collective communications performed between the plurality of nodes (Each core 1002 includes control unit circuitry 1014, arithmetic and logic (AL) circuitry (sometimes referred to as an ALU) 1016 … and an example bus 1022 [0102]. The Examiner notes control unit circuitry 1014 and AL circuitry 1016 are communicatively connected via a bus).  
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Riquelme to incorporate the teachings of Rhodes for the benefit of increasing execution speed (Rhodes [0108]).
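
For illustration only, the claim 7 language reciting intra-node collective communications followed by inter-node collective communications can be read against a two-level all-to-all exchange. The sketch below simulates one such arrangement and checks it against a flat exchange. It is an assumption-laden sketch, not the applicant's implementation and not taken from Rhodes: devices are assumed to be ranked as `node * devs_per_node + local_index`, chunks are opaque Python values, and the names `flat_all_to_all` and `hierarchical_all_to_all` are hypothetical.

```python
def flat_all_to_all(send):
    """Reference exchange: send[src][dst] becomes recv[dst][src]."""
    n = len(send)
    return [[send[src][dst] for src in range(n)] for dst in range(n)]


def hierarchical_all_to_all(send, num_nodes, devs_per_node):
    """Two-level exchange: an intra-node hop, then an inter-node hop."""
    world = num_nodes * devs_per_node
    # Intra-node hop: each chunk moves to the device on the sender's own node
    # that shares the final destination's local index.
    staged = [dict() for _ in range(world)]
    for src in range(world):
        node = src // devs_per_node
        for dst in range(world):
            relay = node * devs_per_node + dst % devs_per_node
            staged[relay][(src, dst)] = send[src][dst]
    # Inter-node hop: relay and destination share a local index, so this
    # step only ever moves a chunk between nodes (or leaves it in place).
    recv = [[None] * world for _ in range(world)]
    for relay in range(world):
        for (src, dst), chunk in staged[relay].items():
            assert relay % devs_per_node == dst % devs_per_node
            recv[dst][src] = chunk
    return recv
```

With two nodes of two devices each, the hierarchical result matches the flat result chunk for chunk; the two-level form changes only the route each chunk takes.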

Regarding claim 14, claim 14 is similar to claim 4 and is rejected in the same manner, with the same reasoning applying.

Regarding claim 16, claim 16 is similar to claim 7 and is rejected in the same manner, with the same reasoning applying.

Regarding claim 20, claim 20 is similar to claim 1 and is rejected in the same manner, with the same reasoning applying. Further, Rhodes teaches wherein: the plurality of processing devices (Each CORE in Fig. 10) are provided at least in part in a plurality of nodes (Nodes 1014 and 1016 in Fig. 10) that each include two or more of the plurality of processing devices (Control unit circuitry and AL Circuitry, Fig. 10; The control unit circuitry 1014 includes semiconductor-based circuits structured to control (e.g., coordinate) data movement within the corresponding core 1002. The AL circuitry 1016 includes semiconductor-based circuits structured to perform one or more mathematic and/or logic operations on the data within the corresponding core 1002 [0102]. The Examiner notes control unit circuitry 1014 and AL circuitry 1016 are communicatively connected via a bus); and 
the first collective communication phase (first collective communication phase is between CORE 1 and CORE 2, in Fig. 10) and 
the second collective communication phase (second collective communication phase is between CORE 3 and CORE 4, in Fig. 10) each include: 
intra-node collective communication performed between the two or more processing devices included in each of the plurality of nodes (control unit circuitry 1014 includes semiconductor-based circuits ... The AL circuitry 1016 includes semiconductor-based circuits [0102]. The Examiner notes that the circuits in 1014 and 1016 perform intra-node collective communication); and 
inter-node collective communication performed between the plurality of nodes (Each core 1002 includes control unit circuitry 1014, arithmetic and logic (AL) circuitry (sometimes referred to as an ALU) 1016 … and an example bus 1022 [0102]. The Examiner notes control unit circuitry 1014 and AL circuitry 1016 are communicatively connected via a bus).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Riquelme to incorporate the teachings of Rhodes for the benefit of increasing execution speed (Rhodes [0108]).

23.	Claims 5, 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Riquelme et al. ("Scaling vision with sparse mixture of experts." Advances in Neural Information Processing Systems 34 (2021): 8583-8595) in view of Rhodes et al. (US20220301097 filed 06/03/2022) in view of Choudhury et al. (US20220253716 filed 01/04/2022)

Regarding claim 5, Riquelme and Rhodes teach the computing system of claim 4. Riquelme and Rhodes do not explicitly teach the limitation of claim 5.
Choudhury teaches wherein the second dimension is a token number dimension (4x3 matrix with four rows and three columns, Fig. 9. The Examiner notes second dimension which is a token number dimension is the row dimension of the 4x3 matrix). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Riquelme and Rhodes to incorporate the teachings of Choudhury for the benefit of performance improvements which include one or more of increased computational performance, reduced latency, increased throughput, and/or reduced power consumption (Choudhury [0160]).

Regarding claim 6, Riquelme, Rhodes and Choudhury teach the computing system of claim 5. Rhodes teaches wherein: a same per-processing-device token number of tokens are processed at each of the plurality of processing devices configured to execute the expert sub-models (The Examiner notes this indicates 1 core or each core in Fig. 10 represents processor circuitry 912, and 912 includes decoder circuitry 120 as expert sub-model in Fig. 9; For example, machine code corresponding to a firmware program …or a software program may be executed by one of the cores 1002 ... at the same … times [0100]); and 
the plurality of first output tensors each have a size in the token number dimension equal to the per-processing-device token number (For example, the number of generated tensor representations may equal the number of crops or patches of pixels that an image (e.g., represented by the image data) is divided into [0078]).  
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Riquelme to incorporate the teachings of Rhodes for the benefit of increasing execution speed (Rhodes [0108]).

Regarding claim 15, claim 15 is similar to claim 5 and is rejected in the same manner, with the same reasoning applying.

24.	Claims 8-10, 17 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Riquelme et al. ("Scaling vision with sparse mixture of experts." Advances in Neural Information Processing Systems 34 (2021): 8583-8595) in view of Rhodes et al. (US20220301097 filed 06/03/2022) in view of Jain et al. (US20240007414 filed 06/25/2021)

Regarding claim 8, Riquelme and Rhodes teach the computing system of claim 7. Riquelme and Rhodes do not explicitly teach wherein, prior to the intra-node collective communications, the plurality of processing devices are further configured to reorganize a first plurality of memory regions of respective memory devices associated with the plurality of processing devices at least in part by performing a first plurality of strided memory copy operations on the first plurality of memory regions.
Jain teaches wherein, prior to the intra-node collective communications (The control unit circuitry D314 includes semiconductor-based circuits … The AL circuitry D316 includes semiconductor-based circuits [0149]. The Examiner notes that circuits in D314 and D316 perform intra-node collective communications), 
the plurality of processing devices are further configured to reorganize a first plurality of memory regions of respective memory devices associated with the plurality of processing devices (a plurality of registers D318 in CORE 1, Fig. D3; Alternatively, the registers D318 may be organized in any other arrangement, format, or structure including distributed throughout the core D302 to shorten access time [0149]) at least in part by performing a first plurality of strided memory copy operations on the first plurality of memory regions (An example static attributer ID6_510 is an example structure with real value quantities that can include (e.g., store) operator hyperparameters (e.g., …, stride) and hardware attributes (e.g., number of processing elements, cache size) … The static attributor ID6_510 is a memory (e.g., storage) that includes these hyperparameters (e.g., attributes, properties) [0516]).  
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Riquelme and Rhodes to incorporate the teachings of Jain for the benefit of improving performance related to cache space and memory bandwidth (Jain [0290]).

Regarding claim 9, Riquelme, Rhodes and Jain teach the computing system of claim 8. Jain teaches wherein, subsequently to performing the intra-node collective communications (The control unit circuitry D314 includes semiconductor-based circuits … The AL circuitry D316 includes semiconductor-based circuits [0149]. The Examiner notes that circuits in D314 and D316 perform intra-node collective communications), 
the plurality of processing devices are further configured to further reorganize a second plurality of memory regions (a plurality of registers D318 in CORE 2, Fig. D3; Alternatively, the registers D318 may be organized in any other arrangement, format, or structure including distributed throughout the core D302 to shorten access time [0149]) at least in part by performing a second plurality of strided memory copy operations on the second plurality of memory regions (An example static attributer ID6_510 is an example structure with real value quantities that can include (e.g., store) operator hyperparameters (e.g., …, stride) and hardware attributes (e.g., number of processing elements, cache size) … The static attributor ID6_510 is a memory (e.g., storage) that includes these hyperparameters (e.g., attributes, properties) [0516]).  
The same motivation to combine dependent claim 8 applies here.

Regarding claim 10, Riquelme, Rhodes and Jain teach the computing system of claim 9. Jain teaches wherein: when reorganizing the first plurality of memory regions (a plurality of registers D318 in CORE 1, Fig. D3; Alternatively, the registers D318 may be organized in any other arrangement, format, or structure including distributed throughout the core D302 to shorten access time [0149]), 
the plurality of processing devices are further configured to aggregate a plurality of first memory chunks that have a same destination processing device to which the first memory chunks (The registers D318 are semiconductor-based structures to store data and/or instructions such as results of one or more of the operations performed by the AL circuitry D316 of the corresponding core D302 [0149]; a distributed platform that provides computing and storage resources to perform processing or data-intensive tasks such as data analytics, data aggregation, and machine-learning, among others [0177]) are configured to be transmitted during the intra-node collective communications (In yet other examples, the AL circuitry D316 may include first AL circuitry that performs integer based operations and second AL circuitry that performs floating point operations [0149]); and
 when reorganizing the second plurality of memory regions (a plurality of registers D318 in CORE 1, Fig. D3; Alternatively, the registers D318 may be organized in any other arrangement, format, or structure including distributed throughout the core D302 to shorten access time [0149]), the plurality of processing devices are further configured to aggregate a plurality of second memory chunks that have a same destination processing device to which the second memory chunks (The registers D318 are semiconductor-based structures to store data and/or instructions such as results of one or more of the operations performed by the AL circuitry D316 of the corresponding core D302 [0149]; a distributed platform that provides computing and storage resources to perform processing or data-intensive tasks such as data analytics, data aggregation, and machine-learning, among others [0177]) are configured to be transmitted during the inter-node collective communications (Each core D302 includes control unit circuitry D314, arithmetic and logic (AL) circuitry (sometimes referred to as an ALU) D316 … and an example bus D322 [0149]. The Examiner notes D314 and D316 are communicatively connected via a bus D322).  
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Riquelme and Rhodes to incorporate the teachings of Jain for the benefit of improving performance related to cache space and memory bandwidth (Jain [0290]).
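
For illustration only, the claim 8-10 language on strided memory copy operations that aggregate memory chunks having a same destination processing device can be pictured with a flat buffer that is reordered so that chunks bound for one destination become contiguous. The sketch below is a hypothetical example and is not taken from the specification or from Jain: the source layout is assumed to be group-major (`[group][destination][element]`), and `strided_copy` and `aggregate_by_destination` are invented names.

```python
def strided_copy(src, dst, src_start, src_stride, dst_start, count, chunk):
    """Copy `count` chunks of `chunk` elements each.

    Reads start `src_stride` elements apart in `src`; writes are packed
    back to back in `dst`, beginning at `dst_start`.
    """
    for i in range(count):
        s = src_start + i * src_stride
        d = dst_start + i * chunk
        dst[d:d + chunk] = src[s:s + chunk]


def aggregate_by_destination(buf, num_groups, num_dsts, chunk):
    """Reorder [group][dst][elem] into [dst][group][elem] (assumed layouts)."""
    if len(buf) != num_groups * num_dsts * chunk:
        raise ValueError("buffer length does not match the assumed layout")
    out = [None] * len(buf)
    for dst_dev in range(num_dsts):
        strided_copy(
            buf, out,
            src_start=dst_dev * chunk,           # first chunk for this destination
            src_stride=num_dsts * chunk,         # next chunk is one group further on
            dst_start=dst_dev * num_groups * chunk,
            count=num_groups,
            chunk=chunk,
        )
    return out
```

After the reorder, each destination's data occupies a single contiguous range of the buffer, which is the property that lets one transfer per destination replace one transfer per chunk.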

Regarding claim 17, Riquelme and Rhodes teach the method of claim 16. Riquelme and Rhodes do not explicitly teach further comprising: prior to the intra-node collective communications, reorganizing a first plurality of memory regions of respective memory devices associated with the plurality of processing devices at least in part by performing a first plurality of strided memory copy operations on the first plurality of memory regions; and subsequently to performing the intra-node collective communications, reorganizing a second plurality of memory regions at least in part by performing a second plurality of strided memory copy operations on the second plurality of memory regions.
Jain teaches further comprising: prior to the intra-node collective communications (The control unit circuitry D314 includes semiconductor-based circuits … The AL circuitry D316 includes semiconductor-based circuits [0149]. The Examiner notes that circuits in D314 and D316 perform intra-node collective communications), 
reorganizing a first plurality of memory regions of respective memory devices associated with the plurality of processing devices (a plurality of registers D318 in CORE 1, Fig. D3; Alternatively, the registers D318 may be organized in any other arrangement, format, or structure including distributed throughout the core D302 to shorten access time [0149]) at least in part by performing a first plurality of strided memory copy operations on the first plurality of memory regions (An example static attributer ID6_510 is an example structure with real value quantities that can include (e.g., store) operator hyperparameters (e.g., …, stride) and hardware attributes (e.g., number of processing elements, cache size) … The static attributor ID6_510 is a memory (e.g., storage) that includes these hyperparameters (e.g., attributes, properties) [0516]); and 
subsequently to performing the intra-node collective communications (The control unit circuitry D314 includes semiconductor-based circuits … The AL circuitry D316 includes semiconductor-based circuits [0149]. The Examiner notes that circuits in D314 and D316 perform intra-node collective communications), 
reorganizing a second plurality of memory regions (a plurality of registers D318 in CORE 2, Fig. D3; Alternatively, the registers D318 may be organized in any other arrangement, format, or structure including distributed throughout the core D302 to shorten access time [0149]) at least in part by performing a second plurality of strided memory copy operations on the second plurality of memory regions (An example static attributer ID6_510 is an example structure with real value quantities that can include (e.g., store) operator hyperparameters (e.g., …, stride) and hardware attributes (e.g., number of processing elements, cache size) … The static attributor ID6_510 is a memory (e.g., storage) that includes these hyperparameters (e.g., attributes, properties) [0516]). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Riquelme and Rhodes to incorporate the teachings of Jain for the benefit of improving performance related to cache space and memory bandwidth (Jain [0290]).

Regarding claim 18, claim 18 is similar to claim 10 and is rejected in the same manner, with the same reasoning applying.

25.	Claims 11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Riquelme et al. ("Scaling vision with sparse mixture of experts." Advances in Neural Information Processing Systems 34 (2021): 8583-8595) in view of Jain et al. (US20240007414 filed 06/25/2021)

Regarding claim 11, Riquelme teaches the computing system of claim 1, Riquelme does not explicitly teach wherein: the first collective communication phase and the second collective communication phase are performed in each of a plurality of iterations; respective first input tensors received in the plurality of iterations each have a same size in the second dimension across the plurality of iterations; and the respective first output tensors computed in each iteration have differing respective sizes in the second dimension.
Jain teaches wherein: the first collective communication phase (first collective communication phase is between CORE 1 and CORE 2, in Fig. D3) and the second collective communication phase (The cores D302 of the microprocessor D300 may operate independently or may cooperate to execute machine readable instructions [0349]. The Examiner notes the second collective communication phase is between CORE 3 and CORE 4, in Fig. D3) are performed in each of a plurality of iterations; respective first input tensors received in the plurality of iterations each have a same size in the second dimension across the plurality of iterations (In such circumstances, the alterations may be relatively smaller and/or otherwise proportional to the amount of change in a calculated value from one iteration to the next [0497]); and the respective first output tensors computed in each iteration have differing respective sizes in the second dimension (The service iterates through ML algorithms paired with feature selections, where each iteration produces a model with a training score [0395]).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of Riquelme to incorporate the teachings of Jain for the benefit of improving performance related to cache space and memory bandwidth (Jain [0290]).
	
Regarding claim 19, claim 19 is similar to claim 11 and is rejected in the same manner, with the same reasoning applying.

Conclusion
	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to MORIAM MOSUNMOLA GODO whose telephone number is (571)272-8670. The examiner can normally be reached Monday-Friday 8am-5pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle T Bechtold, can be reached at (571) 431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/M.G./Examiner, Art Unit 2148  
/MICHELLE T BECHTOLD/Supervisory Patent Examiner, Art Unit 2148
Prosecution Timeline

Nov 10, 2022: Application Filed
Nov 10, 2025: Non-Final Rejection mailed (§101, §102, §103)
Jan 15, 2026: Applicant Interview (Telephonic)
Jan 16, 2026: Examiner Interview Summary
Feb 06, 2026: Response Filed
Apr 27, 2026: Final Rejection mailed (§101, §102, §103; current)
Precedent Cases

Applications granted by this same examiner with similar technology

16/927,018 (Patent 12639556): Object-Centric Learning with Slot Attention. 5y 10m to grant; granted May 26, 2026.
18/583,459 (Patent 12608609): MACHINE LEARNING BASED FILE RANKING METHODS AND SYSTEMS. 2y 2m to grant; granted Apr 21, 2026.
18/919,417 (Patent 12602586): SUPERVISORY NEURON FOR CONTINUOUSLY ADAPTIVE NEURAL NETWORK. 1y 5m to grant; granted Apr 14, 2026.
17/096,425 (Patent 12530583): VOLUME PRESERVING ARTIFICIAL NEURAL NETWORK AND SYSTEM AND METHOD FOR BUILDING A VOLUME PRESERVING TRAINABLE ARTIFICIAL NEURAL NETWORK. 5y 2m to grant; granted Jan 20, 2026.
16/249,279 (Patent 12511528): NEURAL NETWORK METHOD AND APPARATUS. 6y 11m to grant; granted Dec 30, 2025.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 44%
With Interview (+33.7%): 77%
Median Time to Grant: 4y 7m (~1y 0m remaining)
PTA Risk: Moderate
Based on 69 resolved cases by this examiner. Grant probability derived from career allowance rate.