Prosecution Insights
Last updated: April 19, 2026
Application No. 18/127,105

AI COMPUTING PLATFORM, AI COMPUTING METHOD, AND AI CLOUD COMPUTING SYSTEM

Final Rejection: §101, §103, §112
Filed: Mar 28, 2023
Examiner: LIN, HSING CHUN
Art Unit: 2195
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Montage Technology Co. Ltd.
OA Round: 2 (Final)
Grant Probability: 59% (Moderate)
OA Rounds: 3-4
To Grant: 3y 4m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 59% (grants 59% of resolved cases; 64 granted / 108 resolved; +4.3% vs TC avg)
Interview Lift: +79.8% (resolved cases with an interview vs. without)
Avg Prosecution: 3y 4m typical timeline; 37 applications currently pending
Total Applications: 145 career total, across all art units

Statute-Specific Performance

§101: 17.1% (-22.9% vs TC avg)
§103: 35.8% (-4.2% vs TC avg)
§102: 6.5% (-33.5% vs TC avg)
§112: 34.0% (-6.0% vs TC avg)
Deltas are measured against a Tech Center average estimate. Based on career data from 108 resolved cases.
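As a quick sanity check, each "vs TC avg" delta is a plain difference between the examiner's per-statute rate and the Tech Center average estimate. The short sketch below back-computes the implied averages; the inputs are copied from the figures above, and the single 40.0% baseline it reveals is an inference from those numbers, not a published USPTO statistic.

```python
# Back-compute the implied Tech Center average from each statute's rate and
# delta (delta = examiner_rate - tc_avg, so tc_avg = rate - delta).
# Inputs are copied from the dashboard above; nothing else is assumed.
rates = {"101": 17.1, "103": 35.8, "102": 6.5, "112": 34.0}
deltas = {"101": -22.9, "103": -4.2, "102": -33.5, "112": -6.0}

for statute, rate in rates.items():
    tc_avg = rate - deltas[statute]
    print(f"§{statute}: {rate}% (implied TC average {tc_avg:.1f}%)")
```

Every statute implies the same 40.0% estimate, consistent with the dashboard measuring all four deltas against one Tech Center baseline rather than per-statute averages.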

Office Action

§101 §103 §112
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-14 are pending in this application.

Response to Arguments

Applicant's arguments regarding the rejections of claims 1-14 under 35 U.S.C. 112(b) have been fully considered and are persuasive. The rejections have been withdrawn. However, new 35 U.S.C. 112(b) rejections are applied to claims 1-14 based on the amendments.

Applicant's arguments regarding the 35 U.S.C. 101 rejections of claims 1-14 have been fully considered, but they are not persuasive. Regarding the 35 U.S.C. 101 rejection, Applicant argues the following in the remarks: the claims are directed to a practical application that provides a technical solution to a technical problem in distributed computing architectures. The examiner has thoroughly considered Applicant's arguments but respectfully finds them unpersuasive for at least the following reasons.

As to point (a), the examiner respectfully disagrees. The claims do not reflect the improvement recited in the specification (see MPEP 2106.04(d)(1): if the specification sets forth an improvement in technology, the claim must be evaluated to ensure that the claim itself reflects the disclosed improvement; that is, the claim must include the components or steps of the invention that provide the improvement described in the specification). For example, one of the improvements that Applicant points to is optimized resource utilization, since each specialized module processes a corresponding subtask of the package according to its capabilities. However, that is not what the claims recite. Claim 1, for example, merely recites that each of the plurality of near-memory computing modules is configured to complete one or more of the plurality of ordered subtasks according to the respective operation types of the different operation types that the plurality of near-memory computing modules implement; it does not recite that each of the plurality of near-memory computing modules actually executes the one or more of the plurality of ordered subtasks that the respective near-memory computing module is capable of executing.

Applicant also points to [0040] of the specification, which recites the improvement of reducing the unified scheduling load of the processor. However, Applicant does not recite the entirety of [0040] in the remarks. Paragraph [0040] recites: "In the implementation of an embodiment of the present application, the processor decomposes a calculation task into a plurality of ordered subtasks based on a network topology information table, and generates a package based on the near-memory segmented computing protocol (SC4NCM). When receiving the package, each near-memory computing module processes the corresponding subtask according to the operation type that it can perform. This application reduces unified scheduling load of the processor". Therefore, it is the decomposition of a task into a plurality of ordered subtasks, with each near-memory computing module processing a corresponding subtask according to the operation type it can perform, that reduces the unified scheduling load of the processor. As explained above, the claims do not recite this, so the claims do not realize the improvement.

Applicant's arguments regarding the 35 U.S.C. 103 rejections of claims 1-14 have been fully considered but are moot in light of the references being applied in the current rejection.
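For readers skimming the dispute, the flow the examiner summarizes from [0040] (decompose a task into ordered subtasks, package them, and let each near-memory computing module complete the subtasks matching the operation type it implements before forwarding the package) can be pictured with a toy sketch. Everything below, from the class names to the two operation types, is a hypothetical illustration of that description, not the application's actual SC4NCM protocol.

```python
# Toy model of the flow described in [0040]; all names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Package:
    subtasks: list                      # ordered (op_type, operand) pairs
    results: list = field(default_factory=list)

class NMCModule:
    """A near-memory computing module that implements one operation type."""
    def __init__(self, op_type, fn):
        self.op_type, self.fn = op_type, fn

    def process(self, package):
        # Complete only the subtasks whose operation type this module
        # implements, then hand the package on; the processor never has to
        # schedule individual subtasks, which is the claimed load reduction.
        remaining = []
        for op, operand in package.subtasks:
            if op == self.op_type:
                package.results.append(self.fn(operand))
            else:
                remaining.append((op, operand))
        package.subtasks = remaining
        return package

# Processor side: decompose into ordered subtasks, build one package, and
# route it through the module chain; the last module returns it to the processor.
chain = [NMCModule("square", lambda x: x * x), NMCModule("negate", lambda x: -x)]
pkg = Package(subtasks=[("square", 3), ("negate", 5)])
for module in chain:
    pkg = module.process(pkg)
print(pkg.results)  # [9, -5]
```

Incidentally, in this toy version each hop both processes its subtask and modifies the package in place; the §112(a) rejection below turns on whether the specification's "processed package" language supports the claims' "modified" package.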
Drawings

The drawings filed on 11/13/2025 are objected to as failing to comply with 37 C.F.R. 1.84(p)(1) and 37 C.F.R. 1.84(q), which recite: "Reference characters (numerals are preferred), sheet numbers, and view numbers must be plain and legible, and must not be used in association with brackets or inverted commas, or enclosed within outlines, e.g., encircled. They must be oriented in the same direction as the view so as to avoid having to rotate the sheet. Reference characters should be arranged to follow the profile of the object depicted" and "Lead lines are those lines between the reference characters and the details referred to. Such lines may be straight or curved and should be as short as possible. They must originate in the immediate proximity of the reference character and extend to the feature indicated. Lead lines must not cross each other. Lead lines are required for each reference character except for those which indicate the surface or cross section on which they are placed. Such a reference character must be underlined to make it clear that a lead line has not been left out by mistake. Lead lines must be executed in the same way as lines in the drawing. See paragraph (l) of this section."

Figures 2 and 6 show circled reference characters, and Figures 2 and 6 do not show underlined reference characters for reference characters that are on a surface. Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either "Replacement Sheet" or "New Sheet" pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification

The amendment filed 11/13/2025 is objected to under 35 U.S.C. 132(a) because it introduces new matter into the disclosure. 35 U.S.C. 132(a) states that no amendment shall introduce new matter into the disclosure of the invention. The added material which is not supported by the original disclosure is as follows: "The payload may comprise data a with address0, data b with address1, and data c with address2." Figure 3 shows addresses 0, 1, and 2, but the drawing does not show whether a, b, and c are pieces of data or something else, such as tasks. Applicant is required to cancel the new matter in the reply to this Office Action.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-14 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

As per claims 1, 8, and 14 (line numbers refer to claim 1): Lines 16-21 recite "receive the package either initially from the processor or as modified by another near-memory computing module of the plurality of near-memory computing modules, process a corresponding subtask of the package to modify the package, and transmit the package as modified directly to either (i) a subsequent near-memory computing module for processing a subsequent subtask, or (ii) the processor after processing a last subtask of the plurality of ordered subtasks", but the specification does not recite that the package is modified. The specification recites "processing, by the near-memory computing modules, a corresponding subtask when receiving the package to generate a processed package, and transmitting the processed package to a next near-memory computing module that processes a next subtask until completing all the subtasks, and then routing to the processor connecting to the near-memory computing module that processes the last subtask". The package is merely processed, not modified.

Claims 2-7 and 9-13 are dependent claims of claims 1 and 8 and fail to resolve the deficiencies of claims 1 and 8, so they are rejected for the same reasons.

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

As per claims 1 and 14 (line numbers refer to claim 1): Line 7 recites "the decomposed plurality of ordered subtasks" whereas lines 13-14 recite "the plurality of ordered subtasks". It is unclear whether they refer to the same plurality of ordered subtasks or different ones.
Lines 16-18 recite "receive the package either initially from the processor or as modified by another near-memory computing module of the plurality of near-memory computing modules, process a corresponding subtask of the package to modify the package", but it is unclear what it means to process the corresponding subtask of the package to modify the package if the received package is already modified by another near-memory computing module. Line 19 recites "transmit the package as modified", but it is unclear whether this modification is by the other near-memory computing module or is a result of processing the corresponding subtask of the package.

As per claim 5: Lines 2-3 recite "an IP address of a host where each near-memory computing module sequentially processing each subtask is located", so it is unclear whether the IP address is an IP address of the host or an IP address of each near-memory computing module.

As per claim 8: Lines 8-11 recite "processing, by a plurality of near-memory computing modules implementing different operations types, a corresponding subtask according to the operation type implemented by a respective near-memory computing module receiving the package to modify the package", but it is unclear what is modifying the package.

Claims 2-7 and 9-13 are dependent claims of claims 1 and 8 and fail to resolve the deficiencies of claims 1 and 8, so they are rejected for the same reasons.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (abstract idea) without significantly more.

As per claim 1, in step 1 of the 101 analysis, the examiner has determined that the claim is directed to a platform. Therefore, the claim is directed to one of the four statutory categories of invention.

In step 2A prong 1 of the 101 analysis, the examiner has determined that the claim recites a judicial exception. Specifically, the limitations "configured to initiate a calculation task and decompose the calculation task into a plurality of ordered subtasks according to a network topology information table stored in the processor", "generate a package based on the decomposed plurality of ordered subtasks", "to: implement different operation types", "to complete one or more of the plurality of ordered subtasks according to respective operation types of the different operation types that the plurality of near-memory computing modules implement", "process a corresponding subtask of the package to modify the package", "for processing a subsequent subtask", and "after processing a last subtask of the plurality of ordered subtasks" recite mental processes. Humans are able to perform calculations mentally, and humans are able to mentally split a task into a plurality of ordered subtasks based on information from a table. Humans can mentally group the decomposed plurality of ordered subtasks to form a package. Implementing different operation types can be performed mentally since humans can mentally perform mathematical operations. Humans can mentally perform the subtasks since the subtasks are part of a calculation task and humans can perform mental computations.
In step 2A prong 2 of the 101 analysis, the examiner has determined that the additional elements, alone or in combination, do not integrate the judicial exceptions into a practical application for the following rationale:

The limitations "an AI computing platform, comprising at least one computing component each computing component comprising: a processor", "a plurality of near-memory computing modules, the plurality of near-memory computing modules connecting in pairs with the processor, and the plurality of near-memory computing modules connecting in pairs with each other, wherein the plurality of near-memory computing modules are each configured", and "the plurality of near-memory computing modules being configured" apply judicial exceptions on a generic computer. "Alappat's rationale that an otherwise ineligible algorithm or software could be made patent-eligible by merely adding a generic computer to the claim was superseded by the Supreme Court's Bilski and Alice Corp. decisions" (MPEP 2106.05(b)); therefore, applying the judicial exceptions on an AI computing platform, a processor, and a plurality of near-memory computing modules, which are generic computers, does not integrate the judicial exceptions into a practical application.

The limitations "receive the package either initially from the processor or as modified by another near-memory computing module of the plurality of near-memory computing modules" and "transmit the package as modified directly to either (i) a subsequent near-memory computing module…or (ii) the processor" represent insignificant extra-solution activities. The term "extra-solution activity" can be understood as "activities incidental to the primary process or product that are merely a nominal or tangential addition to the claim" (MPEP 2106.05(g)). The examiner has determined that these limitations are directed to mere data gathering activities, which is a category of insignificant extra-solution activity (MPEP 2106.05(g)).

In step 2B of the 101 analysis, the examiner has determined that the additional elements, alone or in combination, do not recite significantly more than the abstract ideas identified above for the following rationale:

The platform, processor, and near-memory computing module limitations identified above apply judicial exceptions on a generic computer and therefore do not provide significantly more. The "receive the package" and "transmit the package as modified" limitations identified above represent insignificant extra-solution activities and are well-understood, routine, or conventional because they are directed to "receiving or transmitting data" (MPEP 2106.05(d)). These are additional elements that the courts have recognized as well-understood, routine, or conventional (MPEP 2106.05(d)). The citation of court cases in the MPEP meets the Berkheimer evidentiary burden, since citation of a court case in the MPEP is one of the four types of evidentiary support that can be used to prove that additional elements are well-understood, routine, or conventional (see Berkheimer v. HP, Inc., 125 USPQ2d 1649). Thus, the limitations do not amount to significantly more than the abstract idea.

As per claim 8, in step 1 of the 101 analysis, the examiner has determined that the claim is directed to a method. Therefore, the claim is directed to one of the four statutory categories of invention.

In step 2A prong 1 of the 101 analysis, the examiner has determined that the claim recites a judicial exception. Specifically, the limitations "initiating a calculation task and decomposing the calculation task into a plurality of ordered subtasks according to a network topology information table stored in the processor", "generating a package according to the decomposed plurality of ordered subtasks", "that processes a first subtask", "processing a corresponding subtask according to the operation type implemented by a respective near-memory computing module", "implementing different operations types", "that processes a next subtask until completing all the subtasks", and "that processes the last subtask" recite mental processes. Humans are able to perform calculations mentally, and humans are able to mentally split a task into a plurality of ordered subtasks based on information from a table. Generating a package is a mental process since humans can aggregate pieces of information together to form a package. Humans can mentally perform the subtasks since the subtasks are part of a calculation task and humans can perform mental computations.

In step 2A prong 2 of the 101 analysis, the examiner has determined that the additional elements, alone or in combination, do not integrate the judicial exceptions into a practical application for the following rationale:

The limitations "AI computing", "by a processor", and "by a plurality of near-memory computing modules" apply judicial exceptions on a generic computer or are mere instructions to apply an abstract idea on a generic computer. Under the Alappat rationale quoted above (MPEP 2106.05(b)), applying the judicial exceptions on a processor and the near-memory computing modules, which are generic computers, and performing AI computing, which involves mere instructions to apply an abstract idea on a generic computer, does not integrate the judicial exceptions into a practical application.

The limitations "routing the package to a near-memory computing module", "receiving the package to modify the package", "transmitting the package as modified directly to a subsequent near-memory computing module", and "then routing the package to the processor connecting to a near-memory computing module" represent insignificant extra-solution activities; the examiner has determined that they are directed to mere data gathering activities, a category of insignificant extra-solution activity (MPEP 2106.05(g)).

In step 2B of the 101 analysis, the examiner has determined that the additional elements, alone or in combination, do not recite significantly more than the abstract ideas identified above for the following rationale:

The limitations "by a processor" and "by a plurality of near-memory computing modules" apply judicial exceptions on a generic computer, and the limitation "AI computing" recites mere instructions to apply an abstract idea on a generic computer, so these limitations do not provide significantly more. The routing, receiving, and transmitting limitations identified above represent insignificant extra-solution activities and are well-understood, routine, or conventional because they are directed to "receiving or transmitting data" (MPEP 2106.05(d)). These are additional elements that the courts have recognized as well-understood, routine, or conventional, and the citation of court cases in the MPEP meets the Berkheimer evidentiary burden, since citation of a court case in the MPEP is one of the four types of evidentiary support that can be used to prove that additional elements are well-understood, routine, or conventional (see Berkheimer v. HP, Inc., 125 USPQ2d 1649). Thus, the limitations do not amount to significantly more than the abstract idea.

As per claim 2 (and similarly for claim 10), it recites attributes of the technological environment that neither integrate the judicial exceptions into a practical application nor recite significantly more.

As per claim 3, it recites generic computing components, attributes of the technological environment, and an insignificant extra-solution activity that is well-understood, routine, or conventional because it is directed to "receiving or transmitting data" (MPEP 2106.05(d)). Therefore, the additional elements neither integrate the judicial exceptions into a practical application nor recite significantly more.
As per claim 4, it recites generic computing components and an insignificant extra-solution activity that is well-understood, routine, or conventional because it is directed to "receiving or transmitting data" (MPEP 2106.05(d)). Therefore, the additional elements neither integrate the judicial exceptions into a practical application nor recite significantly more.

As per claim 5 (and similarly for claim 12), it recites attributes of the technological environment that neither integrate the judicial exceptions into a practical application nor recite significantly more.

As per claim 6 (and similarly for claim 13), it recites mental processes and generic computing components that neither integrate the judicial exceptions into a practical application nor recite significantly more.

As per claim 7, it recites generic computing components that neither integrate the judicial exceptions into a practical application nor recite significantly more.

As per claim 9, it recites generic computing components that neither integrate the judicial exceptions into a practical application nor recite significantly more.

As per claim 11, it recites attributes of the technological environment that neither integrate the judicial exceptions into a practical application nor recite significantly more.

As per claim 14, it is the AI cloud computing system corresponding to claim 1, so it is rejected for similar reasons. Claim 14 recites additional generic computing components that neither integrate the judicial exceptions into a practical application nor recite significantly more.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6-9, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Paul et al. (US 20250156356 A1, hereinafter Paul), in view of Zhong et al. (US 20230124520 A1, hereinafter Zhong), in view of Bayat (US 20210295145 A1), and further in view of Manipatruni et al. (US 20200242459 A1, hereinafter Manipatruni).

As per claim 1, Paul teaches the invention substantially as claimed, including an AI computing platform, comprising at least one computing component each computing component comprising ([0060] the requests may be associated with acceleration requests to use the NMC circuitry for memory-bound AI workloads; [0019] In some examples, the one or more workloads to be executed by host CPU 111 may include, but are not limited to AI workloads.): a processor, configured to: initiate a calculation task and decompose the calculation task into a plurality of subtasks ([0060] Moving to block 925, the application logic divides batch of 'B' requests/queues among 'P' threads, where B represents any whole, positive integer greater than 1. According to some examples, the requests may be associated with acceleration requests to use the NMC circuitry for memory-bound AI workloads; [0055] FIG. 9 illustrates an example software flow 900. In some examples, software flow 900 shows an initialization phase and run-time phase for offloading memory bound kernels to NMC circuitry (e.g., NMC circuitry 122) in an I/O switch (e.g., I/O switch 120). For these examples, application programming interface (API) calls running on a host CPU (e.g., host CPU 111) may be used to offload the memory bound kernels to the NMC circuitry; [0019] as quoted above); and generate a package based on the decomposed plurality of subtasks (Fig. 3; [0060] as quoted above; [0061] Moving to block 930, for each B/P request assigned per thread; the application may aggregate addresses targeted for attached memory device 'i' into a packet.); and a plurality of near-memory computing modules, the plurality of near-memory computing modules connecting in pairs with the processor, and the plurality of near-memory computing modules connecting in pairs with each other, wherein the plurality of near-memory computing modules are each configured to (Figs. 1 and 8; [0020] NMC circuitry 122-1/2, I/O transaction logic 124-1/2 and memory controllers 126-1/2 may be configured to facilitate a gathering and aggregation of data associated with memory-bound AI workloads to be executed by host CPU 111; [0053] According to some examples, if NMC circuitry 822-1 or 822-2 are configured as ASICs, NMC circuitry 822-1 or 822-2 may include an advanced extensible interface (AXI) master interface to couple with I/O transaction logic 824-1 to 824-N and corresponding memory controllers 826-1 or 826-2; [0017] The programmable compute circuitry is hereinafter referred to as "near-memory compute (NMC) circuitry"): receive the package either initially from the processor ([0037] incoming command packets (e.g., using command packet format 300) from a host CPU; [0031] a host CPU that generated a command packet to be sent to NMC circuitry), process a corresponding subtask of the package to modify the package ([0060] and [0061] as quoted above; [0034] FIG. 4 illustrates an example result packet format 400. In some examples, result packet format 400 may include information associated with operations completed by NMC circuitry (e.g., NMC circuitry 122-1) responsive to a command packet sent from a host CPU (e.g., host CPU 111).), and transmit the package as modified directly to (ii) the processor after processing a subtask of the plurality of subtasks ([0034] as quoted above; [0035] If communications between the NMC circuitry and the host CPU can utilize CXL.cache protocols according to the CXL specification, the NMC circuitry may be capable of writing results to a host memory space of the host CPU (separate from memory included in attached memory devices) and provide a notification of this writing of results to the host CPU; [0063] Moving to block 940, NMC circuitry receives and processes the request in the command packet from the host CPU; [0064] Moving to block 945, once the request is complete, NMC circuitry may perform a direct memory access to system memory of the host CPU to send results+status into host system memory; [0060] and [0061] as quoted above).

Paul fails to teach a processor configured to: initiate a calculation task and decompose the calculation task into a plurality of ordered subtasks according to a network topology information table stored in the processor, and generate a package based on the decomposed plurality of ordered subtasks; wherein the plurality of near-memory computing modules are each configured to: implement different operation types, the plurality of near-memory computing modules being configured to complete one or more of the plurality of ordered subtasks according to respective operation types of the different operation types that the plurality of near-memory computing modules implement; receive the package as modified by another near-memory computing module of the plurality of near-memory computing modules; and transmit the package as modified directly to either (i) a subsequent near-memory computing module for processing a subsequent subtask, or (ii) the processor after processing a last subtask of the plurality of ordered subtasks.

However, Zhong teaches a processor configured to initiate a calculation task and decompose the calculation task into a plurality of ordered subtasks according to a network topology information table stored in the processor, and generate a package based on the decomposed plurality of ordered subtasks (Fig. 3; [0052] The central processing unit is configured to: obtain a data processing task, perform division to obtain subtasks, and schedule each dedicated processor; [0049] the following embodiment may be used as a general-purpose near data computing system to support execution of data processing tasks generated by various applications such as a database application, a big data application, and an AI application, to improve flexibility of near data computing. In addition, the data processing task is divided into a plurality of subtasks; [0019] In this optional manner, in one aspect, because the topology diagram records an execution sequence of the subtasks, the central processing unit does not need to recalculate the execution sequence of the subtasks, and can directly perform scheduling according to the execution sequence recorded in the topology diagram; [0153] The topology diagram is used to indicate the plurality of subtasks and the execution sequence of different subtasks; [0154] For example, as shown in FIG. 3, a topology diagram is a DAG 204; [0155] The DAG 204 output by the parser 201 is sent to an executor (Executor) 202 included in the storage device.), wherein the plurality of near-memory computing modules are each configured to: implement different operation types, the plurality of near-memory computing modules being configured to complete one or more of the plurality of ordered subtasks according to respective operation types of the different operation types that the plurality of near-memory computing modules implement ([0012] Different dedicated processors are good at processing different tasks. Therefore, in this optional manner, whether a computing feature of a subtask matches a dedicated processor is considered, and the subtask is scheduled to a dedicated processor matching the computing feature of the subtask for execution, so that the dedicated processor can process a task that the dedicated processor is good at processing; [0016] Different dedicated processors are suitable for processing different types of data. For example, a GPU is suitable for processing an image, and some dedicated codec processors are suitable for processing videos; [0097] (1) A subtask suitable for being allocated to the GPU. [0102] (2) A subtask suitable for being allocated to the NPU. [0103] The NPU is specially designed for AI. The NPU includes modules required for AI computing, such as multiplication and addition, activation function, two-dimensional data calculation, and decompression; [0116] if the data is located in the SSD, the central processing unit allocates the subtask to the DPU in the SSD, to schedule the DPU in the SSD to execute the subtask. If the data is located in the DIMM, the central processing unit allocates the subtask to the DPU in the DIMM, to schedule the DPU in the DIMM to execute the subtask; [0106] (4) A subtask suitable for being allocated to a processor of the DIMM. [0107] For example, the DIMM includes the DPU and a DRAM chip (DRAM chips). The DPU can quickly access the DRAM and process data stored in the DRAM, to complete a task nearby. When this feature of the DIMM is considered, in some embodiments, when data that needs to be processed in a task is located in the DRAM in the DIMM, because the DPU and the DRAM are integrated in the same DIMM, the DPU has an advantage of being closest to the data or having highest data affinity. Accordingly, the task can be allocated to the DPU of the DIMM. The DPU in the DIMM is scheduled to process data stored in the DIMM, so that processing in memory (Processing in Memory) or near memory computing (Near Memory Computing) can be implemented).

It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Paul with the teachings of Zhong to improve processing efficiency (see Zhong [0121] Different dedicated processors are good at processing different tasks. Therefore, when Scheduling Policy 2 is used, whether a computing feature of a subtask matches a dedicated processor is considered, and the subtask is scheduled to a dedicated processor matching the computing feature of the subtask for execution, so that the dedicated processor can process a task that the dedicated processor is good at processing. In this way, a performance advantage of the dedicated processor is utilized, and data processing efficiency is improved.).

Paul and Zhong fail to teach receiving the package as modified by another near-memory computing module of the plurality of near-memory computing modules, and transmitting the package as modified directly to either (i) a subsequent near-memory computing module for processing a subsequent subtask, or (ii) the processor after processing a last subtask of the plurality of ordered subtasks.

However, Bayat teaches receiving the package as modified by another near-memory computing module of the plurality of near-memory computing modules, and transmitting the package as modified directly to a subsequent near-memory computing module for processing a subsequent subtask ([0054] Any digital accelerator (Di) in the plurality of digital accelerators 103 or any IMC accelerator (Ai) in the plurality of IMC accelerators 102 may receive inputs either from an internal memory, such as central memory 106 or an external memory (not shown), or from the processor/controller 101, or directly from an internal memory or buffer of the Di or Ai accelerators and send back the results of the computation either to the internal or external memory, or to the processor/controller 101, or directly to any of the Di or Ai accelerators; [0002] In-Memory Computing (IMC) and digital accelerators to be used for the acceleration of AI algorithms; [0052] In some embodiments, the results produced by one accelerator may be directly routed to the input of another accelerator; [0047] These accelerators may work together to implement the same layer of the network or they may be pipelined to implement different layers of a network.). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Paul and Zhong with the teachings of Bayat to save power (see Bayat [0052] In some embodiments, the results produced by one accelerator may be directly routed to the input of another accelerator. Skipping the transfer of results to memory may result in further power saving.).

Paul, Zhong, and Bayat fail to teach transmitting the package as modified directly to the processor after processing a last subtask of the plurality of ordered subtasks. However, Manipatruni teaches transmitting the package as modified directly to the processor after processing a last subtask of the plurality of ordered subtasks ([0019] The analog in-memory AI processor 140 comprises one or more NN layers, which may be configured as convolutional neural network (CNN) layers and/or fully connected (e.g., all-to-all) NN layers, in any combination, as will be described in greater detail below. The results of the NN processing (e.g., an image classification or recognition) are provided back to the CPU 110 as outputs 150; [0030] AI processor 140 is shown to implement a multi-layer CNN comprising N CNN layers 140a. The multi-layer CNN is mapped to an in-memory data path, wherein the output of each layer is coupled to the next layer through a digital access circuit 210. In some embodiments, AI processor 140 may also include one or more fully connected (e.g., all-to-all) layers 140b coupled in series with the CNN layers 140a to implement a neural network of any desired complexity; bottom table on page 3). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Paul, Zhong, and Bayat with the teachings of Manipatruni to improve efficiency (see Manipatruni [0014] To this end, the disclosed techniques for implementing a hybrid processor, with an extended AI instruction set to perform analog in-memory processing, provide for reduced latency and improved efficiency in AI applications).

As per claim 6, Paul, Zhong, Bayat, and Manipatruni teach the AI computing platform according to claim 1. Zhong further teaches wherein the processor is further configured to implement one or more operation types different from the plurality of near-memory computing modules, wherein the processor and the plurality of near-memory computing modules complete one or more of the plurality of ordered subtasks according to the operation types the processor and the plurality of near-memory computing modules each implement ([0120] In some embodiments, the computation feature of the subtask includes a type of an algorithm that is required for executing the subtask. An implementation of Scheduling Policy 2 includes: the central processing unit selects, from the plurality of dedicated processors based on the type of the algorithm that is required for executing the subtask, a dedicated processor suitable for running the algorithm of the type. For example, the subtask is to perform facial recognition. A neural network algorithm needs to be used when facial recognition is performed, and an NPU that executes the neural network algorithm is just configured for the storage device. In this case, the central processing unit selects the NPU, and schedules the NPU to perform facial recognition by using the neural network algorithm. For another example, the subtask is to perform image compression, and a dedicated chip for image compression is just configured in the storage device. In this case, the central processing unit schedules the dedicated chip to perform image compression; [0121] and [0052] as quoted above; [0154] A dependency relationship and an execution sequence that are of the functions and that are recorded by the DAG 204 in FIG. 3; [0116], [0106], and [0107] as quoted above).

As per claim 7, Paul, Zhong, Bayat, and Manipatruni teach the AI computing platform according to claim 1. Zhong further teaches wherein the processors of each computing component are connected via a bus ([0060] The central processing unit communicates with the dedicated processor in a plurality of manners. In some embodiments, the central processing unit is connected to the dedicated processor through a high-speed Internet network, and the central processing unit communicates with the dedicated processor through the high-speed Internet network. The high-speed Internet network is, for example, a peripheral component interconnect express (PCIe) bus).

As per claim 8, Paul teaches an AI computing method, comprising: initiating, by a processor, a calculation task and decomposing the calculation task into a plurality of subtasks (Paul [0060] and [0055] as quoted above); generating, by the processor, a package according to the decomposed plurality of subtasks, and routing the package to a near-memory computing module that processes the first subtask; and processing, by a plurality of near-memory computing modules, a corresponding subtask implemented by a respective near-memory computing module receiving the package to modify the package, and transmitting the package as modified ([0060] and [0061] as quoted above; [0046] According to some examples, scheme 700 begins at process 7.1 where a packet may be received from a host. The packet, for example, may be a request command packet in example command packet format 300 and may indicate a type of memory-bound AI workload to be accelerated by NMC circuitry 122; [0062] Moving to block 935, the application may for each B/P request/thread; for each attached device i=1 to N, cause a command packet to be enqueued into a host CPU work queue (WQ). In some examples, the host CPU WQ may be used for enqueuing command packets to be sent to the NMC circuitry; [0063] and [0064] as quoted above).

Paul fails to teach initiating, by a processor, a calculation task and decomposing the calculation task into a plurality of ordered subtasks according to a network topology information table stored in the processor; the decomposed plurality of ordered subtasks; processing, by a plurality of near-memory computing modules implementing different operation types, a corresponding subtask according to the operation type implemented by a respective near-memory computing module receiving the package to modify the package; and transmitting the package as modified directly to a subsequent near-memory computing module that processes a next subtask until completing all the subtasks, and then routing the package to the processor connecting to a near-memory computing module that processes the last subtask.

However, Zhong teaches initiating, by a processor, a calculation task and decomposing the calculation task into a plurality of ordered subtasks according to a network topology information table stored in the processor; the decomposed plurality of ordered subtasks; and processing, by a plurality of near-memory computing modules implementing different operation types, a corresponding subtask according to the operation type implemented by a respective near-memory computing module receiving the package, and transmitting the package to a subsequent near-memory computing module that processes a next subtask until completing all the subtasks ([0052] as quoted above; [0154] A dependency relationship and an execution sequence that are of the functions and that are recorded by the DAG 204 in FIG. 3 are as follows: The function d and the function e are first executed. The function b and the function c depend on the function e. Accordingly, the function b and the function c are executed after the function e is executed. The function a depends on the function b, the function c, and the function d. Accordingly, the function a is executed at last. According to the DAG 204, first, the central processing unit indicates the DPU in the DIMM to execute the function e, and indicates the DPU in the SSD to execute the function d. After the function e is executed, the central processing unit indicates the NPU to execute the function b, and indicates the GPU to execute the function c. After the function b, the function c, and the function d are all executed, the central processing unit executes the function a; [0019], [0155], [0116], [0106], and [0107] as quoted above). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Paul with the teachings of Zhong to improve processing efficiency (see Zhong [0121] as quoted above).

Paul and Zhong fail to teach transmitting the package as modified directly to a subsequent near-memory computing module that processes a next subtask, and then routing the package to the processor connecting to the near-memory computing module that processes the last subtask.
However, Bayat teaches transmitting the package as modified directly to a subsequent near-memory computing module that processes a next subtask ([0054] Any digital accelerator (Di) in the plurality of digital accelerators 103 or any IMC accelerator (Ai) in the plurality of IMC accelerators 102 may receive inputs either from an internal memory, such as central memory 106 or an external memory (not shown), or from the processor/controller 101, or directly from an internal memory or buffer of the Di or Ai accelerators and send back the results of the computation either to the internal or external memory, or to the processor/controller 101, or directly to any of the Di or Ai accelerators; [0002] In-Memory Computing (IMC) and digital accelerators to be used for the acceleration of AI algorithms; [0052] In some embodiments, the results produced by one accelerator may be directly routed to the input of another accelerator; [0047] These accelerators may work together to implement the same layer of the network or they may be pipelined to implement different layers of a network.). It would have been obvious to one having ordinary skill in the art before the effective filling date of the claimed invention to have combined Paul and Zhong with the teachings of Bayat to save power (see Bayat [0052] In some embodiments, the results produced by one accelerator may be directly routed to the input of another accelerator. Skipping the transfer of results to memory may result in further power saving.). Paul, Zhong, and Bayat fail to teach routing the package to the processor connecting to the near-memory computing module that processes the last subtask. However, Manipatruni teaches routing the package to the processor connecting to the near-memory computing module that processes the last subtask ([0019] The analog in-memory AI processor 140 comprises one or more NN layers, which may be configured as convolutional neural network (CNN) layers and/or full connected (e.g., all-to-all) NN layers, in any combination, as will be described in greater detail below. The results of the NN processing (e.g., an image classification or recognition) are provided back to the CPU 110 as outputs 150; [0030] AI processor 140 is shown to implement a multi-layer layer CNN comprising N CNN layers 140a. The multi-layer layer CNN is mapped to an in-memory data path, wherein the output of each layer is coupled to the next layer through a digital access circuit 210. In some embodiments, AI processor 140 may also include one or more fully connected (e.g., all-to-all) layers 140b coupled in series with the CNN layers 140a to implement a neural network of any desired complexity; bottom table on page 3 PNG media_image1.png 102 534 media_image1.png Greyscale ). It would have been obvious to one having ordinary skill in the art before the effective filling date of the claimed invention to have combined Paul, Zhong, and Bayat with the teachings of Manipatruni to improve efficiency (see Manipatruni [0014] To this end, the disclosed techniques for implementing a hybrid processor, with an extended AI instruction set to perform analog in-memory processing, provide for reduced latency and improved efficiency in AI applications). As per claim 9, Paul, Zhong, Bayat, and Manipatruni teach the AI computing method according to claim 8. Paul teaches wherein the plurality of near-memory computing modules connect in pairs with the processor, and the plurality of near-memory computing modules connect in pairs with each other (Figs. 
1 and 8; [0020] NMC circuitry 122-1/2, I/O transaction logic 124-1/2 and memory controllers 126-1/2 may be configured to facilitate a gathering and aggregation of data associated with memory-bound AI workloads to be executed by host CPU 111; [0053] According to some examples, if NMC circuitry 822-1 or 822-2 are configured as ASICs, NMC circuitry 822-1 or 822-2 may include an advanced extensible interface (AXI) master interface to couple with I/O transaction logic 824-1 to 824-N and corresponding memory controllers 826-1 or 826-2; [0017] The programmable compute circuitry is hereinafter referred to a “near-memory compute (NMC) circuitry”). As per claim 13, it is an AI computing method claim of claim 6, so it is rejected for similar reasons. Claims 2 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Paul, Zhong, Bayat, and Manipatruni, as applied to claims 1 and 8 above, in view of Sun et al. (US 11119754 B1 hereinafter Sun). As per claim 2, Paul, Zhong, Bayat, and Manipatruni teach the AI computing platform according to claim 1. Zhong teaches wherein the network topology information table comprises: a near-memory computing module index, one or more operation types supported by each near-memory computing module, a load rate, a number of adjacent near-memory computing modules, and operation types supported by the adjacent near-memory computing modules ([0154] For example, as shown in FIG. 3, a topology diagram is a DAG 204; [0154] A dependency relationship and an execution sequence that are of the functions and that are recorded by the DAG 204 in FIG. 3 are as follows: The function d and the function e are first executed. The function b and the function c depend on the function e. Accordingly, the function b and the function c are executed after the function e is executed. The function a depends on the function b, the function c, and the function d. Accordingly, the function a is executed at last. According to the DAG 204, first, the central processing unit indicates the DPU in the DIMM to execute the function e, and indicates the DPU in the SSD to execute the function d. After the function e is executed, the central processing unit indicates the NPU to execute the function b, and indicates the GPU to execute the function c. After the function b, the function c, and the function d are all executed, the central processing unit executes the function a; [0012] Different dedicated processors are good at processing different tasks. Therefore, in this optional manner, whether a computing feature of a subtask matches a dedicated processor is considered, and the subtask is scheduled to a dedicated processor matching the computing feature of the subtask for execution, so that the dedicated processor can process a task that the dedicated processor is good at processing; [0016] Different dedicated processors are suitable for processing different types of data. For example, a GPU is suitable for processing an image, and some dedicated codec processors are suitable for processing videos. Therefore, in this optional manner, whether a type of to-be-processed data in a subtask matches a dedicated processor is considered, and the subtask is scheduled to a dedicated processor matching a dataset type of the subtask for execution, so that the dedicated processor can process data that is suitable for the dedicated processor to process; [0120] For example, the subtask is to perform facial recognition. 
A neural network algorithm needs to be used when facial recognition is performed, and an NPU that executes the neural network algorithm is just configured; [0105] The DPU can run efficiently on a network data packet, a storage request, or an analysis request; [0151] The central processing unit loads the data to a memory of the selected dedicated processor, and schedules the selected dedicated processor to execute the subtask.). Paul, Zhong, Bayat, and Manipatruni fail to teach an IP address of a host where the processor is located. However, Sun teaches an IP address of a host where the processor is located (Col. 9 lines 11-12 The host CPU 608 can have an independent IP address). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Paul, Zhong, Bayat, and Manipatruni with the teachings of Sun to route a packet to a proper location (see Sun Col. 10 lines 23-30 The router 716 reads address information in a received packet and determines the packet's destination. If the router decides that a different data center contains a host server computer, then the packet is forwarded to that data center. If the packet is addressed to a host in the data center 710a, then it is passed to a network address translator (NAT) 718 that converts the packet's public IP address to a private IP address). As per claim 10, it is an AI computing method claim of claim 1, so it is rejected for similar reasons. Claims 3, 4, 5, 11, and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Paul, Zhong, Bayat, and Manipatruni, as applied to claim 1 above, in view of Labonté (US 20210021522 A1). As per claim 3, Paul, Zhong, Bayat, and Manipatruni teach the AI computing platform according to claim 1. Paul teaches wherein the processor is further configured to transmit the package to a near-memory computing module that processes a first subtask, wherein the package comprises: near-memory computing modules required to process the package, a payload length, and a payload (Figs. 3 and 4; [0060] Moving to block 925, the application logic divides batch of ‘B’ requests/queues among ‘P’ threads, where B represents any whole, positive integer greater than 1. According to some examples, the requests may be associated with acceleration requests to use the NMC circuitry for memory-bound AI workloads; [0062] Moving to block 935, the application may for each B/P request/thread; for each attached device i=1 to N, cause a command packet to be enqueued into a host CPU work queue (WQ). In some examples, the host CPU WQ may be used for enqueuing command packets to be sent to the NMC circuitry; [0046] According to some examples, scheme 700 begins at process 7.1 where a packet may be received from a host. The packet, for example, may be a request command packet in example command packet format 300 and may indicate a type of memory-bound AI workload to be accelerated by NMC circuitry 122; [0030] command packet format 300 may be capable of holding 1 kilobyte (KB) of information; [0052] thus allow a simple inspection of a neighbor ID/offset to allow a host CPU to send separate command packets over one of I/O links 815-1 to 815-N and through corresponding I/O transaction logic 824-1 to 824-N to reach a destination NMC circuitry 822-1 or 822-2; [0055] The run-time phase may occur when the host CPU sends command packets (e.g., using command packet format 300) that include memory addresses of the memory devices and the NMC circuitry responds back).
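As a visual aid for the claim 2 mapping addressed just above, the network topology information table (module index, operation types supported, load rate, adjacent modules with their operation types, and, via Sun, the IP address of the host where the processor is located) can be pictured as one record per near-memory computing module. The sketch below is purely illustrative: the field names, types, and the pick_module selection rule are assumptions of this note, not the application's actual table layout.

```python
from dataclasses import dataclass, field

@dataclass
class TopologyEntry:
    """One illustrative row of the claimed network topology information table."""
    module_index: int
    operation_types: list[str]   # operation types this module supports
    load_rate: float             # current utilization, 0.0 to 1.0
    adjacent_modules: dict[int, list[str]] = field(default_factory=dict)  # neighbor index -> its op types
    host_ip: str = "10.0.0.1"    # hypothetical host IP, per Sun's teaching

# Hypothetical two-module table.
TABLE = [
    TopologyEntry(0, ["matmul"], 0.20, {1: ["activation"]}),
    TopologyEntry(1, ["activation"], 0.55, {0: ["matmul"]}),
]

def pick_module(op_type: str) -> TopologyEntry:
    """Pick the least-loaded module that supports the required operation type."""
    candidates = [e for e in TABLE if op_type in e.operation_types]
    return min(candidates, key=lambda e: e.load_rate)

print(pick_module("matmul").module_index)  # -> 0
```

The load-rate and adjacency fields are what would let a scheduler match each ordered subtask's operation type to a capable, lightly loaded module.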
Additionally, Zhong teaches transmit the package to a near-memory computing module that processes a first subtask, wherein the package comprises: a number of near-memory computing modules required to process the package, a list of near-memory computing modules required to process the package (Fig. 3; [0154] A dependency relationship and an execution sequence that are of the functions and that are recorded by the DAG 204 in FIG. 3 are as follows: The function d and the function e are first executed. The function b and the function c depend on the function e. Accordingly, the function b and the function c are executed after the function e is executed. The function a depends on the function b, the function c, and the function d. Accordingly, the function a is executed at last. According to the DAG 204, first, the central processing unit indicates the DPU in the DIMM to execute the function e, and indicates the DPU in the SSD to execute the function d. After the function e is executed, the central processing unit indicates the NPU to execute the function b, and indicates the GPU to execute the function c. After the function b, the function c, and the function d are all executed, the central processing unit executes the function a; [0012] Different dedicated processors are good at processing different tasks. Therefore, in this optional manner, whether a computing feature of a subtask matches a dedicated processor is considered, and the subtask is scheduled to a dedicated processor matching the computing feature of the subtask for execution, so that the dedicated processor can process a task that the dedicated processor is good at processing; [0155] to generate the DAG 204, so that the DAG 204 is used to represent each subtask in the NDP task. The DAG 204 output by the parser 201 is sent to an executor (Executor) 202 included in the storage device. The executor 202 sequentially schedules, based on the DAG 204, steps or functions in the NDP task to corresponding dedicated processors for execution). Paul, Zhong, Bayat, and Manipatruni fail to teach wherein the package comprises: a header cyclic redundancy check. However, Labonté teaches wherein the package comprises: a header cyclic redundancy check ([0105] the data packet may include, but is not limited to, an assortment of header information, a data payload, and cyclic redundancy check (CRC)). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Paul, Zhong, Bayat, and Manipatruni with the teachings of Labonté since a cyclic redundancy check performs error checking and thus reduces errors. As per claim 4, Paul, Zhong, Bayat, Manipatruni, and Labonté teach the AI computing platform according to claim 3.
Bayat teaches wherein each near-memory computing module is further configured to transmit the package as modified to the subsequent near-memory computing module according to the list of near-memory computing modules ([0054] Any digital accelerator (Di) in the plurality of digital accelerators 103 or any IMC accelerator (Ai) in the plurality of IMC accelerators 102 may receive inputs either from an internal memory, such as central memory 106 or an external memory (not shown), or from the processor/controller 101, or directly from an internal memory or buffer of the Di or Ai accelerators and send back the results of the computation either to the internal or external memory, or to the processor/controller 101, or directly to any of the Di or Ai accelerators; [0002] In-Memory Computing (IMC) and digital accelerators to be used for the acceleration of AI algorithms; [0052] In some embodiments, the results produced by one accelerator may be directly routed to the input of another accelerator; [0047] These accelerators may work together to implement the same layer of the network or they may be pipelined to implement different layers of a network.). As per claim 5, Paul, Zhong, Bayat, Manipatruni, and Labonté teach the AI computing platform according to claim 3. Zhong teaches wherein the list of near-memory computing modules comprises where each near-memory computing module sequentially processing each subtask is located, an index, one or more operation types supported, and a load rate of each near-memory computing module sequentially processing each subtask ([0154] A dependency relationship and an execution sequence that are of the functions and that are recorded by the DAG 204 in FIG. 3 are as follows: The function d and the function e are first executed. The function b and the function c depend on the function e. Accordingly, the function b and the function c are executed after the function e is executed. The function a depends on the function b, the function c, and the function d. Accordingly, the function a is executed at last. According to the DAG 204, first, the central processing unit indicates the DPU in the DIMM to execute the function e, and indicates the DPU in the SSD to execute the function d. After the function e is executed, the central processing unit indicates the NPU to execute the function b, and indicates the GPU to execute the function c. After the function b, the function c, and the function d are all executed, the central processing unit executes the function a; [0012] Different dedicated processors are good at processing different tasks. Therefore, in this optional manner, whether a computing feature of a subtask matches a dedicated processor is considered, and the subtask is scheduled to a dedicated processor matching the computing feature of the subtask for execution, so that the dedicated processor can process a task that the dedicated processor is good at processing; [0016] Different dedicated processors are suitable for processing different types of data. For example, a GPU is suitable for processing an image, and some dedicated codec processors are suitable for processing videos; [0120] For example, the subtask is to perform facial recognition. A neural network algorithm needs to be used when facial recognition is performed, and an NPU that executes the neural network algorithm is just configured; [0155] to generate the DAG 204, so that the DAG 204 is used to represent each subtask in the NDP task. 
The DAG 204 output by the parser 201 is sent to an executor (Executor) 202 included in the storage device. The executor 202 sequentially schedules, based on the DAG 204, steps or functions in the NDP task to corresponding dedicated processors for execution; [0151] The central processing unit loads the data to a memory of the selected dedicated processor, and schedules the selected dedicated processor to execute the subtask.). Additionally, Labonté teaches an IP address of a host ([0120] a network interface of the LBI host (e.g., a network device or another LBN) that connects to the given LBN, and/or IP address; [0130] the network device (904) is hosting a load balancing intelligence (LBI)). As per claim 11, Paul, Zhong, Bayat, and Manipatruni teach the AI computing method according to claim 8. Paul teaches wherein the package comprises: near-memory computing modules required to process the package, a payload length, and a payload (Figs. 3 and 4; [0062] Moving to block 935, the application may for each B/P request/thread; for each attached device i=1 to N, cause a command packet to be enqueued into a host CPU work queue (WQ). In some examples, the host CPU WQ may be used for enqueuing command packets to be sent to the NMC circuitry; [0046] According to some examples, scheme 700 begins at process 7.1 where a packet may be received from a host. The packet, for example, may be a request command packet in example command packet format 300 and may indicate a type of memory-bound AI workload to be accelerated by NMC circuitry 122; [0030] command packet format 300 may be capable of holding 1 kilobyte (KB) of information; [0052] thus allow a simple inspection of a neighbor ID/offset to allow a host CPU to send separate command packets over one of I/O links 815-1 to 815-N and through corresponding I/O transaction logic 824-1 to 824-N to reach a destination NMC circuitry 822-1 or 822-2; [0055] The run-time phase may occur when the host CPU sends command packets (e.g., using command packet format 300) that include memory addresses of the memory devices and the NMC circuitry responds back). Additionally, Zhong teaches wherein the package comprises: a number of near-memory computing modules required to process the package, a list of near-memory computing modules required to process the package ([0154] A dependency relationship and an execution sequence that are of the functions and that are recorded by the DAG 204 in FIG. 3 are as follows: The function d and the function e are first executed. The function b and the function c depend on the function e. Accordingly, the function b and the function c are executed after the function e is executed. The function a depends on the function b, the function c, and the function d. Accordingly, the function a is executed at last. According to the DAG 204, first, the central processing unit indicates the DPU in the DIMM to execute the function e, and indicates the DPU in the SSD to execute the function d. After the function e is executed, the central processing unit indicates the NPU to execute the function b, and indicates the GPU to execute the function c. After the function b, the function c, and the function d are all executed, the central processing unit executes the function a). Paul, Zhong, Bayat, and Manipatruni fail to teach wherein the package comprises: a header cyclic redundancy check. 
However, Labonté teaches wherein the package comprises: a header cyclic redundancy check ([0105] the data packet may include, but is not limited to, an assortment of header information, a data payload, and cyclic redundancy check (CRC)). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Paul, Zhong, Bayat, and Manipatruni with the teachings of Labonté since a cyclic redundancy check performs error checking and thus reduces errors. As per claim 12, it is an AI computing method claim of claim 5, so it is rejected for similar reasons. Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Paul in view of Zhong, in view of Reddy (US 20230266996 A1), in view of Bayat, and further in view of Manipatruni. As per claim 14, Paul teaches an AI computing system, comprising: a plurality of AI computing platforms; each AI computing platform comprises at least one computing component, each computing component comprising: a processor, configured to: initiate a calculation task and decompose the calculation task into a plurality of subtasks (Figs. 1 and 2; [0060] Moving to block 925, the application logic divides batch of ‘B’ requests/queues among ‘P’ threads, where B represents any whole, positive integer greater than 1. According to some examples, the requests may be associated with acceleration requests to use the NMC circuitry for memory-bound AI workloads; [0020] As described in more detail below, NMC circuitry 122-1/2, I/O transaction logic 124-1/2 and memory controllers 126-1/2 may be configured to facilitate a gathering and aggregation of data associated with memory-bound AI workloads; [0055] FIG. 9 illustrates an example software flow 900. In some examples, software flow 900 shows an initialization phase and run-time phase for offloading memory bound kernels to NMC circuitry (e.g., NMC circuitry 122) in an I/O switch (e.g., I/O switch 120). For these examples, application programming interface (API) calls running on a host CPU (e.g., host CPU 111) may be used to offload the memory bound kernels to the NMC circuitry; [0019] In some examples, the one or more workloads to be executed by host CPU 111 may include, but are not limited to AI workloads), and generate a package based on the decomposed plurality of subtasks (Fig. 3; [0060] Moving to block 925, the application logic divides batch of ‘B’ requests/queues among ‘P’ threads, where B represents any whole, positive integer greater than 1. According to some examples, the requests may be associated with acceleration requests to use the NMC circuitry for memory-bound AI workloads. [0061] Moving to block 930, for each B/P request assigned per thread; the application may aggregate addresses targeted for attached memory device ‘i’ into a packet.); and a plurality of near-memory computing modules, the plurality of near-memory computing modules connecting in pairs with the processor, and the plurality of near-memory computing modules connecting in pairs with each other, wherein the plurality of near-memory computing modules are each configured to (Figs.
1 and 8; [0020] NMC circuitry 122-1/2, I/O transaction logic 124-1/2 and memory controllers 126-1/2 may be configured to facilitate a gathering and aggregation of data associated with memory-bound AI workloads to be executed by host CPU 111; [0053] According to some examples, if NMC circuitry 822-1 or 822-2 are configured as ASICs, NMC circuitry 822-1 or 822-2 may include an advanced extensible interface (AXI) master interface to couple with I/O transaction logic 824-1 to 824-N and corresponding memory controllers 826-1 or 826-2; [0017] The programmable compute circuitry is hereinafter referred to a “near-memory compute (NMC) circuitry”): receive the package either initially from the processor ([0037] incoming command packets (e.g., using command packet format 300) from a host CPU; [0031] a host CPU that generated a command packet to be sent to NMC circuitry), process a corresponding subtask of the package to modify the package ([0060] Moving to block 925, the application logic divides batch of ‘B’ requests/queues among ‘P’ threads, where B represents any whole, positive integer greater than 1. According to some examples, the requests may be associated with acceleration requests to use the NMC circuitry for memory-bound AI workloads. [0061] Moving to block 930, for each B/P request assigned per thread; the application may aggregate addresses targeted for attached memory device ‘i’ into a packet; [0034] FIG. 4 illustrates an example result packet format 400. In some examples, result packet format 400 may include information associated with operations completed by NMC circuitry (e.g., NMC circuitry 122-1) responsive to a command packet sent from a host CPU (e.g., host CPU 111)), and transmit the package as modified directly to (ii) the processor after processing a subtask of the plurality of subtasks ([0034] FIG. 4 illustrates an example result packet format 400. In some examples, result packet format 400 may include information associated with operations completed by NMC circuitry (e.g., NMC circuitry 122-1) responsive to a command packet sent from a host CPU (e.g., host CPU 111); [0035] If communications between the NMC circuitry and the host CPU can utilize CXL.cache protocols according to the CXL specification, the NMC circuitry may be capable of writing results to a host memory space of the host CPU (separate from memory included in attached memory devices) and provide a notification of this writing of results to the host CPU; [0063] Moving to block 940, NMC circuitry receives and processes the request in the command packet from the host CPU; [0064] Moving to block 945, once the request is complete, NMC circuitry may perform a direct memory access to system memory of the host CPU to send results+status into host system memory; [0060] Moving to block 925, the application logic divides batch of ‘B’ requests/queues among ‘P’ threads, where B represents any whole, positive integer greater than 1. According to some examples, the requests may be associated with acceleration requests to use the NMC circuitry for memory-bound AI workloads. [0061] Moving to block 930, for each B/P request assigned per thread; the application may aggregate addresses targeted for attached memory device ‘i’ into a packet).
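Because the rejection repeatedly maps a claimed package that carries a module count, a module list, a header cyclic redundancy check, a payload length, and a payload (see the claim 3 and claim 11 discussions above), a compact sketch of such a package, together with the claimed forward-or-return routing, may help the reader. Everything below, including the field widths, byte order, and the CRC-32 choice, is a hypothetical illustration, not the application's SC4NCM wire format.

```python
import struct
import zlib

def build_package(module_list: list[int], payload: bytes) -> bytes:
    """Pack an illustrative package: module count, module list, payload length,
    then a CRC-32 over those header fields, then the payload."""
    header = struct.pack(f">H{len(module_list)}HI",
                         len(module_list), *module_list, len(payload))
    header_crc = struct.pack(">I", zlib.crc32(header))  # header CRC, per the Labonté mapping
    return header + header_crc + payload

def next_hop(module_list: list[int], current: int):
    """Return the next module index after `current`, or None when `current`
    processed the last subtask and the package goes back to the processor."""
    i = module_list.index(current)
    return module_list[i + 1] if i + 1 < len(module_list) else None

pkg = build_package([3, 7, 2], b"tensor-bytes")
assert next_hop([3, 7, 2], 7) == 2       # forward directly to the subsequent module
assert next_hop([3, 7, 2], 2) is None    # last subtask: return to the processor
```

The CRC here protects only the header fields, mirroring the "header cyclic redundancy check" limitation; a real protocol would likely cover the payload as well.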
Paul fails to teach an AI cloud computing system, comprising: a cloud computing center, wherein the cloud computing center is connected to the plurality of AI computing platforms; a processor, configured to initiate a calculation task and decompose the calculation task into a plurality of ordered subtasks according to a network topology information table stored in the processor, and generate a package based on the decomposed plurality of ordered subtasks, wherein the plurality of near-memory computing modules are each configured to: implement different operation types, the plurality of near-memory computing modules being configured to complete one or more of the plurality of ordered subtasks according to respective operation types of the different operation types that the plurality of near-memory computing modules implement, receive the package as modified by another near-memory computing module of the plurality of near-memory computing modules, and transmit the package as modified directly to either (i) a subsequent near-memory computing module for processing a subsequent subtask, or (ii) the processor after processing a last subtask of the plurality of ordered subtasks. However, Zhong teaches a processor, configured to: initiate a calculation task and decompose the calculation task into a plurality of ordered subtasks according to a network topology information table stored in the processor, and generate a package based on the decomposed plurality of ordered subtasks ([0049] the following embodiment may be used as a general-purpose near data computing system to support execution of data processing tasks generated by various applications such as a database application, a big data application, and an AI application, to improve flexibility of near data computing. In addition, the data processing task is divided into a plurality of subtasks; [0019] In this optional manner, in one aspect, because the topology diagram records an execution sequence of the subtasks, the central processing unit does not need to recalculate the execution sequence of the subtasks, and can directly perform scheduling according to the execution sequence recorded in the topology diagram; [0153] The topology diagram is used to indicate the plurality of subtasks and the execution sequence of different subtasks; [0154] For example, as shown in FIG. 3, a topology diagram is a DAG 204; [0155] The DAG 204 output by the parser 201 is sent to an executor (Executor) 202 included in the storage device), wherein the plurality of near-memory computing modules are each configured to: implement different operation types, the plurality of near-memory computing modules being configured to complete one or more of the plurality of ordered subtasks according to respective operation types of the different operation types that the plurality of near-memory computing modules implement ([0016] Different dedicated processors are suitable for processing different types of data. For example, a GPU is suitable for processing an image, and some dedicated codec processors are suitable for processing videos; [0097] (1) A subtask suitable for being allocated to the GPU. [0102] (2) A subtask suitable for being allocated to the NPU. [0103] The NPU is specially designed for AI. 
The NPU includes modules required for AI computing, such as multiplication and addition, activation function, two-dimensional data calculation, and decompression; [0116] if the data is located in the SSD, the central processing unit allocates the subtask to the DPU in the SSD, to schedule the DPU in the SSD to execute the subtask. If the data is located in the DIMM, the central processing unit allocates the subtask to the DPU in the DIMM, to schedule the DPU in the DIMM to execute the subtask; [0106] (4) A subtask suitable for being allocated to a processor of the DIMM. [0107] For example, the DIMM includes the DPU and a DRAM chip (DRAM chips). The DPU can quickly access the DRAM and process data stored in the DRAM, to complete a task nearby. When this feature of the DIMM is considered, in some embodiments, when data that needs to be processed in a task is located in the DRAM in the DIMM, because the DPU and the DRAM are integrated in the same DIMM, the DPU has an advantage of being closest to the data or having highest data affinity. Accordingly, the task can be allocated to the DPU of the DIMM. The DPU in the DIMM is scheduled to process data stored in the DIMM, so that processing in memory (Processing in Memory) or near memory computing (Near Memory Computing) can be implemented). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Paul with the teachings of Zhong to improve processing efficiency (see Zhong [0121] Different dedicated processors are good at processing different tasks. Therefore, when Scheduling Policy 2 is used, whether a computing feature of a subtask matches a dedicated processor is considered, and the subtask is scheduled to a dedicated processor matching the computing feature of the subtask for execution, so that the dedicated processor can process a task that the dedicated processor is good at processing. In this way, a performance advantage of the dedicated processor is utilized, and data processing efficiency is improved.). Paul and Zhong fail to teach an AI cloud computing system, comprising: a cloud computing center, wherein the cloud computing center is connected to the plurality of AI computing platforms; receive the package as modified by another near-memory computing module of the plurality of near-memory computing modules, and transmit the package as modified directly to either (i) a subsequent near-memory computing module for processing a subsequent subtask, or (ii) the processor after processing a last subtask of the plurality of ordered subtasks. However, Reddy teaches an AI cloud computing system, comprising: a cloud computing center, wherein the cloud computing center is connected to the plurality of AI computing platforms ([0018] The machine learning services 130 can be provided by Machine Learning as a Service (MLaaS) platforms (e.g., SAP AI Core, AZURE, Google Cloud ML, etc.). In some cases, the machine learning control plane 105 can have more than one set of machine learning services (e.g., machine learning services from different MLaaS platforms); [0020] The machine learning worker plane 120 runs in a distributed remote environment 155, which can be any of Edge, Cloud (or public cloud), or On-Premise (or private cloud) environment; [0002] AI platforms provide tools to build, deploy, and manage machine learning models in the cloud.).
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Paul and Zhong with the teachings of Reddy since the cloud provides scalability (see Reddy [0013] guaranteed auto-scaling behaviors with their own cloud infrastructure). Paul, Zhong, and Reddy fail to teach receive the package as modified by another near-memory computing module of the plurality of near-memory computing modules, and transmit the package as modified directly to either (i) a subsequent near-memory computing module for processing a subsequent subtask, or (ii) the processor after processing a last subtask of the plurality of ordered subtasks. However, Bayat teaches receive the package as modified by another near-memory computing module of the plurality of near-memory computing modules, transmit the package as modified directly to either (i) a subsequent near-memory computing module for processing a subsequent subtask ([0054] Any digital accelerator (Di) in the plurality of digital accelerators 103 or any IMC accelerator (Ai) in the plurality of IMC accelerators 102 may receive inputs either from an internal memory, such as central memory 106 or an external memory (not shown), or from the processor/controller 101, or directly from an internal memory or buffer of the Di or Ai accelerators and send back the results of the computation either to the internal or external memory, or to the processor/controller 101, or directly to any of the Di or Ai accelerators; [0002] In-Memory Computing (IMC) and digital accelerators to be used for the acceleration of AI algorithms; [0052] In some embodiments, the results produced by one accelerator may be directly routed to the input of another accelerator; [0047] These accelerators may work together to implement the same layer of the network or they may be pipelined to implement different layers of a network.). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Paul, Zhong, and Reddy with the teachings of Bayat to save power (see Bayat [0052] In some embodiments, the results produced by one accelerator may be directly routed to the input of another accelerator. Skipping the transfer of results to memory may result in further power saving.). Paul, Zhong, Reddy, and Bayat fail to teach transmit the package as modified directly to the processor after processing a last subtask of the plurality of ordered subtasks. However, Manipatruni teaches transmit the package as modified directly to the processor after processing a last subtask of the plurality of ordered subtasks ([0019] The analog in-memory AI processor 140 comprises one or more NN layers, which may be configured as convolutional neural network (CNN) layers and/or full connected (e.g., all-to-all) NN layers, in any combination, as will be described in greater detail below. The results of the NN processing (e.g., an image classification or recognition) are provided back to the CPU 110 as outputs 150; [0030] AI processor 140 is shown to implement a multi-layer layer CNN comprising N CNN layers 140a. The multi-layer layer CNN is mapped to an in-memory data path, wherein the output of each layer is coupled to the next layer through a digital access circuit 210.
In some embodiments, AI processor 140 may also include one or more fully connected (e.g., all-to-all) layers 140b coupled in series with the CNN layers 140a to implement a neural network of any desired complexity; bottom table on page 3). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have combined Paul, Zhong, Reddy, and Bayat with the teachings of Manipatruni to improve efficiency (see Manipatruni [0014] To this end, the disclosed techniques for implementing a hybrid processor, with an extended AI instruction set to perform analog in-memory processing, provide for reduced latency and improved efficiency in AI applications). Conclusion Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to HSING CHUN LIN whose telephone number is (571)272-8522. The examiner can normally be reached Mon - Fri 9AM-5PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Aimee Li, can be reached at (571) 272-4169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /H.L./Examiner, Art Unit 2195 /Aimee Li/Supervisory Patent Examiner, Art Unit 2195

Prosecution Timeline

Mar 28, 2023
Application Filed
Aug 09, 2025
Non-Final Rejection — §101, §103, §112
Nov 13, 2025
Response Filed
Jan 24, 2026
Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12554523
REDUCING DEPLOYMENT TIME FOR CONTAINER CLONES IN COMPUTING ENVIRONMENTS
2y 5m to grant Granted Feb 17, 2026
Patent 12547458
PLATFORM FRAMEWORK ORCHESTRATION AND DISCOVERY
2y 5m to grant Granted Feb 10, 2026
Patent 12468573
ADAPTIVE RESOURCE PROVISIONING FOR A MULTI-TENANT DISTRIBUTED EVENT DATA STORE
2y 5m to grant Granted Nov 11, 2025
Patent 12461785
GRAPHIC-BLOCKCHAIN-ORIENTATED SHARDING STORAGE APPARATUS AND METHOD THEREOF
2y 5m to grant Granted Nov 04, 2025
Patent 12443425
ISOLATED ACCELERATOR MANAGEMENT INTERMEDIARIES FOR VIRTUALIZATION HOSTS
2y 5m to grant Granted Oct 14, 2025
Reviewing what changed in these cases may show how to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
59%
Grant Probability
99%
With Interview (+79.8%)
3y 4m
Median Time to Grant
Moderate
PTA Risk
Based on 108 resolved cases by this examiner. Grant probability derived from career allow rate.
