Prosecution Insights
Last updated: April 19, 2026
Application No. 18/265,219

NEURAL NETWORK NEAR MEMORY PROCESSING

Non-Final OA: §102, §112
Filed
Jun 02, 2023
Examiner
SNYDER, STEVEN G
Art Unit
2184
Tech Center
2100 — Computer Architecture & Software
Assignee
Alibaba Group Holding Limited
OA Round
1 (Non-Final)
80%
Grant Probability
Favorable
1-2
OA Rounds
2y 9m
To Grant
72%
With Interview

Examiner Intelligence

Grants 80% — above average
80%
Career Allow Rate
686 granted / 855 resolved
+25.2% vs TC avg
Minimal -8% lift
-8.2%
Interview Lift
resolved cases with interview
Typical timeline
2y 9m
Avg Prosecution
24 currently pending
Career history
879
Total Applications
across all art units

Statute-Specific Performance

§101
6.6%
-33.4% vs TC avg
§103
56.2%
+16.2% vs TC avg
§102
12.1%
-27.9% vs TC avg
§112
15.2%
-24.8% vs TC avg
Black line = Tech Center average estimate • Based on career data from 855 resolved cases
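The percentages above are straightforward ratios over the examiner's resolved cases. As a minimal sketch (not the dashboard's actual pipeline), using only the figures shown on this page, the career allow rate and its delta against the Tech Center average can be reproduced like this:

```python
# Illustrative derivation of the examiner stats shown above.
# The TC-average value (55.0) is inferred from the "+25.2% vs TC avg"
# chip on this page; it is an assumption, not a published figure.

def allow_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage."""
    return 100.0 * granted / resolved

def delta_vs_tc(examiner_pct: float, tc_avg_pct: float) -> float:
    """Signed difference against the Tech Center average, in points."""
    return examiner_pct - tc_avg_pct

# Figures taken from this page: 686 grants out of 855 resolved cases.
career = allow_rate(686, 855)              # ~80.2%, displayed as "80%"
print(f"Career allow rate: {career:.1f}%")  # prints "Career allow rate: 80.2%"
print(f"vs TC avg: {delta_vs_tc(career, 55.0):+.1f} points")
```

The statute-specific rows (§101, §102, §103, §112) are the same ratio computed over the subset of the 855 resolved cases that received each rejection type.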

Office Action

§102 §112
DETAILED ACTION

This is in response to the application filed on June 2, 2023 in which claims 1-24 are presented for examination.

Status of Claims

Claims 1-24 are pending, of which claims 1, 8, and 21 are in independent form.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 7/11/2023 and 10/9/2023 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Specification

The disclosure is objected to because of the following informalities: Applicant's [0004] states "In addition, the conventional system as subject to high bandwidth utilization." The examiner recommends amending the specification at [0004] to state "In addition, the conventional system is subject to high bandwidth utilization." Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claim 24 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention. Claim 24 recites the limitation "the compute extension" in line 1.
There is insufficient antecedent basis for this limitation in the claim. Note that "a read with compute extension or a write with compute extension" is found in claim 23. However, claim 24 is listed as depending from claim 21, which does not introduce a compute extension. Therefore, it is possible that claim 24 was meant to depend from claim 23.

Claim Objections

Claims 4-7 are objected to because of the following informalities: claim 4 states "compute the aggregation operation on the attributes base on the first memory access to generate result data." The examiner recommends amending claim 4 to state "compute the aggregation operation on the attributes based on the first memory access to generate result data." Claims 5-7 inherit this objection based on their dependencies. Appropriate correction is required.

Claim 12 is objected to because of the following informalities: claim 12 introduces the acronyms "GenZ/CXL" and "DDR" with no further definition for these acronyms. The examiner suggests an amendment to define these acronyms, such as "Compute Express Link (GenZ/CXL)" and "double data rate (DDR)." Appropriate correction is required.

Claim 12 is further objected to because of the following informalities: claim 12 does not end with a period. Appropriate correction is required.
Claims 19-20 are objected to because of the following informalities: claim 19 states "determining, by a central core, a neural network stage and data associated with a graph node its neighbor nodes; writing, by the central core, the data associated with the graph node its neighbor nodes to a given memory unit when the neural network stage is a first stage or one of a first group of stages." The examiner recommends amending claim 19 to state "determining, by a central core, a neural network stage and data associated with a graph node and its neighbor nodes; writing, by the central core, the data associated with the graph node and its neighbor nodes to a given memory unit when the neural network stage is a first stage or one of a first group of stages." Claim 20 inherits this objection based on its dependencies. Appropriate correction is required.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 4-18, and 21-24 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by 'GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing' by Zhou et al. (hereinafter referred to as Zhou).
Referring to claim 1, Zhou discloses "A neural network processing system" (Figure 5 GNNear system with Centralized Acceleration Engine (CAE) and NMP-enabled DIMM with Near-Memory Engines (NME)) "comprising: a central core" (Figure 5 CAE); "and one or more memory units coupled to the central core, wherein respective memory units include: one or more memory devices" (Figure 5 DIMMs include DRAMs); "and a controller coupled to the one or more memory devices" (Figure 5 NME is part of the buffer chip of DIMM) "and configured to perform aggregation operations on data stored in the one or more memory device of the respective memory unit offloaded from the central core" (Abstract teaches "we offload the memory-intensive Reduce operations to in-DIMM Near-Memory-Engines (NMEs)." As taught in section 2.2, "The Reduce operations aggregate features/gradients along the edges of each destination vertex." Further, Figure 5 and section 4.1 'Base Workflow' section teaches "For Reduce, CAE sends customized instructions through the memory interfaces to NMEs. NMEs decode the instructions and perform partial reduction with the assigned source vertices locally." Also section 4.3 teaches "The Execution Unit (EU for short) in each NME is responsible for near-memory partial reduction computation of Reduce operations").

As per claim 4, Zhou discloses "the controller is further configured to: receive a first memory access including an aggregation operation" (Figure 5 and section 4.1 'Base Workflow' section teaches "For Reduce, CAE sends customized instructions through the memory interfaces to NMEs.
NMEs decode the instructions and perform partial reduction with the assigned source vertices locally." Also section 4.3 teaches "The Execution Unit (EU for short) in each NME is responsible for near-memory partial reduction computation of Reduce operations"); "access attributes in the respective one or more memory devices based on the first memory access; compute the aggregation operation on the attributes base[d] on the first memory access to generate result data" (Abstract teaches "we offload the memory-intensive Reduce operations to in-DIMM Near-Memory-Engines (NMEs)." Also section 4.3 teaches "The Execution Unit (EU for short) in each NME is responsible for near-memory partial reduction computation of Reduce operations"); "and output the result data based on the first memory accesses to the central core" (Section 4.1 "the partial results are read out by CAE." Figure 7 output buffer of each NME. Figure 15 NMEs sending results to result FIFOs in CAE).

As per claim 5, Zhou discloses "the central core is configured to: schedule the first memory access including the aggregation operation" (Section 4.2 CAE Architecture describes the controller of the CAE schedules the training. Section 5.2 'Window-based Scheduling' section teaches "We allow the CAE-side controller to issue instructions for interval i + 1 immediately after a DIMM finishes interval i, if interval i + 1 is within the Processing window" and "Once every DIMM's results of interval i are merged, CAE commits interval i and right-shifts the Processing window. By this means, we can schedule multiple intervals concurrently and mitigate the load-imbalance problems caused by graphs' irregularity"); "send the first memory access including the aggregation operation to the controller" (Figure 5 and section 4.1 'Base Workflow' section teaches "For Reduce, CAE sends customized instructions through the memory interfaces to NMEs.
NMEs decode the instructions and perform partial reduction with the assigned source vertices locally"); "and receive the result data based on the first memory accesses from the controller" (Section 4.1 "the partial results are read out by CAE." Figure 7 output buffer of each NME. Figure 15 NMEs sending results to result FIFOs in CAE).

As per claim 6, Zhou discloses "the central core is further configured to: compute a further aggregation operation on the result data received from the controller" (Section 4.2 the CAE merges partial reduction results).

As per claim 7, Zhou discloses "the controller is further configured to: receive a second memory access request; access attributes in the respective one or more memory devices based on the second memory access; and output the attributes based on the second memory request to the central core" (Figure 5 and section 4.3 "Apart from near-memory processing, if NME receives standard DDR commands from the CAE-side memory controller, it will bypass execution units and directly conducts Read/Write/Precharge commands, etc.").

Referring to claim 8, Zhou discloses "A near memory processing method" (Figure 5 DIMMs with Near-Memory Engine (NME)) "comprising: receiving, by a controller, a first memory access including an aggregation operation; accessing, by the controller, attributes based on the first memory access; computing, by the controller, the aggregation operation on the attributes based on the first memory access to generate result data" (Abstract teaches "we offload the memory-intensive Reduce operations to in-DIMM Near-Memory-Engines (NMEs)." As taught in section 2.2, "The Reduce operations aggregate features/gradients along the edges of each destination vertex." Further, Figure 5 and section 4.1 'Base Workflow' section teaches "For Reduce, CAE sends customized instructions through the memory interfaces to NMEs.
NMEs decode the instructions and perform partial reduction with the assigned source vertices locally." Also section 4.3 teaches "The Execution Unit (EU for short) in each NME is responsible for near-memory partial reduction computation of Reduce operations"); "and outputting, from the controller, the result data based on the first memory accesses" (Section 4.1 "the partial results are read out by CAE." Figure 7 output buffer of each NME. Figure 15 NMEs sending results to result FIFOs in CAE).

As per claim 9, Zhou discloses "the aggregation operation comprises a graph neural network aggregation operation" (Abstract teaches GNN training and "we offload the memory-intensive Reduce operations to in-DIMM Near-Memory-Engines (NMEs)." Section 2.2, "The Reduce operations aggregate features/gradients along the edges of each destination vertex").

As per claim 10, Zhou discloses "the memory access including the aggregation operation comprises a read with compute extension or a write with compute extension" (section 4.3 NME receives NMP instructions from CAE, decodes the instructions, and starts local execution. The controller also generates signals to access local DRAMs. Figures 10 and 11 show instructions with data being accessed and bits to specify an operation).

As per claim 11, Zhou discloses "the compute extension can include a data address, data count and data stride" (Figures 10 and 11 DIMM, Vector_size, index, bank, etc.).

As per claim 12, Zhou discloses "the compute extension is embedded in a GenZ/CXL data package, or extended DDR command" (section 4.3 NME receives NMP instructions from CAE, decodes the instructions, and starts local execution. The controller also generates signals to access local DRAMs according to DDR).

As per claim 13, Zhou discloses "a mode of the first memory access including the aggregation operation includes a complete compute mode or a partial compute mode" (section 2.3 Full-batch vs.
Mini-batch Training).

Note, claim 14 recites the corresponding limitations of claim 7. Therefore, the rejection of claim 7 applies to claim 14.

As per claim 15, Zhou discloses "the second memory access includes a read or write" (Figure 5 and section 4.3 "Apart from near-memory processing, if NME receives standard DDR commands from the CAE-side memory controller, it will bypass execution units and directly conducts Read/Write/Precharge commands, etc.").

As per claim 16, Zhou discloses "a mode of the second memory access includes a no compute mode" (Figure 5 and section 4.3 "Apart from near-memory processing, if NME receives standard DDR commands from the CAE-side memory controller, it will bypass execution units and directly conducts Read/Write/Precharge commands, etc.").

Note, claim 17 recites the corresponding limitations of claim 5. Therefore, the rejection of claim 5 applies to claim 17. Note, claim 18 recites the corresponding limitations of claim 6. Therefore, the rejection of claim 6 applies to claim 18.
Referring to claim 21, Zhou discloses "A controller" (Figure 5 Near-Memory Engine (NME)) "comprising: a plurality of computation units" (Figure 7 NME with multiple processing elements (PEs)); "and control logic configured to: receive a first memory access including an aggregation operation" (section 4.3 NME receives NMP instructions from CAE, decodes the instructions, and starts local execution including "near-memory partial reduction computation of Reduce operations." As taught in section 2.2, "The Reduce operations aggregate features/gradients along the edges of each destination vertex."); "access attributes based on the first memory access; configure one or more of the plurality of computation units to compute the aggregation operation on the attributes based on the first memory access to generate result data" (Abstract teaches "we offload the memory-intensive Reduce operations to in-DIMM Near-Memory-Engines (NMEs)." Further, Figure 5 and section 4.1 'Base Workflow' section teaches "For Reduce, CAE sends customized instructions through the memory interfaces to NMEs. NMEs decode the instructions and perform partial reduction with the assigned source vertices locally." Also section 4.3 teaches "The Execution Unit (EU for short) in each NME is responsible for near-memory partial reduction computation of Reduce operations"); "and output the result data based on the first memory accesses" (Section 4.1 "the partial results are read out by CAE." Figure 7 output buffer of each NME. Figure 15 NMEs sending results to result FIFOs in CAE).

Note, claim 22 recites the corresponding limitations of claim 7. Therefore, the rejection of claim 7 applies to claim 22. Note, claim 23 recites the corresponding limitations of claim 10. Therefore, the rejection of claim 10 applies to claim 23. Note, claim 24 recites the corresponding limitations of claim 11. Therefore, the rejection of claim 11 applies to claim 24.
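The workflow the examiner maps against Zhou (a central core offloads an aggregation to a memory-side controller, which reads attributes locally, reduces them, and returns only the result) can be sketched in miniature. This is an illustrative model only; every name below (ComputeAccess, NearMemoryController, the "sum" operation) is hypothetical and appears in neither the application's claims nor Zhou:

```python
# Hedged sketch of the offloaded-aggregation flow described in the claim
# mapping above. The address/count/stride fields mirror the "compute
# extension" contents recited in claim 11; all identifiers are illustrative.
from dataclasses import dataclass

@dataclass
class ComputeAccess:
    op: str          # aggregation operation, e.g. "sum"
    address: int     # base address of the attribute vectors
    count: int       # number of attribute vectors to aggregate
    stride: int      # spacing between vectors

class NearMemoryController:
    def __init__(self, memory: list):
        self.memory = memory  # stand-in for the DIMM's DRAM devices

    def execute(self, acc: ComputeAccess) -> list:
        # Access the attributes named by address/count/stride, then reduce
        # them element-wise locally instead of shipping every vector to
        # the central core.
        rows = [self.memory[acc.address + i * acc.stride]
                for i in range(acc.count)]
        if acc.op == "sum":
            return [sum(col) for col in zip(*rows)]
        raise ValueError(f"unsupported aggregation: {acc.op}")

# Three 2-element attribute vectors; one "read with compute" access
# returns their element-wise sum rather than three raw reads.
ctrl = NearMemoryController([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(ctrl.execute(ComputeAccess("sum", 0, 3, 1)))  # prints [9.0, 12.0]
```

The bandwidth argument behind the claims is visible even in this toy: the controller reads `count` vectors from local memory but returns a single reduced vector across the memory interface.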
Allowable Subject Matter

Claims 2, 3, 19, and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

U.S. Patent Application 20210150323 teaches neural network processing and a near-memory configuration that enables performing matrix calculations without numerous off-chip data reads and/or writes.

U.S. Patent Application 20230162024 teaches graph neural network processing with an aggregation phase using Ternary Content Addressable Memory (TCAM)-based training to greatly reduce data movement.

U.S. Patent Application 20240005127 and Patent 12518130 teach operations that accelerate GNN memory access in a near-memory manner.

U.S. Patent Application 20250307180 teaches co-processing acceleration with an in-memory compute element to perform in-memory processing on graph data.

Kazi Asifuzzaman et al., 'A survey on processing-in-memory techniques: Advances and challenges' describes Processing-in-memory (PIM) techniques.

Zhe Zhou et al., 'GCNear: A Hybrid Architecture for Efficient GCN Training with Near-Memory Processing' teaches Graph Convolutional Networks (GCNs) and offloading reduce operations to near-memory elements.

Sungmin Yun et al., 'GraNDe: Near-Data Processing Architecture With Adaptive Matrix Mapping for Graph Convolutional Networks' teaches Graph Convolutional Networks (GCNs) and accelerating memory-intensive aggregation operations by locating processing elements near the DRAM datapath.

Machine Translation of WIPO Publication WO 201115589 A1 teaches in-memory processing including extended commands.
Machine Translation of Chinese Patent Application CN 118761448 A teaches that a compute-in-memory (CIM) architecture can effectively avoid frequent data transportation, and that a common solution is to extend the existing CPU instruction set with new instructions suited to AI calculation.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEVEN G SNYDER whose telephone number is (571)270-1971. The examiner can normally be reached on M-F 8:00am-4:30pm (flexible).

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Henry Tsai, can be reached on 571-272-4176. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/STEVEN G SNYDER/ Primary Examiner, Art Unit 2184

Prosecution Timeline

Jun 02, 2023
Application Filed
Mar 27, 2026
Non-Final Rejection — §102, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596665
BUS-BASED TRANSACTION PROCESSING METHOD AND SYSTEM, STORAGE MEDIUM, AND DEVICE
2y 5m to grant Granted Apr 07, 2026
Patent 12591432
METHOD FOR EXECUTION OF BUS CONTROLLER, COMPUTER DEVICE AND STORAGE MEDIUM
2y 5m to grant Granted Mar 31, 2026
Patent 12591429
MEMORY INTERFACE
2y 5m to grant Granted Mar 31, 2026
Patent 12591451
INTER-APPLICATION COMMUNICATION METHOD AND APPARATUS, STORAGE MEDIUM, AND PROGRAM PRODUCT
2y 5m to grant Granted Mar 31, 2026
Patent 12585726
APPLICATION PROGRAMMING INTERFACE TO ACCELERATE MATRIX OPERATIONS
2y 5m to grant Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
80%
Grant Probability
72%
With Interview (-8.2%)
2y 9m
Median Time to Grant
Low
PTA Risk
Based on 855 resolved cases by this examiner. Grant probability derived from career allow rate.
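The "With Interview" figure follows from the two numbers above it. Assuming the dashboard applies the interview lift as a simple additive adjustment to the baseline grant probability (an assumption about its method, not a documented formula), the arithmetic is:

```python
# Minimal sketch: baseline grant probability adjusted by the interview
# lift shown on this page. The additive-adjustment model is an assumption.

def adjusted_probability(baseline_pct: float, lift_pct: float) -> float:
    """Apply a lift and clamp to the valid [0, 100] percentage range."""
    return max(0.0, min(100.0, baseline_pct + lift_pct))

baseline = 100.0 * 686 / 855                     # career allow rate, ~80.2%
with_interview = adjusted_probability(baseline, -8.2)
print(f"With interview: {with_interview:.0f}%")  # prints "With interview: 72%"
```

The clamp matters only at the extremes, but it keeps the sketch honest for examiners whose baseline sits near 0% or 100%.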
