Detailed Action
1. This Office Action is in response to the communication filed on January 15, 2026. Claims 1-2, 4-15, and 17-22 are currently pending, and claims 1, 14, and 20 are the independent claims.
Notice of Pre-AIA or AIA Status
2. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
3. This Final Office Action is in response to the applicant's remarks and arguments filed on January 15, 2026.
Claims 1, 6, 14, and 20 are amended. Claims 3 and 16 have been cancelled. Claims 21 and 22 are new. Claims 1-2, 4-15, and 17-22 remain pending in the application. Claims 2, 4-5, 7-13, 15, and 17-19, filed on April 17, 2023, are being considered on the merits along with amended claims 1, 6, 14, and 20 and new claims 21 and 22.
Response to Arguments
4. Applicant's arguments filed January 15, 2026 have been fully considered but they are not persuasive.
The Applicant respectfully submitted on pages 10-12 of the Remarks under section Claim Rejections - 35 U.S.C. 102 that “the pending claims include at least one limitation that is not disclosed by Ravishankar” in independent claim 1.
The Examiner respectfully acknowledges the Applicant's statement and has updated the rejection of independent claim 1 to a rejection under 35 U.S.C. 103 over Ravishankar in view of Li. The rejection is set forth in detail in section 5 of this Office Action below.
The Applicant respectfully submitted on pages 12-13 of the Remarks under section Claim Rejections - 35 U.S.C. 103 that “none of other references disclose or suggest at least ‘wherein the constraint relationship indicates at least one of: a relationship between available memory space of one piece of tensor data and available memory space of another piece of tensor data is reusable, the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable, or the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable and continuous.’”
The Examiner respectfully disagrees with the Applicant's assessment that none of the other references used in the prior 103 rejections disclose the amendment to independent claims 1, 14, and 20. The rejection is set forth in detail in section 5 of this Office Action below via Ravishankar in view of Li. The Applicant specifically stated in the 102 section of the Remarks that Ravishankar fails to disclose or teach the amended independent claims. Additionally, the Applicant specifically stated in the 103 section of the Remarks that Ravishankar in view of Pang, Ravishankar in view of Li, Ravishankar in view of Li and Pang, and Ravishankar in view of Matthews fail to disclose or teach the amended independent claims. The Examiner has set forth a rejection of amended claim 1 via Ravishankar in view of Li, and specifically believes the following citation of Li discloses the amended claim limitation: [0160] "In some implementations, when an intermediate result (e.g., intermediate tensor and/or gradient result) is no longer used (e.g., determined based on the LDG subgraph), memory allocated for the intermediate result may be reused for other nodes to improve memory efficiency." The citation is interpreted to read on the claimed invention because, under broadest reasonable interpretation, memory space becomes reusable when intermediate results are no longer used, so it is reused by other nodes needing tensor memory space. As such, the relationship between available memory space of one piece of tensor data and available memory space of another piece of tensor data being reusable is made clear. Accordingly, the amended independent claims, which now contain the claimed limitation from cancelled claim 3, have an updated prior art rejection in this Final Office Action. Claims 1-2, 4-15, and 17-22 remain rejected in this Final Office Action.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
5. Claims 1-2, 6, 8, 10, 12-15, and 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Ravishankar et al. (U.S. Patent No. 12,423,007) – hereinafter “Ravishankar” in view of Li (U.S. Pub. No. 2020/0293916).
Regarding independent claim 1, Ravishankar discloses:
A method of memory allocation, comprising:
obtaining a computation graph corresponding to a neural network, wherein the computation graph comprises N nodes and a directed edge that connects different nodes, the directed edge of the computation graph carrying a piece of tensor data of M pieces of tensor data, the computation graph comprising the M pieces of tensor data, wherein M is an integer greater than 1; and (Col. 5, Lines 4-24 “In at least one embodiment, representation of a computer program 108 is structured data (e.g., data according to a predetermined format and/or syntax) that represents an entire computer program. In at least one embodiment, representation of a computer program 108 is structured data that represents a portion of a computer program rather than an entire computer program, where the representation can define a directed acyclic graph (DAG) to indicate a use of tensor data in a deep learning neural network. In at least one embodiment, each node of DAG represents an operation producing some tensor output, and each edge represents a tensor producer-consumer relation. In at least one embodiment, a client that uses system 100 (e.g., an application that uses system 100 to compile and/or run a deep learning neural network training and/or inferencing technique) allocates memory for tensors that are not produced within neural networks used by application (e.g., live-in tensors) and for tensors used outside neural network (e.g., live-out tensors). In at least one embodiment, system 100 allocates memory for tensors needed to hold intermediate results based, at least in part, on memory allocation plan 104.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the DAG indicates use of tensor data in a deep learning neural network. Each node represents an operation producing tensor data and each edge represents tensor producer-consumer relations.
sequentially allocating memory space to the M pieces of tensor data based on a sorting result of the M pieces of tensor data, wherein in response to that at least one part of allocated memory space can be reused for one piece of tensor data of the M pieces of tensor data, the at least one part of the memory space that can be reused for the one piece of tensor data is allocated to the piece of tensor data, wherein the allocated memory space is memory space that has been allocated to at least one piece of tensor data of the M pieces of tensor data before allocating memory space to the piece of tensor data, … (Col. 9, Lines 15-37 “FIG. 6 illustrates a flowchart of a technique 600 of determining tensor memory allocation, according to at least one embodiment. In at least one embodiment, technique 600 is performed by at least one circuit, at least one system, at least one processor, at least one graphics processing unit, at least one parallel processor, and/or at least some other processor or component thereof described and/or shown herein. In at least one embodiment, tensor memory allocator 102 performs global memory allocation for set of root tensors based, at least in part, on a dynamic memory allocation scheme. In at least one embodiment, at a block 602, technique 600 includes maintaining a list of regions previously allocated to a set of dead tensors, but that are free because tensors in set of tensors are no longer live. In at least one embodiment, list of regions is an ordered list, ordered based, at least in part, on a start location of a free region. In at least one embodiment, at a decision block 604, when a new tensor is to be allocated, technique 600 includes determining whether new tensor fits into a slot (e.g., region in list of regions). 
In at least one embodiment, if, at decision block 604, it is determined that new tensor fits into a slot, technique 600 includes, at a block 606, allocating new tensor to slot.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the tensor memory allocator 102 allocates memory for tensors based on a dynamic memory allocation scheme. If a set of tensors are no longer live, their memory region can be reallocated to new tensors that fit into the slot.
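For illustration only (not part of the record), the free-region reuse described in the cited technique 600 can be sketched as follows; all function and variable names here are hypothetical, not drawn from Ravishankar:

```python
# Hypothetical sketch of Ravishankar's technique 600: maintain an ordered
# list of regions freed by dead tensors (block 602); if a new tensor fits
# a free slot, reuse it (blocks 604/606), else open a new region (block 608).

def allocate(free_regions, new_size, next_offset):
    """free_regions: list of (start, size) tuples, ordered by start offset.
    Returns (allocated_start, updated_next_offset)."""
    for i, (start, size) in enumerate(free_regions):
        if new_size <= size:                      # decision block 604: fits a slot
            remaining = size - new_size
            if remaining:                         # keep the leftover space free
                free_regions[i] = (start + new_size, remaining)
            else:
                free_regions.pop(i)
            return start, next_offset             # block 606: reuse the slot
    return next_offset, next_offset + new_size    # block 608: new memory region
```

For example, with free regions `[(0, 4), (10, 8)]`, a request of size 6 reuses the region starting at offset 10, leaving `(16, 2)` free; a request that fits no slot is placed at the current end of memory.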
… wherein the sorting result indicates a sequence of allocating memory space to the M pieces of tensor data, wherein the sorting result is related to information about each piece of tensor data of the M pieces of tensor data, wherein the information about each piece of tensor data indicates at least one of: a constraint relationship corresponding to each piece of tensor data and a quantity of nodes to which each piece of tensor data flows, or the constraint relationship indicating a relationship between available memory space of one piece of tensor data of the M pieces of tensor data and available memory space of other one or more pieces of tensor data in the M pieces of tensor data; (Col. 8, Line 63 – Col. 9, Line 8 “In at least one embodiment, with respect to technique 200 of FIG. 2, technique 300 of FIG. 3, technique 400 of FIG. 4 and/or technique 500 of FIG. 5, tensor memory allocator 102 determines an ordering of basic blocks. In at least one embodiment, tensor memory allocator 102 generates a topological sort order. In at least one embodiment, topological sort order starts from entry and is ordered toward exit. In at least one embodiment, multiple valid sort orders are possible, and tensor memory allocator 102 generates topological sort order based, at least in part, on a heuristic. In at least one embodiment, tensor memory allocator 102 generates topological sort order such that each operation has a unique global order.” and Col. 14, Lines 16-29 “In at least one embodiment, tensor memory allocator 102 uses bin-packing to reuse space occupied by tensors after tensors become dead. In at least one embodiment, each bin has a virtual level property that specifies how much of bin has been filled. In at least one embodiment, tensor memory allocator 102 splits a set of tensors into categories based on normalized size of tensors in set of tensors. 
In at least one embodiment, tensor memory allocator 102 splits set of tensors into three categories based on normalized tensor size. In at least one embodiment, tensor memory allocator 102 splits set of tensors into a tiny category with size in interval (0, ⅓], a medium category with size in interval (⅓, ⅔], and a large category with size in interval (⅔, 1].”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the tensor memory allocator 102 generates a topological sort order and can split the tensors into sets of tensors with similar size to indicate a relationship between the available memory space for a set of pieces of tensor data.
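For illustration only (not part of the record), the three cited size categories can be sketched as follows; the function name is hypothetical:

```python
# Hypothetical sketch of the cited bin-packing categories in Ravishankar:
# tiny with size in (0, 1/3], medium in (1/3, 2/3], large in (2/3, 1].

def size_category(normalized_size):
    """Classify a tensor by its normalized size in (0, 1]."""
    assert 0 < normalized_size <= 1
    if normalized_size <= 1 / 3:
        return "tiny"
    if normalized_size <= 2 / 3:
        return "medium"
    return "large"
```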
Ravishankar does not explicitly disclose:
wherein the constraint relationship indicates at least one of: a relationship between available memory space of one piece of tensor data and available memory space of another piece of tensor data is reusable, the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable, or the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable and continuous.
However, Li discloses:
wherein the constraint relationship indicates at least one of: a relationship between available memory space of one piece of tensor data and available memory space of another piece of tensor data is reusable, … ([0160] “In some implementations, when an intermediate result (e.g., intermediate tensor and/or gradient result) is no longer used (e.g., determined based on the LDG subgraph), memory allocated for the intermediate result may be reused for other nodes to improve memory efficiency.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the reusability of memory space is possible when intermediate results are no longer used, so they are reused by other nodes needing tensor memory space.
… the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable, or the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable and continuous.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to add wherein the constraint relationship indicates at least one of: a relationship between available memory space of one piece of tensor data and available memory space of another piece of tensor data is reusable, the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable, or the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable and continuous as seen in Li's invention into Ravishankar's invention because these modifications allow the use of a known technique to improve similar devices in the same way such that tensor data memory that is no longer in use can be reusable for another piece of tensor data to optimize storage capabilities.
Regarding claim 2, Ravishankar discloses the method according to claim 1, wherein in response to that the allocated memory space cannot be reused for the piece of tensor data, other memory space is allocated to the piece of tensor data, wherein the other memory space is different from the allocated memory space. (Col. 9, Lines 37-40 “In at least one embodiment, if, at decision block 604, it is determined that new tensor does not fit into any slots in list of regions, technique 600 includes, at a block 608, allocating new tensor in a new space (e.g., a new memory region).”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the new tensor not being able to fit in reallocated memory space leads to the allocation of a tensor to a newly created memory region.
Regarding claim 6, Ravishankar discloses the method according to claim 1, wherein the computation graph comprises a plurality of computing subtasks, wherein the computing subtask indicates a computing function by using a group of nodes and an edge related to the group of nodes, (Col. 5, Lines 4-15 “In at least one embodiment, representation of a computer program 108 is structured data (e.g., data according to a predetermined format and/or syntax) that represents an entire computer program. In at least one embodiment, representation of a computer program 108 is structured data that represents a portion of a computer program rather than an entire computer program, where the representation can define a directed acyclic graph (DAG) to indicate a use of tensor data in a deep learning neural network. In at least one embodiment, each node of DAG represents an operation producing some tensor output, and each edge represents a tensor producer-consumer relation.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the graph contains nodes that represent operations and each edge represents a tensor producer-consumer relation.
and wherein an execution relationship between the plurality of computing subtasks is parallel; (Col. 3, Lines 41-48 “In at least one embodiment, a stream scheduler 106 uses a representation of a computer program 108 to generate a modified representation of a computer program 110. In at least one embodiment, stream scheduler 106 assigns at least one operation from representation of a computer program 108 to run in parallel on two or more streams, and modified representation of a computer program 110 includes at least one indication of a parallelized operation.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the computer program operations are run in parallel.
and wherein the method further comprises:
obtaining the information about each piece of tensor data of the M pieces of tensor data based on the updated computation graph. (Col. 14, Lines 16-29 “In at least one embodiment, tensor memory allocator 102 uses bin-packing to reuse space occupied by tensors after tensors become dead. In at least one embodiment, each bin has a virtual level property that specifies how much of bin has been filled. In at least one embodiment, tensor memory allocator 102 splits a set of tensors into categories based on normalized size of tensors in set of tensors. In at least one embodiment, tensor memory allocator 102 splits set of tensors into three categories based on normalized tensor size. In at least one embodiment, tensor memory allocator 102 splits set of tensors into a tiny category with size in interval (0, ⅓], a medium category with size in interval (⅓, ⅔], and a large category with size in interval (⅔, 1].”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the tensor memory allocator 102 can split the tensors into sets of tensors with similar size to indicate a relationship between the available memory space for a set of pieces of tensor data to gain information on each tensor following the graph update.
Ravishankar does not explicitly disclose:
in one computing subtask, in response to that there is no directed edge between two adjacent nodes, adding a directed edge between the two adjacent nodes, and updating the computation graph to obtain an updated computation graph, wherein each added directed edge carries corresponding piece of tensor data, and wherein the two adjacent nodes are two nodes that are adjacent in an execution sequence in the computing subtask;
However, Li discloses:
in one computing subtask, in response to that there is no directed edge between two adjacent nodes, adding a directed edge between the two adjacent nodes, and updating the computation graph to obtain an updated computation graph, wherein each added directed edge carries corresponding piece of tensor data, and wherein the two adjacent nodes are two nodes that are adjacent in an execution sequence in the computing subtask; ([0070] “The DSGRCE retrieves these input rules, creates an in-memory logical dependency graph, and adds nodes corresponding to each input rule, without adding their precedents that are not part of the inputs. If an input node depends on another input node, an edge is added between them to capture the dependency. The result is an inner logical dependency graph that contains the input nodes and edges between them. The declaration of the packaged atom also specifies an output rule (e.g., the rule in the last return statement at 170). The output node and its precedents are recursively added to the inner logical dependency graph.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, an edge is added between nodes if an input node depends on another input node and then output nodes and its precedents are added to fill out the updated graph.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to add in one computing subtask, in response to that there is no directed edge between two adjacent nodes, adding a directed edge between the two adjacent nodes, and updating the computation graph to obtain an updated computation graph, wherein each added directed edge carries corresponding piece of tensor data, and wherein the two adjacent nodes are two nodes that are adjacent in an execution sequence in the computing subtask as seen in Li’s invention into Ravishankar's invention because these modifications allow applying a known technique to a known device ready for improvement to yield predictable results such that updating of a graph’s edges between nodes can be added to continue to update and improve the neural network.
Regarding claim 8, Ravishankar discloses the method of claim 1, wherein in the computation graph, a value of an identifier of a production node of a piece of tensor data is less than a value of an identifier of a consumption node of the piece of tensor data, and the production node of the tensor data and the consumption node of the piece of tensor data are two adjacent nodes. (Col. 13, Lines 57-67 “In at least one embodiment, tensor memory allocator 102 adds all visited nodes during BFS traversal to a cache to record that visited nodes are reachable from sr. In at least one embodiment, if traversal reached s.sub.w then tensor memory allocator 102 determines that s.sub.i happens before s.sub.j. In at least one embodiment, on subsequent queries for happens before, tensor memory allocator 102 checks cache to see if a traversal from a previous query already found this information. In at least one embodiment, if there was no cache entry then tensor memory allocator traverses graph G as described.” and Col. 29, Lines 27-30 “In at least one embodiment, bus 1102 may be configured to have dozens or even hundreds of nodes, each with its own unique identifier (e.g., a CAN ID).”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the tensor memory allocator adds all visited nodes to a cache and compares their node identifier to determine if a previous traversal already found this information that the nodes are adjacent.
Regarding claim 10, Ravishankar discloses the method according to claim 1, wherein the information about each piece of tensor data indicates the constraint relationship corresponding to each piece of tensor data, the method further comprising: (Col. 14, Lines 16-29 “In at least one embodiment, tensor memory allocator 102 uses bin-packing to reuse space occupied by tensors after tensors become dead. In at least one embodiment, each bin has a virtual level property that specifies how much of bin has been filled. In at least one embodiment, tensor memory allocator 102 splits a set of tensors into categories based on normalized size of tensors in set of tensors. In at least one embodiment, tensor memory allocator 102 splits set of tensors into three categories based on normalized tensor size. In at least one embodiment, tensor memory allocator 102 splits set of tensors into a tiny category with size in interval (0, ⅓], a medium category with size in interval (⅓, ⅔], and a large category with size in interval (⅔, 1].”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the tensor memory allocator 102 can split the tensors into sets of tensors with similar size to indicate a relationship between the available memory space for a set of pieces of tensor data.
obtaining, based on the constraint relationship corresponding to each piece of tensor data, constraint amounts respectively corresponding to the M pieces of tensor data, wherein each constraint amount of the constraint amounts is an amount of pieces of tensor data of the M pieces of tensor data for which same memory space cannot be reused with a piece of tensor data of the M pieces of tensor data; and (Col. 14, Lines 16-29 “In at least one embodiment, tensor memory allocator 102 uses bin-packing to reuse space occupied by tensors after tensors become dead. In at least one embodiment, each bin has a virtual level property that specifies how much of bin has been filled. In at least one embodiment, tensor memory allocator 102 splits a set of tensors into categories based on normalized size of tensors in set of tensors. In at least one embodiment, tensor memory allocator 102 splits set of tensors into three categories based on normalized tensor size. In at least one embodiment, tensor memory allocator 102 splits set of tensors into a tiny category with size in interval (0, ⅓], a medium category with size in interval (⅓, ⅔], and a large category with size in interval (⅔, 1].”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the tensor memory allocator 102 can split the tensors into sets of tensors with similar size to indicate what size a tensor is such that it cannot be placed into a bin without proper available memory space.
sorting the M pieces of tensor data based on the constraint amounts respectively corresponding to the M pieces of tensor data, to obtain the sorting result of the M pieces of tensor data. (Col. 8, Line 63 – Col. 9, Line 15 “In at least one embodiment, with respect to technique 200 of FIG. 2, technique 300 of FIG. 3, technique 400 of FIG. 4 and/or technique 500 of FIG. 5, tensor memory allocator 102 determines an ordering of basic blocks. In at least one embodiment, tensor memory allocator 102 generates a topological sort order. In at least one embodiment, topological sort order starts from entry and is ordered toward exit. In at least one embodiment, multiple valid sort orders are possible, and tensor memory allocator 102 generates topological sort order based, at least in part, on a heuristic. In at least one embodiment, tensor memory allocator 102 generates topological sort order such that each operation has a unique global order. In at least one embodiment, heuristic is deterministic and reduces lifetimes of tensors. In at least one embodiment, tensor memory allocator 102 optimizes heuristic to reduce a live range for a tensor. In at least one embodiment, tensor memory allocator 102 optimizes heuristic to reduce aggregate live ranges for a plurality of tensors.” and Col. 14, Lines 16-29 “In at least one embodiment, tensor memory allocator 102 uses bin-packing to reuse space occupied by tensors after tensors become dead. In at least one embodiment, each bin has a virtual level property that specifies how much of bin has been filled. In at least one embodiment, tensor memory allocator 102 splits a set of tensors into categories based on normalized size of tensors in set of tensors. In at least one embodiment, tensor memory allocator 102 splits set of tensors into three categories based on normalized tensor size. 
In at least one embodiment, tensor memory allocator 102 splits set of tensors into a tiny category with size in interval (0, ⅓], a medium category with size in interval (⅓, ⅔], and a large category with size in interval (⅔, 1].”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the tensor memory allocator can generate a sort based on the category of tensor size.
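For illustration only (not part of the record), the claimed sorting by constraint amounts can be sketched as follows; the data structure and function name are hypothetical, not drawn from the references:

```python
# Hypothetical sketch of claim 10's sorting step: each tensor's "constraint
# amount" is the number of other tensors whose memory space cannot be reused
# with it; tensors are ordered by that amount (most constrained first).

def sort_by_constraint_amount(constraints):
    """constraints: dict mapping a tensor id to the set of tensor ids with
    which its memory space cannot be reused. Returns the sorting result."""
    return sorted(constraints, key=lambda t: len(constraints[t]), reverse=True)
```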
Regarding claim 12, Ravishankar discloses the method according to claim 1, the method further comprising:
sorting the M pieces of tensor data based on the information about each piece of tensor data according to a heuristic algorithm, to obtain the sorting result of the M pieces of tensor data within a preset time period. (Col. 8, Line 63 – Col. 9, Line 15 “In at least one embodiment, with respect to technique 200 of FIG. 2, technique 300 of FIG. 3, technique 400 of FIG. 4 and/or technique 500 of FIG. 5, tensor memory allocator 102 determines an ordering of basic blocks. In at least one embodiment, tensor memory allocator 102 generates a topological sort order. In at least one embodiment, topological sort order starts from entry and is ordered toward exit. In at least one embodiment, multiple valid sort orders are possible, and tensor memory allocator 102 generates topological sort order based, at least in part, on a heuristic. In at least one embodiment, tensor memory allocator 102 generates topological sort order such that each operation has a unique global order. In at least one embodiment, heuristic is deterministic and reduces lifetimes of tensors. In at least one embodiment, tensor memory allocator 102 optimizes heuristic to reduce a live range for a tensor. In at least one embodiment, tensor memory allocator 102 optimizes heuristic to reduce aggregate live ranges for a plurality of tensors.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the tensor memory allocator generates a topological sort order based on a heuristic algorithm.
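For illustration only (not part of the record), a topological sort giving each operation a unique global order, as the citation describes, can be sketched with Kahn's algorithm; the reference does not identify its heuristic, so this particular algorithm is an assumption:

```python
from collections import deque

# Hypothetical sketch of a topological sort order over a DAG of operations,
# ordered from entry toward exit so each operation has a unique global order.

def topological_order(nodes, edges):
    """nodes: iterable of node ids; edges: iterable of (producer, consumer)."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = deque(n for n in nodes if indeg[n] == 0)  # entry nodes first
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return order
```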
Regarding claim 13, Ravishankar discloses the method according to claim 12, wherein the sorting result is a sorting result obtained after an optimization, (Col. 8, Line 63 – Col. 9, Line 15 “In at least one embodiment, with respect to technique 200 of FIG. 2, technique 300 of FIG. 3, technique 400 of FIG. 4 and/or technique 500 of FIG. 5, tensor memory allocator 102 determines an ordering of basic blocks. In at least one embodiment, tensor memory allocator 102 generates a topological sort order. In at least one embodiment, topological sort order starts from entry and is ordered toward exit. In at least one embodiment, multiple valid sort orders are possible, and tensor memory allocator 102 generates topological sort order based, at least in part, on a heuristic. In at least one embodiment, tensor memory allocator 102 generates topological sort order such that each operation has a unique global order. In at least one embodiment, heuristic is deterministic and reduces lifetimes of tensors. In at least one embodiment, tensor memory allocator 102 optimizes heuristic to reduce a live range for a tensor. In at least one embodiment, tensor memory allocator 102 optimizes heuristic to reduce aggregate live ranges for a plurality of tensors.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the tensor memory allocator generates a topological sort order based on a heuristic algorithm following an optimization of the heuristic.
and wherein a size of maximum memory that needs to be occupied by the neural network and that corresponds to the sorting result obtained after the optimization is less than a size of maximum memory that needs to be occupied by the neural network and that is determined based on a sorting result existing before the optimization. (Col. 9, Lines 8-25 “In at least one embodiment, heuristic is deterministic and reduces lifetimes of tensors. In at least one embodiment, tensor memory allocator 102 optimizes heuristic to reduce a live range for a tensor. In at least one embodiment, tensor memory allocator 102 optimizes heuristic to reduce aggregate live ranges for a plurality of tensors. FIG. 6 illustrates a flowchart of a technique 600 of determining tensor memory allocation, according to at least one embodiment. In at least one embodiment, technique 600 is performed by at least one circuit, at least one system, at least one processor, at least one graphics processing unit, at least one parallel processor, and/or at least some other processor or component thereof described and/or shown herein. In at least one embodiment, tensor memory allocator 102 performs global memory allocation for set of root tensors based, at least in part, on a dynamic memory allocation scheme.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the tensor memory allocator generates a topological sort order based on a heuristic algorithm following an optimization of the heuristic which requires a smaller maximum memory size allocated by the tensor memory allocator following the optimization that reduces the lifetime of tensors that will be replaced or have memory freed up.
Regarding claim 14, it is a memory allocation apparatus claim having the same limitations as cited in method claim 1. Thus, claim 14 is also rejected under the same rationale as addressed in the rejection of claim 1 above.
Regarding claim 15, it is a memory allocation apparatus claim having the same limitations as cited in method claim 2. Thus, claim 15 is also rejected under the same rationale as addressed in the rejection of claim 2 above.
Regarding claim 19, it is a memory allocation apparatus claim having the same limitations as cited in method claim 6. Thus, claim 19 is also rejected under the same rationale as addressed in the rejection of claim 6 above.
Regarding claim 20, it is a non-transitory computer-readable storage medium claim having the same limitations as cited in method claim 1. Thus, claim 20 is also rejected under the same rationale as addressed in the rejection of claim 1 above.
Regarding claim 21, it is a non-transitory computer-readable storage medium claim having the same limitations as cited in method claim 2. Thus, claim 21 is also rejected under the same rationale as addressed in the rejection of claim 2 above.
6. Claims 4-5, 7, 17-18, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Ravishankar et al. (U.S. Patent No. 12,423,007) – hereinafter “Ravishankar” in view of Li (U.S. Pub. No. 2020/0293916), further in view of Pang et al. (U.S. Patent No. 12,026,604) – hereinafter “Pang”.
Regarding claim 4, Ravishankar discloses the method according to claim 1, but does not explicitly disclose:
wherein the constraint relationship is carried in a constraint relationship table; the constraint relationship table comprising identifiers of the M pieces of tensor data; and wherein in the constraint relationship table, a first value indicates that the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is reusable, a second value indicates that the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable, and a third value indicates that the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable and continuous.
However, Pang discloses:
wherein the constraint relationship is carried in a constraint relationship table; the constraint relationship table comprising identifiers of the M pieces of tensor data; and wherein in the constraint relationship table, a first value indicates that the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is reusable, a second value indicates that the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable, and a third value indicates that the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable and continuous. (Col. 17, Lines 29-49 “For each input tensor, the correspondence table of outputs and the number of times of citation 806 is searched for a mapping relationship between an identifier (which is actually the identifier of the output tensor of the previous layer cited by the input tensor, if the input tensor is transferred from the output tensor of the previous layer) of the input tensor and the number of times of citation. ii. If the mapping relationship is found in the table 806, the number of times of citation in the mapping relationship are reduced by 1, for example, so as to subtract the citation (the number of times of citation of this layer is 1) of the output by the input to this layer first. If the number of times of citation become 0 after subtracting 1, it indicates that the output tensor of the previous layer is not cited again subsequently, and the memory block identifier corresponding to the output tensor may become idle. 
Therefore, the memory block identifier may be put into the list of idle memory block identifiers, thereby implementing memory reuse during the forward calculation of the neural network.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the table represents the number of times the mapping relationship is cited such that a number greater than 1 indicates that the memory space is non-reusable and continuously used, a number equal to 1 indicates that the memory space is non-reusable, and a number equal to 0 indicates that the memory space is reusable for the neural network.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to add wherein the constraint relationship is carried in a constraint relationship table; the constraint relationship table comprising identifiers of the M pieces of tensor data; and wherein in the constraint relationship table, a first value indicates that the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is reusable, a second value indicates that the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable, and a third value indicates that the relationship between the available memory space of the one piece of tensor data and the available memory space of the another piece of tensor data is non-reusable and continuous as seen in Pang's invention into Ravishankar's invention because these modifications allow the use of a known technique to improve similar devices in the same way such that the user has the capability to track the constraint relationship between pieces of tensor data to ensure the neural network reuses memory optimally when available.
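For illustration only, the three-valued constraint relationship table recited in claim 4 can be sketched as a matrix keyed by tensor identifiers. This is a hypothetical sketch, not code from Pang or the application: the numeric encodings 0/1/2 and all names are assumptions.

```python
# Illustrative encodings for the three constraint values (assumed, not from Pang).
REUSABLE = 0                   # first value: memory space is reusable
NON_REUSABLE = 1               # second value: memory space is non-reusable
NON_REUSABLE_CONTINUOUS = 2    # third value: non-reusable and continuous

def build_constraint_table(tensor_ids):
    """Square table indexed by tensor identifiers of the M pieces of
    tensor data; every pairwise relationship defaults to reusable."""
    return {a: {b: REUSABLE for b in tensor_ids} for a in tensor_ids}

table = build_constraint_table(["t0", "t1", "t2"])
table["t0"]["t1"] = NON_REUSABLE             # t1 cannot reuse t0's space
table["t1"]["t2"] = NON_REUSABLE_CONTINUOUS  # spaces must be distinct and contiguous

print(table["t1"]["t2"])  # → 2
```

A memory allocator consulting such a table could check `table[a][b]` in constant time when deciding whether one tensor's space may be assigned to another.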
Regarding claim 5, Ravishankar discloses the method according to claim 1, but does not explicitly disclose:
wherein in response to that all consumption nodes of first tensor data are upstream nodes of a production node of second tensor data, or in response to that all consumption nodes of the second tensor data are downstream nodes of a production node of the first tensor data, memory space allocated to the second tensor data can be reused for the first tensor data; and
wherein in response to that all the consumption nodes of the first tensor data are not the upstream nodes of the production node of the second tensor data, or in response to that all the consumption nodes of the second tensor data are not the downstream nodes of the production node of the first tensor data, the memory space allocated to the second tensor data cannot be reused for the first tensor data, wherein the first tensor data and the second tensor data are any two pieces of tensor data of the M pieces of tensor data, wherein the consumption node is a node to which tensor data flows, and the production node is a node from which tensor data flows.
However, Pang discloses:
wherein in response to that all consumption nodes of first tensor data are upstream nodes of a production node of second tensor data, or in response to that all consumption nodes of the second tensor data are downstream nodes of a production node of the first tensor data, memory space allocated to the second tensor data can be reused for the first tensor data; and (Col. 15, Lines 33-43 “For the memory pre-allocation task of the forward calculation for a neural network, a neural network structure may be first analyzed to determine a relationship between each operation node and each operation node comprised in the structure, and then a one-dimensional list of operation vectors may be constructed, for example [operation1, operation2, . . . ]. The list of vectors may store all the operations of the neural network and the execution order of these operations. Through each operation vector (operation class) in the list of vectors, a set of inputs and a set of inputs in each operation class are obtained.” and Col. 5, Lines 9-17 “Before the forward calculation of the neural network is performed, the calculation operation S100 may be performed at each layer for an operation of the layer. In the calculation operation S100, a memory reuse method is used. Memory to be allocated to each output and memory to be released when forward calculation is performed are calculated in advance for each output from the layer (for subsequent layers to use to implement memory reuse).”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the relationship between the operation nodes is determined and a one-dimensional list of operations is constructed in execution order. Because the list captures the node relationships, memory space allocated to one piece of tensor data can be reused for another piece of tensor data.
wherein in response to that all the consumption nodes of the first tensor data are not the upstream nodes of the production node of the second tensor data, or in response to that all the consumption nodes of the second tensor data are not the downstream nodes of the production node of the first tensor data, the memory space allocated to the second tensor data cannot be reused for the first tensor data, wherein the first tensor data and the second tensor data are any two pieces of tensor data of the M pieces of tensor data, wherein the consumption node is a node to which tensor data flows, and the production node is a node from which tensor data flows. (Col. 15, Lines 33-43 “For the memory pre-allocation task of the forward calculation for a neural network, a neural network structure may be first analyzed to determine a relationship between each operation node and each operation node comprised in the structure, and then a one-dimensional list of operation vectors may be constructed, for example [operation1, operation2, . . . ]. The list of vectors may store all the operations of the neural network and the execution order of these operations. Through each operation vector (operation class) in the list of vectors, a set of inputs and a set of inputs in each operation class are obtained.” and Col. 5, Lines 9-17 “Before the forward calculation of the neural network is performed, the calculation operation S100 may be performed at each layer for an operation of the layer. In the calculation operation S100, a memory reuse method is used. Memory to be allocated to each output and memory to be released when forward calculation is performed are calculated in advance for each output from the layer (for subsequent layers to use to implement memory reuse).” and Col. 14, Lines 28-37 “For example, operations for each layer in a neural network may be abstracted into an operation class through TensorFlow (a symbolic mathematical system based on data flow programming). 
Each operation class may comprise the above set of inputs 804 and set of outputs 805, and each input and each output in the set of inputs 804 and set of outputs 805 can be abstracted into a tensor class. The tensor class comprises an identifier of a tensor (for example, a name of the tensor), a shape of the tensor, and an address pointer of the tensor.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the relationship between the operation nodes is determined and a one-dimensional list of operations is constructed. If the list shows that the node relationships lead in multiple directions, then the tensor data memory space cannot be reused since it is still needed for execution.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to add wherein in response to that all consumption nodes of first tensor data are upstream nodes of a production node of second tensor data, or in response to that all consumption nodes of the second tensor data are downstream nodes of a production node of the first tensor data, memory space allocated to the second tensor data can be reused for the first tensor data; and wherein in response to that all the consumption nodes of the first tensor data are not the upstream nodes of the production node of the second tensor data, or in response to that all the consumption nodes of the second tensor data are not the downstream nodes of the production node of the first tensor data, the memory space allocated to the second tensor data cannot be reused for the first tensor data, wherein the first tensor data and the second tensor data are any two pieces of tensor data of the M pieces of tensor data, wherein the consumption node is a node to which tensor data flows, and the production node is a node from which tensor data flows as seen in Pang's invention into Ravishankar's invention because these modifications allow applying a known technique to a known device ready for improvement to yield predictable results such that memory space is reused only if tensor data has a continuous flow upstream/downstream from a node, allowing memory allocation to be reduced through reuse.
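For illustration only, the upstream/downstream reuse condition recited in claim 5 can be sketched as a reachability check over the computation graph. This is a hypothetical sketch under the assumption that the graph maps each node to its successors; the function and variable names are not drawn from either reference.

```python
def descendants(graph, node):
    """All nodes reachable downstream of `node` via directed edges."""
    seen, stack = set(), [node]
    while stack:
        for s in graph.get(stack.pop(), []):
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return seen

def can_reuse(graph, consumers_first, producer_second):
    """The second tensor's memory space is reusable for the first tensor
    only if every consumption node of the first tensor lies upstream of
    the production node of the second tensor."""
    return all(producer_second in descendants(graph, c) for c in consumers_first)

# Linear graph a -> b -> c: a tensor fully consumed at "a" is dead before
# the tensor produced at "c" exists, so its space can be reused.
g = {"a": ["b"], "b": ["c"], "c": []}
print(can_reuse(g, ["a"], "c"))   # → True
print(can_reuse(g, ["c"], "a"))   # → False
```

The check formalizes the intuition in the rejection: reuse is safe only when the two tensors' live ranges cannot overlap along any execution path.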
Regarding claim 7, Ravishankar discloses the method according to claim 6, but does not explicitly disclose:
wherein the computation graph further comprises a first computing subtask and a second computing subtask that are in a serial execution relationship, and wherein the first computing subtask is before the second computing subtask in an execution sequence;
and wherein the updating the computation graph further comprises:
in response to that there is no directed edge between a last node in the first computing subtask and a first node in the second computing subtask, adding a directed edge between the last node in the first computing subtask and the first node in the second computing subtask.
However, Li discloses:
and wherein the updating the computation graph further comprises:
in response to that there is no directed edge between a last node in the first computing subtask and a first node in the second computing subtask, adding a directed edge between the last node in the first computing subtask and the first node in the second computing subtask. ([0070] “The DSGRCE retrieves these input rules, creates an in-memory logical dependency graph, and adds nodes corresponding to each input rule, without adding their precedents that are not part of the inputs. If an input node depends on another input node, an edge is added between them to capture the dependency. The result is an inner logical dependency graph that contains the input nodes and edges between them. The declaration of the packaged atom also specifies an output rule (e.g., the rule in the last return statement at 170). The output node and its precedents are recursively added to the inner logical dependency graph.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, an edge is added between nodes if an input node depends on another input node and then output nodes and its precedents are added to fill out the updated graph.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to add wherein the updating the computation graph further comprises: in response to that there is no directed edge between a last node in the first computing subtask and a first node in the second computing subtask, adding a directed edge between the last node in the first computing subtask and the first node in the second computing subtask as seen in Li’s invention into Ravishankar's invention because these modifications allow applying a known technique to a known device ready for improvement to yield predictable results such that updating of a graph’s edges between nodes can be added to continue to update and improve the neural network.
In addition, Pang discloses:
wherein the computation graph further comprises a first computing subtask and a second computing subtask that are in a serial execution relationship, and wherein the first computing subtask is before the second computing subtask in an execution sequence; (Col. 15, Lines 33-43 “For the memory pre-allocation task of the forward calculation for a neural network, a neural network structure may be first analyzed to determine a relationship between each operation node and each operation node comprised in the structure, and then a one-dimensional list of operation vectors may be constructed, for example [operation1, operation2, . . . ]. The list of vectors may store all the operations of the neural network and the execution order of these operations. Through each operation vector (operation class) in the list of vectors, a set of inputs and a set of inputs in each operation class are obtained.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the operation vectors are placed in the execution order in the list.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to add wherein the computation graph further comprises a first computing subtask and a second computing subtask that are in a serial execution relationship, and wherein the first computing subtask is before the second computing subtask in an execution sequence as seen in Pang’s invention into Ravishankar's invention because these modifications allow applying a known technique to a known device ready for improvement to yield predictable results such that a graph including subtasks can serially execute in comparison to other tasks that can be executed in parallel.
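For illustration only, the graph update recited in claim 7 amounts to connecting the last node of an earlier subtask to the first node of a later subtask when no such directed edge exists. This is a hypothetical sketch; the adjacency-list representation and all names are assumptions, not taken from Li or Pang.

```python
def connect_serial_subtasks(graph, first_subtask, second_subtask):
    """Given two subtasks in a serial execution relationship (each an
    ordered list of node names), add a directed edge from the last node
    of the first subtask to the first node of the second subtask if one
    is not already present."""
    last, first = first_subtask[-1], second_subtask[0]
    edges = graph.setdefault(last, [])
    if first not in edges:
        edges.append(first)
    return graph

# Two disconnected subtasks; the update serializes them with a new edge n2 -> n3.
g = {"n1": ["n2"], "n2": [], "n3": ["n4"], "n4": []}
connect_serial_subtasks(g, ["n1", "n2"], ["n3", "n4"])
print(g["n2"])  # → ['n3']
```

The membership check makes the update idempotent: if the directed edge already exists, the computation graph is left unchanged, mirroring the "in response to that there is no directed edge" condition of the claim.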
Regarding claim 17, it is a memory allocation apparatus claim having the same limitations as cited in method claim 4. Thus, claim 17 is also rejected under the same rationale as addressed in the rejection of claim 4 above.
Regarding claim 18, it is a memory allocation apparatus claim having the same limitations as cited in method claim 5. Thus, claim 18 is also rejected under the same rationale as addressed in the rejection of claim 5 above.
Regarding claim 22, it is a non-transitory computer-readable storage medium claim having the same limitations as cited in method claim 4. Thus, claim 22 is also rejected under the same rationale as addressed in the rejection of claim 4 above.
7. Claims 9 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Ravishankar et al. (U.S. Patent No. 12,423,007) – hereinafter “Ravishankar” in view of Li (U.S. Pub. No. 2020/0293916), further in view of Matthews et al. (U.S. Patent No. 10,931,588) – hereinafter “Matthews”.
Regarding claim 9, Ravishankar discloses the method according to claim 8, but does not explicitly disclose:
wherein an identifier of each node in the computation graph is used to determine the information about each piece of tensor data of the M pieces of tensor data.
However, Matthews discloses:
wherein an identifier of each node in the computation graph is used to determine the information about each piece of tensor data of the M pieces of tensor data. (Col. 7, Lines 37-43 “In an embodiment, the metadata may include a worker set identifier that identifies the worker set associated with the compute data, and/or a worker identifier that identifies the specific node 110 that generated the compute data. The metadata may further specify, in some embodiments, an operation type, a data type, or other configuration data.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the worker identifier identifies the node that generated the compute data and can thus give information about operation type, data type, etc.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to add wherein an identifier of each node in the computation graph is used to determine the information about each piece of tensor data of the M pieces of tensor data as seen in Matthews's invention into Ravishankar's invention because these modifications allow the use of a known technique to improve similar devices in the same way such that information is determined and tracked for each node in the computation graph.
Regarding claim 11, Ravishankar discloses the method according to claim 1, but does not explicitly disclose:
wherein the information about each piece of tensor data indicates a quantity of nodes to which each piece of tensor data flows, the method further comprising:
sorting the M pieces of tensor data based on quantities of consumption nodes that respectively correspond to the M pieces of tensor data, to obtain the sorting result of the M pieces of tensor data.
However, Matthews discloses:
wherein the information about each piece of tensor data indicates a quantity of nodes to which each piece of tensor data flows, the method further comprising: (Col. 8, Lines 33-43 “In an embodiment, the metadata may include a variety of other elements, such as a batch identifier that indicates a specific batch of data that was processed to generate the compute data, a timestamp that indicates when the compute data was generated, a transaction length that specifies a length of the transaction (e.g., a number of compute data sets, data units, bytes, etc.), an operation identifier that indicates the collective action that is to be performed on the compute data set to which a container belongs, data type identifier(s) of compute data elements in the container, node status information, and so forth.”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the compute data includes node status information such that each piece of data indicates the number of nodes it flows to.
sorting the M pieces of tensor data based on quantities of consumption nodes that respectively correspond to the M pieces of tensor data, to obtain the sorting result of the M pieces of tensor data. (Col. 10, Lines 3-9 “In an embodiment, a compute instruction may specify complex collective actions comprising multiple sub-actions that the compute subsystem 124 should perform on the associated compute data, and the order in which the sub-actions are performed. For example, the compute instruction may specify that the values of a compute data element should be sorted…”) The citation is interpreted to read on the claimed invention because under broadest reasonable interpretation, the compute data includes node status information such that each piece of data indicates the number of nodes it flows to and can be sorted based on that quantity.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to add wherein the information about each piece of tensor data indicates a quantity of nodes to which each piece of tensor data flows, the method further comprising: sorting the M pieces of tensor data based on quantities of consumption nodes that respectively correspond to the M pieces of tensor data, to obtain the sorting result of the M pieces of tensor data as seen in Matthews's invention into Ravishankar's invention because these modifications allow the use of a known technique to improve similar devices in the same way such that multiple sorting methods for the tensor data are used to ensure memory reuse if tensor data does not flow to many nodes.
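For illustration only, the sorting step recited in claim 11 can be sketched as ordering tensor identifiers by the number of consumption nodes each piece of tensor data flows to. This is a hypothetical sketch; the descending order, the mapping representation, and all names are assumptions, not drawn from Matthews.

```python
def sort_by_consumer_count(tensor_consumers):
    """Sort tensor identifiers by the quantity of consumption nodes each
    tensor flows to, most-consumed first; the identifier itself breaks
    ties so the sorting result is deterministic."""
    return sorted(tensor_consumers,
                  key=lambda t: (-len(tensor_consumers[t]), t))

# Hypothetical mapping from tensor identifier to its consumption nodes.
consumers = {"t0": ["n1", "n2", "n3"], "t1": ["n4"], "t2": ["n2", "n5"]}
print(sort_by_consumer_count(consumers))  # → ['t0', 't2', 't1']
```

An allocator could walk this sorting result in order, so tensors with few consumers (whose memory frees quickly) are considered for reuse against tensors that remain live across many nodes.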
Conclusion
8. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Such prior art includes Baskaran et al. (U.S. Patent No. 10,936,569) which discloses a subset of reusable tensors such that a total memory storage memory requirement for all of the reusable tensors in the subset does not exceed a preselected threshold and Zhou et al. (U.S. Pub. No. 2021/0224185) which discloses reallocating nodes on memory blocks for data layout optimization on processing in memory architecture for executing a neural network model.
Examiner has cited particular columns/paragraphs/sections and line numbers in the references applied and not relied upon to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested from the applicant in preparing responses, to fully consider the references in entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner.
When responding to the Office action, applicant is advised to clearly point out the patentable novelty the claims present in view of the state of the art disclosed by the reference(s) cited or the objections made. A showing of how the amendments avoid such references or objections must also be present. See 37 C.F.R. 1.111(c).
When responding to this Office action, applicant is advised to provide the line and page numbers in the application and/or reference(s) cited to assist in locating the appropriate paragraphs.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL B TRAINOR whose telephone number is (571)272-3710. The examiner can normally be reached Monday-Friday 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Vital can be reached at (571) 272-4215. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/D.T./Examiner, Art Unit 2198
/PIERRE VITAL/Supervisory Patent Examiner, Art Unit 2198