Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for priority to U.S. Provisional Patent Application No. 63/343,014, filed on May 17, 2022.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
The following title is suggested: PROXY SYSTEMS AND METHODS FOR LOAD BALANCING NEURAL NETWORK INFERENCE REQUESTS ACROSS MULTIPROCESSING ARCHITECTURES.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-8, 10-18, and 20 are rejected under 35 U.S.C. 102(a)(1) and (a)(2) as being anticipated by Diamant et al. (US 10,846,201 B1), “Performance Debug for Networks,” hereafter Diamant.
Regarding claim 1, Diamant teaches:
A method comprising: receiving a neural network model from a client computing system; (Column 7, lines 52-56, “Memory 212 may be configured to store executable instructions, input data (e.g., pixel data of images), and weights (e.g., the filter parameters) or other parameters of the trained neural network received from, for example, a host device.” A client computing system includes a host device. Receiving the instructions, input data, weights, and other parameters of a neural network is equivalent to receiving the neural network.)
assessing system resource availability on a plurality of processing devices; (Column 10, lines 9-11, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers.”; column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Neural network processors are processing devices. Maintaining a list of available hardware resources includes assessing system resource availability.)
selecting a subset of available processing devices based on the system resource availability; (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Selecting a subset of available processing devices based on system resource availability includes assigning operations of a neural network based on usage of hardware resources.)
loading the neural network model into each processing device in the subset; (Column 8, lines 14-20, “Host interface 214 may enable communications between the host device and neural network processor 202. For example, host interface 214 may be configured to transmit the memory descriptors including the memory addresses of the stored data (e.g., input data, weights, results of computations, etc.) between the host device and neural network processor 202.”; column 10, lines 9-23, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers. For example, a neural network processor 202 may first receive input data and instructions for implementing a first set of convolution, activation, and/or pooling layers. The input data may include the network parameters for the first set of network layers, such as the number of nodes, the weights, or the parameters of the filters, etc. The input data may also include the external input data to be processed by the neural network or intermediate output data from previous layers of the neural network. The instructions may include instructions for computing engine 224, activation engine 228a, and/or pooling engine 228b.” The input data and instructions received includes parameters of the neural network so is considered to include the neural network. Thus, transmitting input data and having it be received by a neural processor is loading the neural network model into a processing device.)
receiving an inference request from the client computing system; (Column 8, lines 14-20, “Host interface 214 may enable communications between the host device and neural network processor 202. For example, host interface 214 may be configured to transmit the memory descriptors including the memory addresses of the stored data (e.g., input data, weights, results of computations, etc.) between the host device and neural network processor 202.”; column 10, lines 9-23, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers. For example, a neural network processor 202 may first receive input data and instructions for implementing a first set of convolution, activation, and/or pooling layers. The input data may include the network parameters for the first set of network layers, such as the number of nodes, the weights, or the parameters of the filters, etc. The input data may also include the external input data to be processed by the neural network or intermediate output data from previous layers of the neural network. The instructions may include instructions for computing engine 224, activation engine 228a, and/or pooling engine 228b.” The input data and instructions received include data to be processed by a neural network which is interpreted as receiving an inference request.)
accessing a load state of each processing device in the subset; (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Accessing a load state is interpreted to mean recording or viewing system resource usage or available hardware resources. Thus, maintaining a list of available hardware resources includes accessing a load state of each processing device.)
selecting a target processing device from the subset based on the load states; and (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Selecting a target processing device based on the load states includes assigning operations of a neural network based on usage of hardware resources.)
transmitting the inference request to the target processing device. (Column 8, lines 14-20; column 10, lines 9-23, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers. For example, a neural network processor 202 may first receive input data and instructions for implementing a first set of convolution, activation, and/or pooling layers. The input data may include the network parameters for the first set of network layers, such as the number of nodes, the weights, or the parameters of the filters, etc. The input data may also include the external input data to be processed by the neural network or intermediate output data from previous layers of the neural network. The instructions may include instructions for computing engine 224, activation engine 228a, and/or pooling engine 228b.” Input data and instructions, which include the data to be processed, are transmitted between a host device and a neural network processor through memory. The data to be processed is considered an inference request. Thus, the inference request is transmitted to a processing device.)
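For examination context only, the sequence of steps mapped above (assess availability, select a subset, load the model, access load states, route the request to the least-loaded target) can be sketched in Python. All names (select_subset, route_request, the device fields, the least-loaded selection policy) are hypothetical illustrations and do not appear in Diamant or in the application as filed; this is a sketch of the claimed flow under those assumptions, not an implementation from either document.

```python
# Hypothetical sketch of the claim 1 method steps; all identifiers and the
# least-loaded routing policy are illustrative assumptions only.

def select_subset(devices, min_free):
    """Assess resource availability and select a subset of devices."""
    return [d for d in devices if d["free_mem"] >= min_free]

def load_model(subset, model):
    """Load the neural network model into each device in the subset."""
    for d in subset:
        d["model"] = model

def route_request(subset, request):
    """Access each device's load state, pick the least-loaded target,
    and transmit the inference request to it."""
    target = min(subset, key=lambda d: d["load"])
    target["queue"].append(request)
    target["load"] += 1
    return target

devices = [
    {"id": 0, "free_mem": 512, "load": 2, "queue": []},
    {"id": 1, "free_mem": 1024, "load": 0, "queue": []},
    {"id": 2, "free_mem": 128, "load": 1, "queue": []},
]
subset = select_subset(devices, min_free=256)   # devices 0 and 1 qualify
load_model(subset, model={"weights": "..."})
target = route_request(subset, request={"tensor": [1, 2, 3]})
print(target["id"])  # prints 1, the least-loaded device in the subset
```

The sketch routes each request to the device reporting the lowest load; any other policy informed by the load states (round-robin over the subset, weighted by free memory) would fit the same claimed flow.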
Regarding claim 2, Diamant teaches all of the material disclosed in claim 1 and additionally teaches:
receiving an inference result generated by the target processing device after executing the inference request based on the neural network model; and (Column 8, lines 9-14, “Neural network processor 202 may also store the results of computations (e.g., one or more image recognition decisions or intermediary data) at memory 212, and provide the memory addresses for the stored results to the host device.” An inference result includes results of computations. The results being stored on memory is considered the memory receiving the results; column 7, lines 31-34, “...the CNN may go through the forward propagation step and output a probability for each class using the trained weights and parameters, which may be referred to as an inference...” A CNN going through a forward propagation step is executing an inference request based on a neural network model. Generating an inference result includes outputting a probability; column 10, lines 9-11, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers.” The deep neural network, which includes a CNN, is run by a processing device.)
transmitting the inference result to the client computing system. (Column 8, lines 9-14, “Neural network processor 202 may also store the results of computations (e.g., one or more image recognition decisions or intermediary data) at memory 212, and provide the memory addresses for the stored results to the host device.” Providing memory addresses for stored results to a host device is transmitting the results to a client computing system.)
Regarding claim 3, Diamant teaches all of the material disclosed by claim 2, and additionally teaches:
the inference result is an output tensor. (Column 5, lines 6-17, “The convolution operations in a CNN may be used to extract features from the input image. The convolution operations may preserve the spatial relationship between pixels by extracting image features using small regions of the input image. In a convolution, a matrix (referred to as a filter, a kernel, or a feature detector) may slide over the input image (or a feature map) at a certain step size (referred to as the stride). For every position (or step), element-wise multiplications between the filter matrix and the overlapped matrix in the input image may be calculated and summed to get a final value that represents a single element of an output matrix” An inference result is an output of the CNN. A tensor includes a matrix, so an output tensor includes an output matrix.)
Regarding claim 4, Diamant teaches all of the material disclosed in claim 1, and additionally teaches:
the neural network is a convolutional neural network or a neural network comprised of one or more linear algebra operators. (Column 3, lines 12-17, “Techniques disclosed herein may be used to debug any neural network or any other computing system that may include multiple processing engines or may perform a large number of calculations before yielding a final result, such as a convolutional neural network (also referred to as ConvNets or CNNs).”)
Regarding claim 5, Diamant teaches all of the material disclosed in claim 1, and additionally teaches:
automatically determining and negotiating a type of processing device interface associated with a processing device. (Fig. 2; column 8, lines 21-24, “Host interface 214 may include, for example, a peripheral component interconnect express (PCIe) interface or any suitable interface for communicating with the host device.” Fig. 2 shows the interconnect 218 connecting to both a host interface and a neural network processor. Thus, a PCIe interface or any other interface that is determined to be suitable for communicating with the host device is also an interface for the neural network processor; column 10, lines 45-55, “The neural network model may be compiled by a compiler to generate executable instructions. The compiler may convert a neural network model into machine-executable instructions, such as binary instructions, that may be executed by various functional blocks (e.g., processing engines) of the neural network. The compiler may manage the allocation of different operations of the neural network to various hardware resources (e.g., processing engines), the allocation of memory for storing neural network parameters and intermediate data, and the timing and synchronization conditions between the various hardware resources.” The compiler synchronizes conditions between different hardware resources, which could include the interface and processing device.)
Regarding claim 6, Diamant teaches all of the material disclosed in claim 5, and additionally teaches:
the processing device interface is any of a PCIe bus interface, a USB interface, or an IPC interface. (Column 8, lines 21-24, “Host interface 214 may include, for example, a peripheral component interconnect express (PCIe) interface or any suitable interface for communicating with the host device.” A PCIe interface is specifically mentioned as an interface.)
Regarding claim 7, Diamant teaches all of the material disclosed in claim 1, and additionally teaches:
the inference request includes an input tensor. (Column 3, lines 54-61, “An object 110 to be classified, such as an input image, may be represented by a matrix of pixel values. The input image may include multiple channels, each channel representing a certain component of the image. For example, an image from a digital camera may have a red channel, a green channel, and a blue channel. Each channel may be represented by a 2-D matrix of pixels having pixel values in the range of, for example, 0 to 255 (i.e., 8-bit).” A tensor includes a matrix, so an input tensor includes a matrix of pixel values. An object to be classified is an inference request.)
Regarding claim 8, Diamant teaches all of the material disclosed in claim 7, and additionally teaches:
the input tensor is an image generated by an image sensor. (Column 3, lines 54-61, “An object 110 to be classified, such as an input image, may be represented by a matrix of pixel values. The input image may include multiple channels, each channel representing a certain component of the image. For example, an image from a digital camera may have a red channel, a green channel, and a blue channel. Each channel may be represented by a 2-D matrix of pixels having pixel values in the range of, for example, 0 to 255 (i.e., 8-bit).” A digital camera is an image sensor.)
Regarding claim 10, Diamant teaches all of the material disclosed in claim 1, and additionally teaches:
assigning a model ID to the neural network model. (Column 20, lines 55-61, “In some embodiments, the notification packet may also include an identification of the instruction, and an identification of the processing engine that executes the instruction.” A notification packet including an identification of an instruction and an identification of the processing engine that executes it is considered an assignment of a model ID.)
Regarding claim 11, Diamant teaches:
An apparatus comprising: a proxy computing system; (Fig. 2; column 7, lines 45-47, “Apparatus 200 may include a neural network processor 202 coupled to memory 212, a direct memory access (DMA) controller 216, and a host interface 214 via an interconnect 218.”; column 10, lines 45-55; column 17, lines 25-30, “According to certain embodiments, a notification 812 may be generated when state machine 800 leaves idle state 810, for example, when a new instruction is read from the instruction buffer by an instruction decoder. The notification may be generated by a debugging circuit in a control unit, such as control unit 305.” A combination of memory, a direct memory access controller, a compiler, a control unit, and a host interface is considered a proxy computing system.)
a client computing system communicatively coupled to the proxy computing system; and (Fig. 2; column 7, lines 45-47; column 8, lines 14-20, “Host interface 214 may enable communications between the host device and neural network processor 202. For example, host interface 214 may be configured to transmit the memory descriptors including the memory addresses of the stored data (e.g., input data, weights, results of computations, etc.) between the host device and neural network processor 202.” A host device is considered a client computing system. The host device communicates with a neural network processor through a host interface coupled to the proxy computing system, so it is also coupled to the proxy computing system.)
a plurality of processing devices communicatively coupled to the proxy computing system, wherein: (Column 7, lines 45-47; column 10, lines 9-11, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers.” Neural network processors are processing devices.)
the proxy computing system receives a neural network model from the client computing system; (Column 7, lines 52-56, “Memory 212 may be configured to store executable instructions, input data (e.g., pixel data of images), and weights (e.g., the filter parameters) or other parameters of the trained neural network received from, for example, a host device.” Receiving the instructions, input data, weights, and other parameters of a neural network is equivalent to receiving the neural network.)
the proxy computing system assesses system resource availability on the processing devices; (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Maintaining a list of available hardware resources is assessing system resource availability.)
the proxy computing system selects a subset of available processing devices based on the system resource availability; (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Assigning operations based on functions and usage of the hardware resources is selecting a subset of available processing devices based on the system resource availability.)
the proxy computing system loads the neural network model into each processing device in the subset; (Column 8, lines 14-20, “Host interface 214 may enable communications between the host device and neural network processor 202. For example, host interface 214 may be configured to transmit the memory descriptors including the memory addresses of the stored data (e.g., input data, weights, results of computations, etc.) between the host device and neural network processor 202.”; column 10, lines 9-23, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers. For example, a neural network processor 202 may first receive input data and instructions for implementing a first set of convolution, activation, and/or pooling layers. The input data may include the network parameters for the first set of network layers, such as the number of nodes, the weights, or the parameters of the filters, etc. The input data may also include the external input data to be processed by the neural network or intermediate output data from previous layers of the neural network. The instructions may include instructions for computing engine 224, activation engine 228a, and/or pooling engine 228b.” The input data and instructions received includes parameters of the neural network so is considered to include the neural network. Thus, transmitting input data and having it be received by each neural processor is loading the neural network model into each processing device.)
the proxy computing system receives an inference request from the client computing system; (Column 8, lines 14-20, “Host interface 214 may enable communications between the host device and neural network processor 202. For example, host interface 214 may be configured to transmit the memory descriptors including the memory addresses of the stored data (e.g., input data, weights, results of computations, etc.) between the host device and neural network processor 202.”; column 10, lines 9-23, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers. For example, a neural network processor 202 may first receive input data and instructions for implementing a first set of convolution, activation, and/or pooling layers. The input data may include the network parameters for the first set of network layers, such as the number of nodes, the weights, or the parameters of the filters, etc. The input data may also include the external input data to be processed by the neural network or intermediate output data from previous layers of the neural network. The instructions may include instructions for computing engine 224, activation engine 228a, and/or pooling engine 228b.” The input data and instructions received also include data to be processed by a neural network, which is interpreted as an inference request. The input data and instructions are transmitted through memory, which is part of the proxy computing system. Thus, the proxy computing system receives the inference request.)
the proxy computing system accesses a load state of each processing device in the subset; (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Accessing a load state is interpreted to mean recording or viewing system resource usage or available hardware resources. Thus, maintaining a list of available hardware resources includes accessing a load state of each processing device.)
the proxy computing system selects a target processing device from the subset based on the load states; (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Selecting a target processing device based on the load states includes assigning operations of a neural network based on usage of hardware resources.)
the proxy computing system transmits the inference request to the target processing device; and (Column 8, lines 14-20, “Host interface 214 may enable communications between the host device and neural network processor 202. For example, host interface 214 may be configured to transmit the memory descriptors including the memory addresses of the stored data (e.g., input data, weights, results of computations, etc.) between the host device and neural network processor 202.”; column 10, lines 9-23, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers. For example, a neural network processor 202 may first receive input data and instructions for implementing a first set of convolution, activation, and/or pooling layers. The input data may include the network parameters for the first set of network layers, such as the number of nodes, the weights, or the parameters of the filters, etc. The input data may also include the external input data to be processed by the neural network or intermediate output data from previous layers of the neural network. The instructions may include instructions for computing engine 224, activation engine 228a, and/or pooling engine 228b.” Input data and instructions received by the processing device, including the data to be processed, are transmitted between a host device and a neural network processor through memory. Memory is considered part of the proxy computing system. The data and instructions to be processed are considered an inference request. Thus, the inference request is transmitted to the processing device through the proxy computing system.)
the target processing device executes the inference request based on the neural network model. (Column 7, lines 31-34, “...the CNN may go through the forward propagation step and output a probability for each class using the trained weights and parameters, which may be referred to as an inference...” A CNN going through a forward propagation step is executing an inference request based on a neural network model.)
Regarding claim 12, Diamant teaches all of the material disclosed in claim 11, and additionally teaches:
the target processing device generates an inference result based on the execution; (Column 7, lines 31-34, “...the CNN may go through the forward propagation step and output a probability for each class using the trained weights and parameters, which may be referred to as an inference...” Generating an inference result includes outputting a probability.)
the target processing device transmits the inference result to the proxy computing system; and (Column 8, lines 9-14, “Neural network processor 202 may also store the results of computations (e.g., one or more image recognition decisions or intermediary data) at memory 212, and provide the memory addresses for the stored results to the host device.” A neural network processor storing the results of computations at memory is a processing device transmitting the inference result to a proxy system.)
the proxy computing system transmits the inference result to the client computing system. (Column 8, lines 9-14, “Neural network processor 202 may also store the results of computations (e.g., one or more image recognition decisions or intermediary data) at memory 212, and provide the memory addresses for the stored results to the host device.” Providing memory addresses for stored results to a host device is transmitting the results to the client computing system.)
Regarding claim 13, Diamant teaches all of the material disclosed in claim 12, and additionally teaches:
the inference result is an output tensor. (Column 5, lines 6-17, “The convolution operations in a CNN may be used to extract features from the input image. The convolution operations may preserve the spatial relationship between pixels by extracting image features using small regions of the input image. In a convolution, a matrix (referred to as a filter, a kernel, or a feature detector) may slide over the input image (or a feature map) at a certain step size (referred to as the stride). For every position (or step), element-wise multiplications between the filter matrix and the overlapped matrix in the input image may be calculated and summed to get a final value that represents a single element of an output matrix” An inference result is an output of the CNN. A tensor includes a matrix, so an output tensor includes an output matrix.)
Regarding claim 14, Diamant teaches all of the material disclosed in claim 11, and additionally teaches:
the neural network is a convolutional neural network or a neural network comprised of one or more linear algebra operators. (Column 3, lines 12-17, “Techniques disclosed herein may be used to debug any neural network or any other computing system that may include multiple processing engines or may perform a large number of calculations before yielding a final result, such as a convolutional neural network (also referred to as ConvNets or CNNs).”)
Regarding claim 15, Diamant teaches all of the material disclosed in claim 11, and additionally teaches:
a processing device in the plurality of processing devices is communicatively coupled to the proxy computing system via a processing device interface, and wherein the proxy computing system automatically determines and negotiates the type of the processing device interface. (Fig. 2; column 8, lines 21-24, “Host interface 214 may include, for example, a peripheral component interconnect express (PCIe) interface or any suitable interface for communicating with the host device.” Fig. 2 shows the interconnect 218 connecting to both a host interface and a neural network processor. Thus, a PCIe interface or any other interface that is determined to be suitable for communicating with the host device is also an interface for the neural network processor. In addition, the interconnect is communicatively coupled to the memory, which is a part of the proxy computing system; column 10, lines 45-55, the compiler synchronizes conditions between different hardware resources, which could include the interface and processing device.)
Regarding claim 16, Diamant teaches all of the material disclosed in claim 15, and additionally teaches:
the processing device interface is any of a PCIe bus interface, a USB interface, or an IPC interface. (Column 8, lines 21-24, a PCIe interface is specifically mentioned as an interface.)
Regarding claim 17, Diamant teaches all of the material disclosed in claim 11, and additionally teaches:
the inference request includes an input tensor. (Column 3, lines 54-61, “An object 110 to be classified, such as an input image, may be represented by a matrix of pixel values. The input image may include multiple channels, each channel representing a certain component of the image. For example, an image from a digital camera may have a red channel, a green channel, and a blue channel. Each channel may be represented by a 2-D matrix of pixels having pixel values in the range of, for example, 0 to 255 (i.e., 8-bit).” A tensor includes a matrix, so an input tensor includes a matrix of pixel values. An object to be classified is an inference request.)
Regarding claim 18, Diamant teaches all of the material disclosed in claim 17, and additionally teaches:
the input tensor is an image generated by an image sensor. (Column 3, lines 54-61, “An object 110 to be classified, such as an input image, may be represented by a matrix of pixel values. The input image may include multiple channels, each channel representing a certain component of the image. For example, an image from a digital camera may have a red channel, a green channel, and a blue channel. Each channel may be represented by a 2-D matrix of pixels having pixel values in the range of, for example, 0 to 255 (i.e., 8-bit).” A digital camera is an image sensor.)
Regarding claim 20, Diamant teaches all of the material disclosed in claim 11, and additionally teaches:
the proxy computing system assigns a model ID to the neural network model. (Column 20, lines 55-61, “In some embodiments, the notification packet may also include an identification of the instruction, and an identification of the processing engine that executes the instruction.”; column 17, lines 25-30, “According to certain embodiments, a notification 812 may be generated when state machine 800 leaves idle state 810, for example, when a new instruction is read from the instruction buffer by an instruction decoder. The notification may be generated by a debugging circuit in a control unit, such as control unit 305.” A notification packet including an identification of the instruction and of the processing engine that executes the instruction is considered an assignment of a model ID. The notification is generated by the control unit, which is a part of the proxy computing system.)
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Diamant as applied to claims 1 and 11 above, and further in view of Electronic device for processing Neural Network model and method of operating the same by Lee et al. (US 20230072337 A1), hereafter Lee.
Regarding claim 9, Diamant teaches all of the material disclosed in claim 1, and additionally teaches:
Allocation of memory for storing information concerning hardware resources ((Diamant) Column 10, lines 50-55, “The compiler may manage the allocation of different operations of the neural network to various hardware resources (e.g., processing engines), the allocation of memory for storing neural network parameters and intermediate data, and the timing and synchronization conditions between the various hardware resources.”), and selecting a subset of processing devices based on hardware resource availability. ((Diamant) Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Selecting a subset of available processing devices based on system resource availability includes assigning operations of a neural network based on usage of hardware resources.).
However, Diamant does not explicitly teach:
selecting the subset based on analyzing a processing unit memory state of each of the plurality of processing devices.
Lee teaches:
Identifying available bandwidth of memory through a resource management unit included in a processor ((Lee) Paragraph [0054], “In operation 203, according to various embodiments, the electronic device 101 (e.g., the processor 120 of FIG. 1) may identify an available bandwidth of the memory 130 through a resource management unit included in the processor 120.” Analyzing a processing unit memory state of a processing device is interpreted as equivalent to identifying available bandwidth of memory in a processor.), and that failing to identify available bandwidth may result in not accurately calculating the bandwidth required to process a neural network. ((Lee) Paragraph [0007], “If failing to identify an available bandwidth of a memory in real time, a processor of an electronic device may not accurately calculate the memory bandwidth required to process the neural network model”)
Diamant and Lee are analogous art, as both are inventions dealing with resource management systems involving neural networks.
Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have improved the selection based on hardware resource usage, taught by Diamant, with the identification of available memory bandwidth for each processor, taught by Lee, in order to ensure that all processors used are able to process a neural network. This would have resulted in the predictable outcome of selecting a subset of processing devices based on analyzing a processing unit memory state of each of a plurality of processing devices, as claimed in the instant application.
Regarding claim 19, Diamant teaches all of the material disclosed in claim 11, and additionally teaches:
Allocation of memory for storing information concerning hardware resources ((Diamant) Column 10, lines 50-55, “The compiler may manage the allocation of different operations of the neural network to various hardware resources (e.g., processing engines), the allocation of memory for storing neural network parameters and intermediate data, and the timing and synchronization conditions between the various hardware resources.”), and selecting a subset of processing devices based on hardware resource availability. ((Diamant) Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Selecting a subset of available processing devices based on system resource availability includes assigning operations of a neural network based on usage of hardware resources.).
However, Diamant does not explicitly teach:
the subset is selected based on analyzing a processing unit memory state of each of the plurality of processing devices.
Lee teaches:
Identifying available bandwidth of memory through a resource management unit included in a processor ((Lee) Paragraph [0054], “In operation 203, according to various embodiments, the electronic device 101 (e.g., the processor 120 of FIG. 1) may identify an available bandwidth of the memory 130 through a resource management unit included in the processor 120.” Analyzing a processing unit memory state of a processing device is interpreted as equivalent to identifying available bandwidth of memory in a processor.), and that failing to identify available bandwidth may result in not accurately calculating the bandwidth required to process a neural network. ((Lee) Paragraph [0007], “If failing to identify an available bandwidth of a memory in real time, a processor of an electronic device may not accurately calculate the memory bandwidth required to process the neural network model”)
Diamant and Lee are analogous art, as both are inventions dealing with resource management systems involving neural networks.
Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have improved the selection based on hardware resource usage, taught by Diamant, with the identification of available memory bandwidth for each processor, taught by Lee, in order to ensure that all processors used are able to process a neural network. This would have resulted in the predictable outcome of the subset being selected based on analyzing a processing unit memory state of each of a plurality of processing devices, as claimed in the instant application.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Patents and/or related publications are cited in the Notice of References Cited (Form PTO-892) attached to this action to further show the state of the art with respect to allocation of resources, proxy systems, neural networks, image processing, and multiprocessing architectures.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DYLAN H LAI whose telephone number is (571) 272-8628. The examiner can normally be reached Monday - Friday 7:30am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle, can be reached at (571) 252-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
DYLAN H. LAI
Examiner
Art Unit 2144
/TAMARA T KYLE/Supervisory Patent Examiner, Art Unit 2144