Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for priority to U.S. Provisional Patent Application No. 63/343,014, filed on May 17, 2022.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
The following title is suggested: PROXY SYSTEMS AND METHODS FOR LOAD BALANCING NEURAL NETWORK INFERENCE REQUESTS ACROSS MULTIPROCESSING ARCHITECTURES.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1-8, 10-18, and 20 are rejected under 35 U.S.C. 102(a)(1) and (a)(2) as being anticipated by Diamant et al. (US 10,846,201 B1), “Performance Debug for Networks,” hereafter Diamant.
Regarding claim 1, Diamant teaches:
A method comprising: receiving a neural network model from a client computing system; (Column 7, lines 52-56, “Memory 212 may be configured to store executable instructions, input data (e.g., pixel data of images), and weights (e.g., the filter parameters) or other parameters of the trained neural network received from, for example, a host device.” A client computing system includes a host device. Receiving the instructions, input data, weights, and other parameters of a neural network is equivalent to receiving the neural network.)
assessing system resource availability on a plurality of processing devices; (Column 10, lines 9-11, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers.”; column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Neural network processors are processing devices. Maintaining a list of available hardware resources includes assessing system resource availability.)
selecting a subset of available processing devices based on the system resource availability; (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Selecting a subset of available processing devices based on system resource availability includes assigning operations of a neural network based on usage of hardware resources.)
loading the neural network model into each processing device in the subset; (Column 8, lines 14-20, “Host interface 214 may enable communications between the host device and neural network processor 202. For example, host interface 214 may be configured to transmit the memory descriptors including the memory addresses of the stored data (e.g., input data, weights, results of computations, etc.) between the host device and neural network processor 202.”; column 10, lines 9-23, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers. For example, a neural network processor 202 may first receive input data and instructions for implementing a first set of convolution, activation, and/or pooling layers. The input data may include the network parameters for the first set of network layers, such as the number of nodes, the weights, or the parameters of the filters, etc. The input data may also include the external input data to be processed by the neural network or intermediate output data from previous layers of the neural network. The instructions may include instructions for computing engine 224, activation engine 228a, and/or pooling engine 228b.” The input data and instructions received includes parameters of the neural network so is considered to include the neural network. Thus, transmitting input data and having it be received by a neural processor is loading the neural network model into a processing device.)
receiving an inference request from the client computing system; (Column 8, lines 14-20, “Host interface 214 may enable communications between the host device and neural network processor 202. For example, host interface 214 may be configured to transmit the memory descriptors including the memory addresses of the stored data (e.g., input data, weights, results of computations, etc.) between the host device and neural network processor 202.”; column 10, lines 9-23, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers. For example, a neural network processor 202 may first receive input data and instructions for implementing a first set of convolution, activation, and/or pooling layers. The input data may include the network parameters for the first set of network layers, such as the number of nodes, the weights, or the parameters of the filters, etc. The input data may also include the external input data to be processed by the neural network or intermediate output data from previous layers of the neural network. The instructions may include instructions for computing engine 224, activation engine 228a, and/or pooling engine 228b.” The input data and instructions received include data to be processed by a neural network which is interpreted as receiving an inference request.)
accessing a load state of each processing device in the subset; (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Accessing a load state is interpreted to mean recording or viewing system resource usage or available hardware resources. Thus, maintaining a list of available hardware resources includes accessing a load state of each processing device.)
selecting a target processing device from the subset based on the load states; and (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Selecting a target processing device based on the load states includes assigning operations of a neural network based on usage of hardware resources.)
transmitting the inference request to the target processing device. (Column 8, lines 14-20; column 10, lines 9-23, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers. For example, a neural network processor 202 may first receive input data and instructions for implementing a first set of convolution, activation, and/or pooling layers. The input data may include the network parameters for the first set of network layers, such as the number of nodes, the weights, or the parameters of the filters, etc. The input data may also include the external input data to be processed by the neural network or intermediate output data from previous layers of the neural network. The instructions may include instructions for computing engine 224, activation engine 228a, and/or pooling engine 228b.” Input data and instructions, which include the data to be processed, are transmitted between a host device and a neural network processor through memory. The data to be processed is considered an inference request. Thus, the inference request is transmitted to a processing device.)
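For examination context only, the sequence of steps mapped above (assess availability, select a subset, load the model, access load states, route the request to the least-loaded target) can be sketched in Python. All names (select_subset, route_request, the device fields, the least-loaded selection policy) are hypothetical illustrations and do not appear in Diamant or in the application as filed; this is a sketch of the claimed flow under those assumptions, not an implementation from either document.

```python
# Hypothetical sketch of the claim 1 method steps; all identifiers and the
# least-loaded routing policy are illustrative assumptions only.

def select_subset(devices, min_free):
    """Assess resource availability and select a subset of devices."""
    return [d for d in devices if d["free_mem"] >= min_free]

def load_model(subset, model):
    """Load the neural network model into each device in the subset."""
    for d in subset:
        d["model"] = model

def route_request(subset, request):
    """Access each device's load state, pick the least-loaded target,
    and transmit the inference request to it."""
    target = min(subset, key=lambda d: d["load"])
    target["queue"].append(request)
    target["load"] += 1
    return target

devices = [
    {"id": 0, "free_mem": 512, "load": 2, "queue": []},
    {"id": 1, "free_mem": 1024, "load": 0, "queue": []},
    {"id": 2, "free_mem": 128, "load": 1, "queue": []},
]
subset = select_subset(devices, min_free=256)   # devices 0 and 1 qualify
load_model(subset, model={"weights": "..."})
target = route_request(subset, request={"tensor": [1, 2, 3]})
print(target["id"])  # prints 1, the least-loaded device in the subset
```

The sketch routes each request to the device reporting the lowest load; any other policy informed by the load states (round-robin over the subset, weighted by free memory) would fit the same claimed flow.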
Regarding claim 2, Diamant teaches all of the material disclosed in claim 1 and additionally teaches:
receiving an inference result generated by the target processing device after executing the inference request based on the neural network model; and (Column 8, lines 9-14, “Neural network processor 202 may also store the results of computations (e.g., one or more image recognition decisions or intermediary data) at memory 212, and provide the memory addresses for the stored results to the host device.” An inference result includes results of computations. The results being stored on memory is considered the memory receiving the results; column 7, lines 31-34, “...the CNN may go through the forward propagation step and output a probability for each class using the trained weights and parameters, which may be referred to as an inference...” A CNN going through a forward propagation step is executing an inference request based on a neural network model. Generating an inference result includes outputting a probability; column 10, lines 9-11, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers.” The deep neural network, which includes a CNN, is run by a processing device.)
transmitting the inference result to the client computing system. (Column 8, lines 9-14, “Neural network processor 202 may also store the results of computations (e.g., one or more image recognition decisions or intermediary data) at memory 212, and provide the memory addresses for the stored results to the host device.” Providing memory addresses for stored results to a host device is transmitting the results to a client computing system.)
Regarding claim 3, Diamant teaches all of the material disclosed by claim 2, and additionally teaches:
the inference result is an output tensor. (Column 5, lines 6-17, “The convolution operations in a CNN may be used to extract features from the input image. The convolution operations may preserve the spatial relationship between pixels by extracting image features using small regions of the input image. In a convolution, a matrix (referred to as a filter, a kernel, or a feature detector) may slide over the input image (or a feature map) at a certain step size (referred to as the stride). For every position (or step), element-wise multiplications between the filter matrix and the overlapped matrix in the input image may be calculated and summed to get a final value that represents a single element of an output matrix” An inference result is an output of the CNN. A tensor includes a matrix, so an output tensor includes an output matrix.)
Regarding claim 4, Diamant teaches all of the material disclosed in claim 1, and additionally teaches:
the neural network is a convolutional neural network or a neural network comprised of one or more linear algebra operators. (Column 3, lines 12-17, “Techniques disclosed herein may be used to debug any neural network or any other computing system that may include multiple processing engines or may perform a large number of calculations before yielding a final result, such as a convolutional neural network (also referred to as ConvNets or CNNs).”)
Regarding claim 5, Diamant teaches all of the material disclosed in claim 1, and additionally teaches:
automatically determining and negotiating a type of processing device interface associated with a processing device. (Fig. 2; column 8, lines 21-24, “Host interface 214 may include, for example, a peripheral component interconnect express (PCIe) interface or any suitable interface for communicating with the host device.” Fig. 2 shows the interconnect 218 connecting to both a host interface and a neural network processor. Thus, a PCIe interface or any other interface that is determined to be suitable for communicating with the host device is also an interface for the neural network processor; column 10, lines 45-55, “The neural network model may be compiled by a compiler to generate executable instructions. The compiler may convert a neural network model into machine-executable instructions, such as binary instructions, that may be executed by various functional blocks (e.g., processing engines) of the neural network. The compiler may manage the allocation of different operations of the neural network to various hardware resources (e.g., processing engines), the allocation of memory for storing neural network parameters and intermediate data, and the timing and synchronization conditions between the various hardware resources.” The compiler synchronizes conditions between different hardware resources, which could include the interface and processing device.)
Regarding claim 6, Diamant teaches all of the material disclosed in claim 5, and additionally teaches:
the processing device interface is any of a PCIe bus interface, a USB interface, or an IPC interface. (Column 8, lines 21-24, “Host interface 214 may include, for example, a peripheral component interconnect express (PCIe) interface or any suitable interface for communicating with the host device.” A PCIe interface is specifically mentioned as an interface.)
Regarding claim 7, Diamant teaches all of the material disclosed in claim 1, and additionally teaches:
the inference request includes an input tensor. (Column 3, lines 54-61, “An object 110 to be classified, such as an input image, may be represented by a matrix of pixel values. The input image may include multiple channels, each channel representing a certain component of the image. For example, an image from a digital camera may have a red channel, a green channel, and a blue channel. Each channel may be represented by a 2-D matrix of pixels having pixel values in the range of, for example, 0 to 255 (i.e., 8-bit).” A tensor includes a matrix, so an input tensor includes a matrix of pixel values. An object to be classified is an inference request.)
Regarding claim 8, Diamant teaches all of the material disclosed in claim 7, and additionally teaches:
the input tensor is an image generated by an image sensor. (Column 3, lines 54-61, “An object 110 to be classified, such as an input image, may be represented by a matrix of pixel values. The input image may include multiple channels, each channel representing a certain component of the image. For example, an image from a digital camera may have a red channel, a green channel, and a blue channel. Each channel may be represented by a 2-D matrix of pixels having pixel values in the range of, for example, 0 to 255 (i.e., 8-bit).” A digital camera is an image sensor.)
Regarding claim 10, Diamant teaches all of the material disclosed in claim 1, and additionally teaches:
assigning a model ID to the neural network model. (Column 20, lines 55-61, “In some embodiments, the notification packet may also include an identification of the instruction, and an identification of the processing engine that executes the instruction.” A notification packet including an identification of an instruction and an identification of the processing engine that executes it is considered an assignment of a model ID.)
Regarding claim 11, Diamant teaches:
An apparatus comprising: a proxy computing system; (Fig. 2; column 7, lines 45-47, “Apparatus 200 may include a neural network processor 202 coupled to memory 212, a direct memory access (DMA) controller 216, and a host interface 214 via an interconnect 218.”; column 10, lines 45-55; column 17, lines 25-30, “According to certain embodiments, a notification 812 may be generated when state machine 800 leaves idle state 810, for example, when a new instruction is read from the instruction buffer by an instruction decoder. The notification may be generated by a debugging circuit in a control unit, such as control unit 305.” A combination of memory, a direct memory access controller, a compiler, a control unit, and a host interface is considered a proxy computing system.)
a client computing system communicatively coupled to the proxy computing system; and (Fig. 2; column 7, lines 45-47; column 8, lines 14-20, “Host interface 214 may enable communications between the host device and neural network processor 202. For example, host interface 214 may be configured to transmit the memory descriptors including the memory addresses of the stored data (e.g., input data, weights, results of computations, etc.) between the host device and neural network processor 202.” A host device is considered a client computing system. The host device communicates with a neural network processor through a host interface coupled to the proxy computing system, so it is also coupled to the proxy computing system.)
a plurality of processing devices communicatively coupled to the proxy computing system, wherein: (Column 7, lines 45-47; column 10, lines 9-11, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers.” Neural network processors are processing devices.)
the proxy computing system receives a neural network model from the client computing system; (Column 7, lines 52-56, “Memory 212 may be configured to store executable instructions, input data (e.g., pixel data of images), and weights (e.g., the filter parameters) or other parameters of the trained neural network received from, for example, a host device.” Receiving the instructions, input data, weights, and other parameters of a neural network is equivalent to receiving the neural network.)
the proxy computing system assesses system resource availability on the processing devices; (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Maintaining a list of available hardware resources is assessing system resource availability.)
the proxy computing system selects a subset of available processing devices based on the system resource availability; (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Assigning operations based on functions and usage of the hardware resources is selecting a subset of available processing devices based on the system resource availability.)
the proxy computing system loads the neural network model into each processing device in the subset; (Column 8, lines 14-20, “Host interface 214 may enable communications between the host device and neural network processor 202. For example, host interface 214 may be configured to transmit the memory descriptors including the memory addresses of the stored data (e.g., input data, weights, results of computations, etc.) between the host device and neural network processor 202.”; column 10, lines 9-23, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers. For example, a neural network processor 202 may first receive input data and instructions for implementing a first set of convolution, activation, and/or pooling layers. The input data may include the network parameters for the first set of network layers, such as the number of nodes, the weights, or the parameters of the filters, etc. The input data may also include the external input data to be processed by the neural network or intermediate output data from previous layers of the neural network. The instructions may include instructions for computing engine 224, activation engine 228a, and/or pooling engine 228b.” The input data and instructions received includes parameters of the neural network so is considered to include the neural network. Thus, transmitting input data and having it be received by each neural processor is loading the neural network model into each processing device.)
the proxy computing system receives an inference request from the client computing system; (Column 8, lines 14-20, “Host interface 214 may enable communications between the host device and neural network processor 202. For example, host interface 214 may be configured to transmit the memory descriptors including the memory addresses of the stored data (e.g., input data, weights, results of computations, etc.) between the host device and neural network processor 202.”; column 10, lines 9-23, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers. For example, a neural network processor 202 may first receive input data and instructions for implementing a first set of convolution, activation, and/or pooling layers. The input data may include the network parameters for the first set of network layers, such as the number of nodes, the weights, or the parameters of the filters, etc. The input data may also include the external input data to be processed by the neural network or intermediate output data from previous layers of the neural network. The instructions may include instructions for computing engine 224, activation engine 228a, and/or pooling engine 228b.” The input data and instructions received also include data to be processed by a neural network, which is interpreted as an inference request. The input data and instructions are transmitted through memory, which is part of the proxy computing system. Thus, the proxy computing system receives the inference request.)
the proxy computing system accesses a load state of each processing device in the subset; (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Accessing a load state is interpreted to mean recording or viewing system resource usage or available hardware resources. Thus, maintaining a list of available hardware resources includes accessing a load state of each processing device.)
the proxy computing system selects a target processing device from the subset based on the load states; (Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Selecting a target processing device based on the load states includes assigning operations of a neural network based on usage of hardware resources.)
the proxy computing system transmits the inference request to the target processing device; and (Column 8, lines 14-20, “Host interface 214 may enable communications between the host device and neural network processor 202. For example, host interface 214 may be configured to transmit the memory descriptors including the memory addresses of the stored data (e.g., input data, weights, results of computations, etc.) between the host device and neural network processor 202.”; column 10, lines 9-23, “One or more neural network processors 202 may be used to implement a deep neural network that may include multiple sets of convolution, activation, and pooling layers. For example, a neural network processor 202 may first receive input data and instructions for implementing a first set of convolution, activation, and/or pooling layers. The input data may include the network parameters for the first set of network layers, such as the number of nodes, the weights, or the parameters of the filters, etc. The input data may also include the external input data to be processed by the neural network or intermediate output data from previous layers of the neural network. The instructions may include instructions for computing engine 224, activation engine 228a, and/or pooling engine 228b.” Input data and instructions received by the processing device, including the data to be processed, are transmitted between a host device and a neural network processor through memory. Memory is considered part of the proxy computing system. The data and instructions to be processed are considered an inference request. Thus, the inference request is transmitted to the processing device through the proxy computing system.)
the target processing device executes the inference request based on the neural network model. (Column 7, lines 31-34, “...the CNN may go through the forward propagation step and output a probability for each class using the trained weights and parameters, which may be referred to as an inference...” A CNN going through a forward propagation step is executing an inference request based on a neural network model.)
Regarding claim 12, Diamant teaches all of the material disclosed in claim 11, and additionally teaches:
the target processing device generates an inference result based on the execution; (Column 7, lines 31-34, “...the CNN may go through the forward propagation step and output a probability for each class using the trained weights and parameters, which may be referred to as an inference...” Generating an inference result includes outputting a probability.)
the target processing device transmits the inference result to the proxy computing system; and (Column 8, lines 9-14, “Neural network processor 202 may also store the results of computations (e.g., one or more image recognition decisions or intermediary data) at memory 212, and provide the memory addresses for the stored results to the host device.” A neural network processor storing the results of computations at memory is a processing device transmitting the inference result to a proxy system.)
the proxy computing system transmits the inference result to the client computing system. (Column 8, lines 9-14, “Neural network processor 202 may also store the results of computations (e.g., one or more image recognition decisions or intermediary data) at memory 212, and provide the memory addresses for the stored results to the host device.” Providing memory addresses for stored results to a host device is transmitting the results to the client computing system.)
Regarding claim 13, Diamant teaches all of the material disclosed in claim 12, and additionally teaches:
the inference result is an output tensor. (Column 5, lines 6-17, “The convolution operations in a CNN may be used to extract features from the input image. The convolution operations may preserve the spatial relationship between pixels by extracting image features using small regions of the input image. In a convolution, a matrix (referred to as a filter, a kernel, or a feature detector) may slide over the input image (or a feature map) at a certain step size (referred to as the stride). For every position (or step), element-wise multiplications between the filter matrix and the overlapped matrix in the input image may be calculated and summed to get a final value that represents a single element of an output matrix” An inference result is an output of the CNN. A tensor includes a matrix, so an output tensor includes an output matrix.)
Regarding claim 14, Diamant teaches all of the material disclosed in claim 11, and additionally teaches:
the neural network is a convolutional neural network or a neural network comprised of one or more linear algebra operators. (Column 3, lines 12-17, “Techniques disclosed herein may be used to debug any neural network or any other computing system that may include multiple processing engines or may perform a large number of calculations before yielding a final result, such as a convolutional neural network (also referred to as ConvNets or CNNs).”)
Regarding claim 15, Diamant teaches all of the material disclosed in claim 11, and additionally teaches:
a processing device in the plurality of processing devices is communicatively coupled to the proxy computing system via a processing device interface, and wherein the proxy computing system automatically determines and negotiates the type of the processing device interface. (Fig. 2; column 8, lines 21-24, “Host interface 214 may include, for example, a peripheral component interconnect express (PCIe) interface or any suitable interface for communicating with the host device.” Fig. 2 shows the interconnect 218 connecting to both a host interface and a neural network processor. Thus, a PCIe interface or any other interface that is determined to be suitable for communicating with the host device is also an interface for the neural network processor. In addition, the interconnect is communicatively coupled to the memory, which is a part of the proxy computing system; column 10, lines 45-55, the compiler synchronizes conditions between different hardware resources, which could include the interface and processing device.)
Regarding claim 16, Diamant teaches all of the material disclosed in claim 15, and additionally teaches:
the processing device interface is any of a PCIe bus interface, a USB interface, or an IPC interface. (Column 8, lines 21-24, a PCIe interface is specifically mentioned as an interface.)
Regarding claim 17, Diamant teaches all of the material disclosed in claim 11, and additionally teaches:
the inference request includes an input tensor. (Column 3, lines 54-61, “An object 110 to be classified, such as an input image, may be represented by a matrix of pixel values. The input image may include multiple channels, each channel representing a certain component of the image. For example, an image from a digital camera may have a red channel, a green channel, and a blue channel. Each channel may be represented by a 2-D matrix of pixels having pixel values in the range of, for example, 0 to 255 (i.e., 8-bit).” A tensor includes a matrix, so an input tensor includes a matrix of pixel values. An object to be classified is an inference request.)
Regarding claim 18, Diamant teaches all of the material disclosed in claim 17, and additionally teaches:
the input tensor is an image generated by an image sensor. (Column 3, lines 54-61, “An object 110 to be classified, such as an input image, may be represented by a matrix of pixel values. The input image may include multiple channels, each channel representing a certain component of the image. For example, an image from a digital camera may have a red channel, a green channel, and a blue channel. Each channel may be represented by a 2-D matrix of pixels having pixel values in the range of, for example, 0 to 255 (i.e., 8-bit).” A digital camera is an image sensor.)
Regarding claim 20, Diamant teaches all of the material disclosed in claim 11, and additionally teaches:
the proxy computing system assigns a model ID to the neural network model. (Column 20, lines 55-61, “In some embodiments, the notification packet may also include an identification of the instruction, and an identification of the processing engine that executes the instruction.”; column 17, lines 25-30, “According to certain embodiments, a notification 812 may be generated when state machine 800 leaves idle state 810, for example, when a new instruction is read from the instruction buffer by an instruction decoder. The notification may be generated by a debugging circuit in a control unit, such as control unit 305.” A notification packet including an identification of the instruction and of the processing engine that executes the instruction is considered an assignment of a model ID. The notification is generated by the control unit, which is a part of the proxy computing system.)
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Diamant as applied to claims 1 and 11 above, and further in view of Electronic device for processing Neural Network model and method of operating the same by Lee et al. (US 20230072337 A1), hereafter Lee.
Regarding claim 9, Diamant teaches all of the material disclosed in claim 1, and additionally teaches:
Allocation of memory for storing information concerning hardware resources ((Diamant) Column 10, lines 50-55, “The compiler may manage the allocation of different operations of the neural network to various hardware resources (e.g., processing engines), the allocation of memory for storing neural network parameters and intermediate data, and the timing and synchronization conditions between the various hardware resources.”), and selecting a subset of processing devices based on hardware resource availability. ((Diamant) Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Selecting a subset of available processing devices based on system resource availability includes assigning operations of a neural network based on usage of hardware resources.).
However, Diamant does not explicitly teach:
selecting the subset based on analyzing a processing unit memory state of each of the plurality of processing devices.
Lee teaches:
Identifying available bandwidth of memory through a resource management unit included in a processor ((Lee) Paragraph [0054], “In operation 203, according to various embodiments, the electronic device 101 (e.g., the processor 120 of FIG. 1) may identify an available bandwidth of the memory 130 through a resource management unit included in the processor 120.” Analyzing a processing unit memory state of a processing device is interpreted as equivalent to identifying available bandwidth of memory in a processor.), and that failing to identify available bandwidth may result in not accurately calculating the bandwidth required to process a neural network. ((Lee) Paragraph [0007], “If failing to identify an available bandwidth of a memory in real time, a processor of an electronic device may not accurately calculate the memory bandwidth required to process the neural network model”)
Diamant and Lee are analogous art, as both are inventions dealing with resource management systems involving neural networks.
Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have improved the selection based on hardware resource usage, taught by Diamant, with the identification of available memory bandwidth for each processor, taught by Lee, in order to ensure that all processors used are able to process a neural network. This would have resulted in the predictable outcome of selecting a subset of processing devices based on analyzing a processing unit memory state of each of a plurality of processing devices, as claimed in the instant application.
Regarding claim 19, Diamant teaches all of the material disclosed in claim 11, and additionally teaches:
Allocation of memory for storing information concerning hardware resources ((Diamant) Column 10, lines 50-55, “The compiler may manage the allocation of different operations of the neural network to various hardware resources (e.g., processing engines), the allocation of memory for storing neural network parameters and intermediate data, and the timing and synchronization conditions between the various hardware resources.”), and selecting a subset of processing devices based on hardware resource availability. ((Diamant) Column 10, lines 58-63, “…the compiler may maintain a list of available hardware resources and the functions and usage of the hardware resources of the neural network, and assign operations of the neural network to appropriate hardware resources based on the functions and usage of the hardware resources.” Selecting a subset of available processing devices based on system resource availability includes assigning operations of a neural network based on usage of hardware resources.).
However, Diamant does not explicitly teach:
the subset is selected based on analyzing a processing unit memory state of each of the plurality of processing devices.
Lee teaches:
Identifying available bandwidth of memory through a resource management unit included in a processor ((Lee) Paragraph [0054], “In operation 203, according to various embodiments, the electronic device 101 (e.g., the processor 120 of FIG. 1) may identify an available bandwidth of the memory 130 through a resource management unit included in the processor 120.” Analyzing a processing unit memory state of a processing device is interpreted as equivalent to identifying available bandwidth of memory in a processor.), and that failing to identify available bandwidth may result in not accurately calculating the bandwidth required to process a neural network. ((Lee) Paragraph [0007], “If failing to identify an available bandwidth of a memory in real time, a processor of an electronic device may not accurately calculate the memory bandwidth required to process the neural network model”)
Diamant and Lee are analogous art, as both are inventions dealing with resource management systems involving neural networks.
Thus, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have improved the selection based on hardware resource usage, taught by Diamant, with the identification of available memory bandwidth for each processor, taught by Lee, in order to ensure that all processors used are able to process a neural network. This would have resulted in the predictable outcome of the subset being selected based on analyzing a processing unit memory state of each of a plurality of processing devices, as claimed in the instant application.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Patents and/or related publications are cited in the Notice of References Cited (Form PTO-892) attached to this action to further show the state of the art with respect to allocation of resources, proxy systems, neural networks, image processing, and multiprocessing architectures.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DYLAN H LAI whose telephone number is (571) 272-8628. The examiner can normally be reached Monday - Friday 7:30am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle, can be reached at (571) 252-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
DYLAN H. LAI
Examiner
Art Unit 2144
/TAMARA T KYLE/Supervisory Patent Examiner, Art Unit 2144