Prosecution Insights
Last updated: April 19, 2026
Application No. 18/702,517

DEPLOYMENT OF MACHINE LEARNED MODELS TO PLURALITY OF DEVICES

Non-Final OA: §101, §103
Filed: Apr 18, 2024
Examiner: DARWISH, AMIR ELSAYED
Art Unit: 2187
Tech Center: 2100 — Computer Architecture & Software
Assignee: Scailable B.V.
OA Round: 1 (Non-Final)

Grant Probability: 60% (Moderate)
OA Rounds: 1-2
To Grant: 4y 0m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 60% (3 granted / 5 resolved; +5.0% vs TC avg)
Interview Lift: +66.7% in resolved cases with interview (strong)
Avg Prosecution: 4y 0m (typical timeline)
Total Applications: 42 across all art units (37 currently pending)
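The +66.7% interview lift shown above is an absolute percentage-point difference between allowance rates with and without an examiner interview. The report does not give the underlying per-case split; one split consistent with its 3 granted / 5 resolved figure and the displayed lift is 2/2 granted with an interview versus 1/3 granted without. A quick sketch of that arithmetic (the split itself is an assumption, not data from the report):

```python
# Reproducing the dashboard's figures. The per-case split below is an
# assumption: the report only states 3 granted / 5 resolved and a +66.7%
# lift, which is consistent with 2/2 with interview vs 1/3 without.

def allow_rate(granted, resolved):
    """Allowance rate as a percentage."""
    return 100.0 * granted / resolved

with_interview = allow_rate(2, 2)      # 100.0
without_interview = allow_rate(1, 3)   # 33.3...
career = allow_rate(3, 5)              # 60.0

# Absolute percentage-point lift, as displayed on the dashboard.
lift = with_interview - without_interview

print(f"Career allow rate: {career:.0f}%")
print(f"Interview lift: +{lift:.1f}%")
```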

Statute-Specific Performance

§101: 34.9% (-5.1% vs TC avg)
§103: 44.0% (+4.0% vs TC avg)
§102: 7.3% (-32.7% vs TC avg)
§112: 6.2% (-33.8% vs TC avg)
Tech Center averages are estimates • Based on career data from 5 resolved cases

Office Action

§101 §103

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed: Application No. EP21204421.8, filed on 10/25/2021.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 04/18/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Examiner’s Note (EN)

The prior art rejections below cite particular paragraphs, columns, and/or line numbers in the references for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 14 and 15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims do not fall within at least one of the four categories of patent eligible subject matter because they are directed to a signal per se, in that each covers both transitory and non-transitory computer-readable media.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6, 10-11, and 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over SUI et al. (CN-110766147-A) in view of Kelur et al. (US20230083345A1).

Regarding Claim 1, Sui teaches a computer-implemented method for enabling deployment of a machine learned model to a plurality of different types of devices, comprising: providing a representation of the machine learned model in form of a computational graph, wherein nodes of the computational graph define operations of the machine learned model and edges of the computational graph define data relations between the operations ([0005] "The neural network algorithms from different deep learning frameworks at the front end are transformed into a general computation graph. The computation graph is then optimized and reconstructed, and finally mapped into instructions and machine code that can be executed by the back-end hardware platform. This completes the compilation of the algorithm for the hardware platform." [0012] "The first intermediate representation can be generated as a node representation feature graph, and the edge representation is a computation graph of computational operations.
The attributes of the node may include at least one of the following: the dimension information and width and height channel information of the feature map; the computational operation represented by the edge may include at least one of the following: convolution, pooling, dimension transformation, eltwise addition, deconvolution, rearrangement, nonlinearity, batch normalization, scaling; the attributes of the edge may include the parameters of the computational operation, and may include at least one of the following: convolution kernel size, padding, stride, grouping, dilation. The first IR constructed using the special structure of this invention is beneficial for subsequent memory optimization, and is especially suitable for neural network parallel computing platforms where the clock cycle required for data access is much longer than that required for data execution." [0011] and [0089-0090] "Similar to Figure 4, the compiler architecture 500 may also include a computation graph construction module 510, a computation graph optimization module 520, and an instruction generation module 530. Furthermore, the computation graph construction module 510 may include a model file parsing module 511 and a computation graph generation module 512. The model file parsing module 511 may include parsing sub-modules, each corresponding to a type of model file, and each parsing sub-module is used to parse the model file of the corresponding type. As shown in the figure, the model file parsing module 511 may include a Caffe parser and a TensorFlow parser, which are used to parse model files obtained via the Caffe and TensorFlow deep learning frameworks, respectively. The model file parsing module 511 may also include parsers for other deep learning frameworks (shown as "others" in the figure). In subsequent applications, if support for new deep learning frameworks is required, only a parser for that model needs to be added. 
Most of the subsequent optimization methods will be framework-independent, thereby improving the scalability and compatibility of the compiler framework of this invention. The model file parsing module 511 can parse neural network models developed on different deep learning frameworks into framework-independent IRs, namely the first IR in this invention. Thus, the decoupling of the framework and compiler optimization method is fully realized at the first intermediate representation of the compiler framework, and the computation graph forms with different granularities of various deep learning frameworks are uniformly transformed into the fixed-granularity computation graph form in this invention (i.e., the first IR). The features of the Python scripting language make it easy to perform model parsing and IR conversion. Therefore, the computation graph generation module 512 can conveniently generate the first IR based on the parsing results of the corresponding parsing submodule.").

converting the machine learned model to a binary intermediate representation which is executable ("compiling the third intermediate representation into instruction code that can be executed on the hardware platform." And [0133] “The parallel processing module 910 can be used to perform predetermined parallel computation processing on the input data and generate output data. The data storage module 920 can be used to cache the input data required by the data processing module or the intermediate data output by the data processing module. The control module 930 can control the parallel processing module and the data storage module based on the instruction code obtained according to the aforementioned compiler framework to execute the neural network calculation.
In one embodiment, the specific architecture of the computing platform applying the instruction code of the present invention can be implemented by, for example, the programmable logic module shown in FIG8, wherein the parallel processing module 810 corresponds to the complex computing core that performs CNN inference operations, the data storage module 820 corresponds to the input/output buffer, and the control module 830 corresponds to the controller in the figure. In other embodiments, the computing platform that applies the instruction code of the present invention can be a dedicated AI chip, such as a deeply customized ASIC chip.”)

providing a library of templates defining functions in a programming language which can be compiled to the binary intermediate representation, wherein each function represents an implementation of a possible operation defined by the computational graph ([0015] "Preferably, the computation graph optimization module is further configured to set a subgraph template capable of layer fusion, obtain at least one subgraph matching scheme for the computation graph of the first intermediate representation, and reconstruct the computation graph into a fused second intermediate representation based on the subgraph matching scheme. The subgraph template can be obtained based on the aforementioned hardware platform." [0009] provides a summary of how the compilation process works including the results of the optimization module described here and producing the binary. [0128] illustrates the functions representing operations as per the computational graph, which is what the optimizer takes as an input. "The scheduling optimization in step S730 may include: performing scheduling optimization based on the hardware platform to determine at least one of the following: a block execution scheme for feature maps and/or weights; and instruction dependencies between execution instructions.
The third intermediate representation can be expressed by a language that writes each computational operation as a multi-loop representation. Accordingly, step S740 may include: mapping the block execution scheme and/or the instruction dependencies to the instruction code via the third intermediate representation").

parsing the computational graph to identify the operations of the machine learned model and the data relations between the operations ([0109-0111] "Therefore, the optimization of the instruction generation module 530 for the third IR is a scheduling optimization based on the hardware platform, in order to determine the block execution scheme for the feature map and/or weights, and/or the instruction dependencies between execution instructions. Preferably, the third IR is represented by a language that writes each computational operation as a multiple loop. By using an IR representation similar to Halide, each computational operation can be written as a multi-loop, so that during scheduling optimization, the blocks that allow the multi-loop to consume the least amount of memory and achieve the best computational efficiency can be determined. The above block division needs to fully consider the impact of on-chip computing resources, storage resources, and bandwidth, and be determined in conjunction with the parameters of the specific neural network structure. In a preferred embodiment, the third IR can be used to generate an automated scheduling strategy and achieve the same execution efficiency as handwritten instructions. Subsequently, the instruction generation module 510 maps the aforementioned block execution scheme and/or instruction dependencies to the instruction code of the hardware platform via encoding for the third IR.").

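The claim-1 limitations cited here turn on a few moving parts: a computational graph whose nodes are operations and whose edges are data relations, a library of per-operation function templates, and generated execution logic that calls those functions in dependency order. An illustrative toy sketch of that idea follows; it is not SUI's compiler or the application's implementation, and all names (TEMPLATES, execute, the op names) are hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical "library of templates": one function per possible graph operation.
TEMPLATES = {
    "add":  lambda a, b: a + b,
    "mul":  lambda a, b: a * b,
    "relu": lambda x: max(x, 0),
}

def execute(graph, inputs):
    """Walk the computational graph in topological order, calling the template
    function for each node with the outputs of its predecessor nodes.

    graph:  {node_id: (op_name, [predecessor_ids])}
    inputs: {node_id: value} for the graph's source nodes.
    """
    deps = {nid: set(preds) for nid, (_, preds) in graph.items()}
    values = dict(inputs)
    for nid in TopologicalSorter(deps).static_order():
        if nid in values:          # a graph input, already known
            continue
        op, preds = graph[nid]
        values[nid] = TEMPLATES[op](*(values[p] for p in preds))
    return values

# y = relu(x1 * x2 + x3)
graph = {
    "t1": ("mul", ["x1", "x2"]),
    "t2": ("add", ["t1", "x3"]),
    "y":  ("relu", ["t2"]),
}
result = execute(graph, {"x1": 2, "x2": 3, "x3": -10})
print(result["y"])  # 0, since 2*3 + (-10) = -4 and relu(-4) = 0
```

The edges (predecessor lists) encode the claimed "data relations"; the topological walk is the generated "execution logic" that calls functions in accordance with them.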
dynamically generating code representing the machine learned model by including functions from the library which represent the operations of the computational graph and by generating execution logic which calls the functions in accordance with the data relations between the operations in the computational graph ([0109-0112] "Therefore, the optimization of the instruction generation module 530 for the third IR is a scheduling optimization based on the hardware platform, in order to determine the block execution scheme for the feature map and/or weights, and/or the instruction dependencies between execution instructions. Preferably, the third IR is represented by a language that writes each computational operation as a multiple loop. By using an IR representation similar to Halide, each computational operation can be written as a multi-loop, so that during scheduling optimization, the blocks that allow the multi-loop to consume the least amount of memory and achieve the best computational efficiency can be determined. The above block division needs to fully consider the impact of on-chip computing resources, storage resources, and bandwidth, and be determined in conjunction with the parameters of the specific neural network structure. In a preferred embodiment, the third IR can be used to generate an automated scheduling strategy and achieve the same execution efficiency as handwritten instructions. Subsequently, the instruction generation module 510 maps the aforementioned block execution scheme and/or instruction dependencies to the instruction code of the hardware platform via encoding for the third IR….The instruction code (also referred to as the fourth IR in this invention) is a specific instruction for the hardware platform and includes the representation method of mapping the IR of the optimal block method found during scheduling optimization to the specific instruction. 
The backend of this invention is mainly for FPGA and ASIC design of DPU, as well as embedded CPUs such as ARM." [0116] “For each deep learning framework, the results of floating-point operations will vary due to factors such as the order of operations and different truncation methods, because different underlying floating-point arithmetic libraries are used. Therefore, even with the same network structure, floating-point results vary greatly between different deep learning frameworks; for fixed-point operations, the above differences are smaller, but still exist. In addition, different deep learning frameworks have different computational parameters, such as the handling of pads, the order of data arrangement, the calculation method of mean pooling, and the transformation methods such as Reorg. As a module that provides standard answers, the prequel module needs to eliminate the differences between these frameworks. Alternatively, a more conservative approach could be to include the operator implementations of mainstream deep learning frameworks in the preamble module, with some modifications, to ensure that the computational operations are completely consistent with the behavior of the backend hardware, including shift operations and boundary expansion rules”).

compiling the code to obtain the binary intermediate representation of the machine learned model for execution ("compiling" refers to the process of using a compiler to produce low-level target code for execution on a computing platform from a representation described by a high-level formal method. Since hardware computing platforms only process binary instruction codes, compilers are needed to convert the high-level language descriptions that people are familiar with into low-level binary code that computers can read.
[0027] "According to another aspect of the present invention, a computing platform for a neural network is proposed, comprising: a parallel processing module for performing predetermined parallel computing processing on input data and generating output data; a data storage module for caching input data required by the data processing module or intermediate data output by the data processing module; and a control module for controlling the data processing module and the data storage module to execute the neural network computation based on instruction codes obtained according to any one of the above.").

However, SUI is not relied on for: by a runtime interpreter. Kelur teaches by a runtime interpreter ([0063] “In at least one embodiment, a DLA interpreter and compiler 206 comprises a model parser 208. In at least one embodiment, a model parser 208 is software instructions that, when executed, parse a model 204 input 202 to a DLA interpreter and compiler 206. In at least one embodiment, a model parser 208 parses, or breaks up into an intermediate representation (IR) to be used as input to a compiler and optimizer 210, model 204 data. In at least one embodiment, a model parser 208 reads an input 202 model 204 and generates an IR to be used by a compiler and optimizer 210 to generate an output 212.” [0092] “In at least one embodiment, an application 604 is generated using a compiler specific to, or a compiler using libraries specific to, a parallel processing platform, such as compute uniform device architecture (CUDA) or any other parallel processing platform and/or library further described herein. In at least one embodiment, an application 604 is generated using a compiler specific to, or a compiler using libraries specific to, a processor architecture, such as a specific or general GPU architecture, a DLA architecture, or any other processor architecture further described herein. In at least one embodiment, an application 604 comprises executable code.
In at least one embodiment, an application 604 comprises object code. In at least one embodiment, an application 604 comprises any other source code to be interpreted for execution using one or more processor 634, 636, 638, 640, 642 cores.” [0209] “In at least one embodiment, graphics processor 2100 receives batches of commands via ring interconnect 2102. In at least one embodiment, incoming commands are interpreted by a command streamer 2103 in pipeline front-end 2104. In at least one embodiment, graphics processor 2100 includes scalable execution logic to perform 3D geometry processing and media processing via graphics core(s) 2180A-2180N. In at least one embodiment, for 3D geometry processing commands, command streamer 2103 supplies commands to geometry pipeline 2136. In at least one embodiment, for at least some media processing commands, command streamer 2103 supplies commands to a video front end 2134, which couples with a media engine 2137” [0212] “In at least one embodiment, an instruction prefetcher 2226 fetches instructions from memory and feeds instructions to an instruction decoder 2228 which in turn decodes or interprets instructions. For example, in at least one embodiment, instruction decoder 2228 decodes a received instruction into one or more operations called “micro-instructions” or “micro-operations” (also called “micro ops” or “uops”) for execution.”)

SUI and Kelur are analogous art because they are from the same field of endeavor in ML IR compilation and optimization for different hardware profiles. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine SUI and Kelur to incorporate Kelur’s explicit interpreter and peripheral management with expected results.
“At least one embodiment pertains to processing resources used to execute software instructions for a plurality of processor architectures using compute uniform device architecture (CUDA).” (Kelur, [0001])

Regarding Claim 2, Sui in view of Kelur teaches the method according to claim 1. Kelur further teaches wherein the library of templates comprises function definitions for the functions which are executable by a general-purpose central processing unit, and wherein method further comprises: for a subset of the functions for which hardware acceleration is available on one or more of the different types of devices, including function declarations in the code for hardware accelerated versions of the functions from the subset ([0294] "FIG. 34 is a more detailed illustration of compiling code to execute on one of programming platforms of FIGS. 28-31, in accordance with at least one embodiment. In at least one embodiment, a compiler 3401 is configured to receive source code 3400, compile source code 3400, and output an executable file 3410. In at least one embodiment, source code 3400 is a single-source file, such as a .cu file, a .hip.cpp file, or a file in another format, that includes both host and device code. In at least one embodiment, compiler 3401 may be, but is not limited to, an NVIDIA CUDA compiler (“NVCC”) for compiling CUDA code in .cu files, or a HCC compiler for compiling HIP code in .hip.cpp files." And [0073] “In at least one embodiment, a unified programming model uses a single package of software libraries to perform operations using different processors 426, 428, 430, such as a deep learning accelerator (DLA) software stack as described above in conjunction with FIG.
1 and a parallel computing library, both provided by a single package of libraries such as compute uniform device architecture (CUDA), to perform computing operations using a parallel processing unit (PPU), such as a graphics processing unit (GPU) 428 and/or a DLA 430.” EN: also see [0290-0291] [0347-0350]).

generating the execution logic to be able to switch to using a hardware accelerated version of a respective function if hardware acceleration for the function is available on a device executing the binary intermediate representation (EN: [0294] and Fig. 34 shows the generic version of the code and the accelerated version being compiled in parallel and combined into a single binary. [0105-0106] and [0107] "In at least one embodiment, once one or more kernels are performed 806, 808, 810 by one or more cores having one or more architecture types 804, each of said one or more kernels optionally synchronizes data and/or other computational results between each of said one or more cores 812 using shared pointers to memory managed by a parallel processing platform, as described above in conjunction with FIGS. 5B, 6, and 7. In at least one embodiment, if no more kernels are to be performed in an execution graph, then said execution graph is finished 814 and a process 800 ends 816. In at least one embodiment, if additional kernels are to be performed in an execution graph, then a process 800 continues by determining which one or more processor cores having a specific architecture type 804 are to perform each subsequent kernel 806, 808, 810 in said execution graph." EN: Fig. 26A and [0300-0312] disclose the various permutations of functions optimized and non-optimized running on CPUs or optimized processors. [0347-0350] additionally discloses code that is capable of running on both optimized (device) and normal CPUs (host).).

Regarding Claim 3, SUI in view of Kelur teaches the method of claim 2.
Kelur further teaches wherein the method further comprises omitting, from the code, function definitions for the hardware accelerated versions of the functions from the subset ([0291-0292] " In at least one embodiment, source code 3300 may be included in a single-source file having a mixture of host code and device code, with locations of device code being indicated therein. In at least one embodiment, a single-source file may be a .cu file that includes CUDA code or a .hip.cpp file that includes HIP code. Alternatively, in at least one embodiment, source code 3300 may include multiple source code files, rather than a single-source file, into which host code and device code are separated. In at least one embodiment, compiler 3301 is configured to compile source code 3300 into host executable code 3302 for execution on a host and device executable code 3303 for execution on a device. In at least one embodiment, compiler 3301 performs operations including parsing source code 3300 into an abstract system tree (AST), performing optimizations, and generating executable code. In at least one embodiment in which source code 3300 includes a single-source file, compiler 3301 may separate device code from host code in such a single-source file, compile device code and host code into device executable code 3303 and host executable code 3302, respectively, and link device executable code 3303 and host executable code 3302 together in a single file, as discussed in greater detail below with respect to FIG. 34 ." [0269] “In at least one embodiment, hardware 2807 includes a host connected to one more devices that can be accessed to perform computational tasks via application programming interface (“API”) calls. 
A device within hardware 2807 may include, but is not limited to, a GPU, FPGA, AI engine, or other compute device (but may also include a CPU) and its memory, as opposed to a host within hardware 2807 that may include, but is not limited to, a CPU (but may also include a compute device) and its memory, in at least one embodiment.” EN: the device is the hardware accelerated version. Also see [0274] which describes compiling an executable version without the functions and providing the functions at runtime through an interpreter, and [0304]).

Regarding Claim 6, SUI in view of Kelur teaches the method of claim 1. Kelur further teaches further comprising generating an application for a specific type of a device ([0068] "In at least one embodiment, an application 316 is executable code to be executed by a DLA runtime 314 using one or more drivers 318, 320 to interact with DLA hardware 324. In at least one embodiment, an application 316 is a loadable module 310 generated by a DLA compiler and optimizer 308 during compilation 302. In at least one embodiment, an application 316 is any other executable code generated to be executed using a DLA runtime 314 and DLA hardware 324. In at least one embodiment, a DLA runtime provides an interface 322 to facilitate interaction with one or more other software libraries to perform inferencing 312, as described above in conjunction with FIG. 1" [0324] “In at least one embodiment, CUDA device executable code 3684 includes, without limitation, PTX code and is further compiled into binary code for a specific target device at runtime”).

wherein the application includes the runtime interpreter and includes, or is configured to access, the binary intermediate representation of the machine learned model ([0068] "DLA runtime is software instructions that, when executed, load an application 316 to be executed by DLA hardware 324 using one or more drivers 318, 320, as described above in conjunction with FIG. 1.
In at least one embodiment, an application 316 is executable code to be executed by a DLA runtime 314 using one or more drivers 318, 320 to interact with DLA hardware 324. In at least one embodiment, an application 316 is a loadable module 310 generated by a DLA compiler and optimizer 308 during compilation 302. In at least one embodiment, an application 316 is any other executable code generated to be executed using a DLA runtime 314 and DLA hardware 324. In at least one embodiment, a DLA runtime provides an interface 322 to facilitate interaction with one or more other software libraries to perform inferencing 312, as described above in conjunction with FIG. 1." and [0067] "In at least one embodiment, one or more tasks to be performed during inferencing 312 comprise inferencing operations. In at least one embodiment, inferencing operations are neural network operations to compute one or more results using one or more neural networks. In at least one embodiment, neural network operations include, but are not limited to, image segmentation, classification, object identification, and/or any other neural network operation further described herein." EN: The loadable module is a compiled representation of the ML model. The DLA runtime is the interpreter.).

Regarding Claim 10, SUI in view of Kelur teaches the method of claim 6. Kelur teaches wherein the application is configured to detect and to signal the binary intermediate representation if the device comprises a hardware accelerator for executing one or more of the hardware accelerated versions of the functions from the subset ([0324] "In at least one embodiment, HIP compiler driver 3640 determines that target device 3646 is CUDA-enabled and generates HIP/NVCC compilation command 3642. In at least one embodiment, HIP compiler driver 3640 then configures CUDA compiler 3650 via HIP/NVCC compilation command 3642 to compile HIP source code 3630.
In at least one embodiment, HIP compiler driver 3640 provides access to a HIP to CUDA translation header 3652 as part of configuring CUDA compiler 3650. In at least one embodiment, HIP to CUDA translation header 3652 translates any number of mechanisms (e.g., functions) specified in any number of HIP APIs to any number of mechanisms specified in any number of CUDA APIs. In at least one embodiment, CUDA compiler 3650 uses HIP to CUDA translation header 3652 in conjunction with a CUDA runtime library 3654 corresponding to CUDA runtime API 3602 to generate host executable code 3670(1) and CUDA device executable code 3684. In at least one embodiment, host executable code 3670(1) and CUDA device executable code 3684 may then be executed on, respectively, CPU 3690 and CUDA-enabled GPU 3694. In at least one embodiment, CUDA device executable code 3684 includes, without limitation, binary code. In at least one embodiment, CUDA device executable code 3684 includes, without limitation, PTX code and is further compiled into binary code for a specific target device at runtime." Also see [0299-0328] for the various permutations possible).

Regarding Claim 11, SUI in view of Kelur teaches the method of claim 6. Kelur further teaches wherein the application is configured to provide the binary intermediate representation access to function definitions for the hardware accelerated versions of the functions from the subset, for example by providing the binary intermediate representation access to a library of accelerated functions ([0324] "In at least one embodiment, HIP compiler driver 3640 determines that target device 3646 is CUDA-enabled and generates HIP/NVCC compilation command 3642. In at least one embodiment, HIP compiler driver 3640 then configures CUDA compiler 3650 via HIP/NVCC compilation command 3642 to compile HIP source code 3630.
In at least one embodiment, HIP compiler driver 3640 provides access to a HIP to CUDA translation header 3652 as part of configuring CUDA compiler 3650. In at least one embodiment, HIP to CUDA translation header 3652 translates any number of mechanisms (e.g., functions) specified in any number of HIP APIs to any number of mechanisms specified in any number of CUDA APIs. In at least one embodiment, CUDA compiler 3650 uses HIP to CUDA translation header 3652 in conjunction with a CUDA runtime library 3654 corresponding to CUDA runtime API 3602 to generate host executable code 3670(1) and CUDA device executable code 3684. In at least one embodiment, host executable code 3670(1) and CUDA device executable code 3684 may then be executed on, respectively, CPU 3690 and CUDA-enabled GPU 3694. In at least one embodiment, CUDA device executable code 3684 includes, without limitation, binary code. In at least one embodiment, CUDA device executable code 3684 includes, without limitation, PTX code and is further compiled into binary code for a specific target device at runtime.").

Regarding Claim 14, SUI in view of Kelur teaches the method of claim 1. SUI teaches a transitory or non-transitory computer-readable medium comprising data, wherein the data comprises instructions arranged to cause a processor system to perform the computer-implemented method according to claim 1 ([0029] “According to one aspect of the invention, a non-transitory machine-readable storage medium is provided, on which executable code is stored, which, when executed by a processor of an electronic device, causes the processor to perform the method described in any of the foregoing.”).

Claim 15 is a medium claim reciting limitations similar to claim 2 and is rejected under the same rationale.

Regarding Claim 16, SUI in view of Kelur teaches the method of claim 1.
SUI teaches a system for enabling deployment of a machine learned model to a plurality of different types of devices, comprising: a data storage interface for accessing a representation of the machine learned model in form of a computational graph, wherein nodes of the computational graph define operations of the machine learned model and edges of the computational graph define data relations between the operations; a processor subsystem configured to: ([0028] “According to another aspect of the present invention, a computing device is provided, comprising: a processor; and a memory having executable code stored thereon, which, when executed by the processor, causes the processor to perform the method as described in any one of the preceding claims.”) The remaining limitations are similar to claim 1 and are rejected under the same rationale. Claims 17 and 18 are system claims reciting limitations similar to claim 2 and to claims 6 and 12, respectively, and are rejected under the same rationale. Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over SUI et al. (CN-110766147-A) in view of Kelur et al. (US20230083345A1) and further in view of Github ONNX (ONNX IR Specification). Regarding Claim 4, SUI in view of Kelur teaches the method of claim 1. ONNX teaches wherein the computational graph is defined in ONNX format (Pg. 1, “ONNX is an open specification that consists of the following components: 1. A definition of an extensible computation graph model. 2. Definitions of standard data types. 3. Definitions of built-in operators. #1 and #2 together make up the ONNX Intermediate Representation, or 'IR', specification which is covered herein; the built-in operators are covered in documents listed at the end. Specifically, built-in operators are divided into a set of primitive operators and functions. A function is an operator whose semantics is formally expressed via expansion into a sub-graph (called the function body) using other operators (and functions). 
Functionality-wise, an ONNX compatible framework or runtime may inline a function body to execute it if it does not have corresponding implementation of the function.”) SUI, Kelur and Github ONNX are analogous art because they are from the same field of endeavor in ML IR (Intermediate Representation) and execution optimization. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine SUI, Kelur and ONNX to utilize ONNX’s extended computational graph capabilities to build more extensible ML computational graphs that have wide compatibility in the industry, as ONNX is a recognized industry standard and a recognized design choice. Claims 5 and 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over SUI et al. (CN-110766147-A) in view of Kelur et al. (US20230083345A1) and further in view of Tiwary (US20210124822A1). Regarding Claim 5, SUI in view of Kelur teaches the method of claim 1. Tiwary teaches wherein the binary instruction format is WebAssembly ([0003] “Web-Assembly (“WASM”) is a binary instruction format which executes from a browser and is used to guarantee execution safety with its language features. With recent developments in enabling WASM to be executed outside the browser (enabling WASM to make system calls), support for a variety of languages to be compiled in WASM may make it suitable to be executed in a serverless fashion.” [0025] “WASM is a binary format of compilation target for high level languages and a low-level bytecode for the web. It is designed as an abstraction for an underlying hardware architecture and runs in an isolated sandbox environment, providing platform independence for programmers. Most high-level languages (e.g., C, C++, and RUST) that run on a system can also be converted to WASM to offer near-native speed of execution by leveraging common hardware capabilities. 
The main problem that WASM aims to solve is that while large applications take a long time to start, a WASM solution should provide a significant increase in compilation time but execute faster.”) SUI, Kelur and Tiwary are analogous art because they are from the same field of endeavor in application code execution management and optimization. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine SUI, Kelur and Tiwary to benefit from WebAssembly’s features and advantages as disclosed by Tiwary, including sandbox execution for additional security as well as linear memory usage (“Web-Assembly (“WASM”) is a binary instruction format which executes from a browser and is used to guarantee execution safety with its language features. With recent developments in enabling WASM to be executed outside the browser (enabling WASM to make system calls), support for a variety of languages to be compiled in WASM may make it suitable to be executed in a serverless fashion.” Tiwary [0003]). Regarding Claim 7, SUI in view of Kelur teaches the method of claim 6. Tiwary teaches wherein the application is configured to establish a sandbox for execution of the binary intermediate representation of the machine learned model on the device ([0004] "the WASM runtime process may use instruction set secure enclaves to secure an access host such that, even if a root is compromised, an attacker cannot access a sandbox memory heap." [0056] "multiple tenants may operate in separate sandboxes (with access to different memories) improving the security of the system. Given the recent advancements using web-assembly to make system calls, in some embodiments a RUST based execution runtime may facilitate execution of serverless functions via web-assembly modules. 
The runtime may facilitate the placements of WASM functions based on data locality or gravity, and (at the same time) offer computing resource isolations in terms of CPU, memory and file system to help achieve multi-tenancy in terms of executing serverless functions. Further, some embodiments leverage the runtime features to reduce performance impact and may utilize streaming or machine learning based applications. Note that cloud systems may be are highly resource constrained in terms of computing, storage, and/or network, and a WASM-based framework may allow for better resource utilization." [0042] "While loading the module to memory, the dynamic loader calls the memory scaller module, which knows the maximum memory to be allocated to any given WASM module. The memory scaller creates a heap of memory of any step size (e.g., initially using calloc( )) or malloc( ) and assigns the continuous memory heap to the WASM module for execution. Anytime during execution when the WASM modules try to access memory whose offset is more than the initial memory allocated (the WASM module requires more memory than the initial step size assigned during runtime), the scaller checks if the current heap size is less than the maximum limit. If so, the scaller creates another, larger memory heap (e.g., the initial step size plus another unit size) and reassigns the WASM module to it. If, however, the memory has already reached the limit, the scaller instead returns segmentation fault."). For motivation to combine, please see claim 5. Regarding Claim 8, SUI in view of Kelur and further in view of Tiwary teaches the method of claim 7. 
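The sandboxed, grow-on-demand "memory scaller" behavior quoted from Tiwary [0042] above can be sketched in miniature. This is a conceptual toy, not Tiwary's implementation (the class and method names are illustrative): code addresses bytes only by offset into one contiguous segment, the heap grows in fixed steps on demand, and exceeding the configured cap raises a fault.

```python
class LinearMemory:
    """Toy WASM-style linear memory: offset-addressed, grown in steps
    up to a hard limit, after which access faults (cf. Tiwary [0042];
    names here are illustrative assumptions, not Tiwary's)."""

    def __init__(self, step: int = 1024, limit: int = 4096):
        self.step, self.limit = step, limit
        self.heap = bytearray(step)  # initial step-size heap

    def _ensure(self, end: int) -> None:
        # Grow the heap one step at a time until `end` fits or the cap is hit.
        while end > len(self.heap):
            if len(self.heap) >= self.limit:
                raise MemoryError("segmentation fault: heap limit reached")
            self.heap.extend(bytearray(min(self.step, self.limit - len(self.heap))))

    def store(self, offset: int, data: bytes) -> None:
        self._ensure(offset + len(data))
        self.heap[offset:offset + len(data)] = data

    def load(self, offset: int, n: int) -> bytes:
        self._ensure(offset + n)
        return bytes(self.heap[offset:offset + n])

mem = LinearMemory()
mem.store(2000, b"sensor")   # offset beyond the initial 1024 bytes: heap grows one step
```

Because opcodes see only offsets into this segment, the heap is isolated from other data structures and concurrent processes, mirroring the sandboxing rationale quoted above.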
Tiwary further teaches wherein the application is configured to provide a linear memory space which is accessible from inside and outside of the sandbox to enable the binary intermediate representation of the machine learned model to access a peripheral (Tiwary describes a "linear memory model similar to languages such as C and C++ which mostly perform dynamic memory allocations via a memory allocator such that opcodes accessing memory are not given a memory address (instead they are given an offset to the beginning of a linear segment). In this way, the linear memory is sandboxed and isolated from other data structures and concurrent processes." [0031] "This WASM module executes inside a sandbox 240 having its own memory space and isolation. However, at this point this “create_table_db.wasm” cannot execute as expected because the module is in a sandbox mode (and cannot access the system calls on the host system). Thus, it needs a glue that translates the call to trigger a call on the host system. WASI is a standardized implementation provided for implementing the necessary system calls and, in turn, exposing functions that could be consumed by a WASM module. For example, if the “create table db.wasm” uses a function that opens a file and writes content, e.g., “open_file( )” this function cannot invoke an open file instruction on the host system. Instead, the WASI module must expose/export a function called “open_file( )” which must then be consumed by the WASM module for the flow to be intact. Internally, the “open_file( )” function in the WASI layer implements the low-level system calls and acts as a uniform wrapper to be consumed by any WASM module. The runtime environment of these WASM modules is defined by the WASM runtime 230 (e.g., WASMer, Lucet, etc.)" [0043-0044] describe interprocess communication through files. "embodiments may adopt inter-process communication in Unix by using the concept of domain sockets. 
Most of the modern databases, such as PostgreSQL, allow talking to database process directly via domain sockets. With domain sockets, a common file descriptor may be exposed to both processes which then active listen for the same file descriptor."). [0051] “The programs 712, 714 may furthermore include other program elements, such as an operating system, clipboard application, a database management system, and/or device drivers used by the processor 710 to interface with peripheral devices.” EN: SUI also teaches accessing a hardware accelerator by writing directly to linear memory (DDR, on-chip RAM, registers, etc.); please see SUI [0113]. Regarding Claim 9, SUI in view of Kelur and further in view of Tiwary teaches the method of claim 8. Kelur further teaches wherein the peripheral comprises a sensor, and wherein the application is configured to: read sensor data from the peripheral; and write the sensor data to the linear memory space to be accessible by the binary intermediate representation of the machine learned model ([0124] "In at least one embodiment, platform controller hub 1030 enables peripherals to connect to memory device 1020 and processor 1002 via a high-speed I/O bus. In at least one embodiment, I/O peripherals include, but are not limited to, an audio controller 1046, a network controller 1034, a firmware interface 1028, a wireless transceiver 1026, touch sensors 1025, a data storage device 1024 (e.g., hard disk drive, flash memory, etc.). In at least one embodiment, data storage device 1024 can connect via a storage interface (e.g., SATA) or via a peripheral bus, such as PCI, or PCIe. In at least one embodiment, touch sensors 1025 can include touch screen sensors, pressure sensors, or fingerprint sensors." Sharing memory access is covered in claim 8. Kelur additionally discloses sharing memory access; please see [0263] and [0342].) Claims 12-13 are rejected under 35 U.S.C. 103 as being unpatentable over SUI et al. (CN-110766147-A) in view of Kelur et al. 
(US20230083345A1) and further in view of Barfield (US20200249936A1). Regarding Claim 12, SUI in view of Kelur teaches the method of claim 6. Barfield teaches further comprising making the application available for download via an online platform ([0018] "This invention describes a platform for hosting user supplied algorithms or models … a system to allow a user to deploy, scale and manage user supplied algorithms automatically or pseudo-automatically…. Within context of the platform for hosting at the forefront within are defined by choosing the best platform for a user supplied algorithm to be hosted with the methods described herein and showing the tradeoffs to the user of a particular platform to host the user supplied algorithm using a Device API and a Cloud API; a method deciding when to perform inference on the device and when and why the system should perform inference in the cloud; and a method to automatically create apps, websites, demonstration products, and embedded system demonstration applications from a user supplied algorithm deployed in the system.") wherein the application is configured to assign a unique device identifier to the device and register with the online platform using the unique device identifier ([0032] "Block 216 defines certificates associated with the API or a Blockchain validation approach to the API provides security for the API connections to either the device API or the cloud API. Blockchain could allow a group of users and verify identity, and then a user could generate a token for the API. Further, tokens may be set by the user or generated with a hash (e.g., sh256, MD5) to serve as a token-based login for the algorithm API. Certificates that provide encryption over a restful web service or stream (e.g., Kafka) may provide a private and public key for message encryption while transmission of data occurs between the device(s) and cloud APIs. 
Block 218 dictates the selection of security type such as Blockchain token based security validation, HTTPS with public and private keys for network security, basic authorization with user logins, OAUTH, AES, hashed token based security logins between the device and cloud API, or any combination of the securities listed together. API security is important if data is transmitted for inference across a network, such as a public internet, to create secure transmission of data between the device and the cloud APIs. Block 220 defines a step in which the user supplied algorithm, a generated by method 200 API, determined software dependencies, and an interface between the user supplied algorithm or model is created and is then packaged into a software container that defines a standalone run-time for the algorithm as a software application. This software container can be versioned, tested by passing an input API data type defined by the user or determined by the system through code analysis and stored for deployment in a variety of platforms for the user defined requirements for algorithm deployment. As a part of creating containers, multiple container types may be created suited for different hardware or software platforms like the ones described in 120, 125, and 130, or devices described in 105. Containers provide a standard way to package configurations, application code, and dependencies into a single software object.") SUI, Kelur and Barfield are analogous art because they are from the same field of endeavor in application code execution management and optimization. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art to combine SUI, Kelur and Barfield to benefit from Barfield's extensive treatment of deploying machine learning models across multiple devices with differing capabilities. 
“The embodiment specifies a system to scale manage and deploy user supplied algorithms in a cloud infrastructure or on a device. The system describes methods to manage when to perform user supplied algorithm inference on a device or in the cloud.” (Barfield [0005]) Regarding Claim 13, SUI in view of Kelur and further in view of Barfield teaches the method of claim 12. Barfield further teaches comprising enabling a user to, via the online platform, assign one or more machine learned models to the device ([0029] "Starting with 202, a user interface is specified to let the user do one or many of the following functions: upload algorithm code, algorithm models, interface algorithm functions to APIs, manage monitoring statistics, compare algorithm models, select generated minimum viable products generated for the algorithm data type, select the deployment platform type, manage algorithm API security, manage or download generated APIs for particular device types, download or manage data collected from algorithms, restrict IP address to API's, select API type, select a cloud provider for algorithm hosting, and select a hosting platform type for the algorithm. 
The following functions are also included: read algorithm attributes such as memory, processor, or algorithm hardware acceleration attributes; edit algorithm configuration files; pay bills; monitor algorithm usage statistics; test uploaded algorithm code; test algorithm environments/containers; specify algorithm dependencies; and serve as an interface to one of the methods described herein.") wherein said assignment is based on the unique device identifier, and wherein the application is configured to retrieve the binary intermediate representation of any machine learned model which is assigned to the device from the online platform ([0032] "Block 216 defines certificates associated with the API or a Blockchain validation approach to the API provides security for the API connections to either the device API or the cloud API. Blockchain could allow a group of users and verify identity, and then a user could generate a token for the API. Further, tokens may be set by the user or generated with a hash (e.g., sh256, MD5) to serve as a token-based login for the algorithm API. Certificates that provide encryption over a restful web service or stream (e.g., Kafka) may provide a private and public key for message encryption while transmission of data occurs between the device(s) and cloud APIs. Block 218 dictates the selection of security type such as Blockchain token based security validation, HTTPS with public and private keys for network security, basic authorization with user logins, OAUTH, AES, hashed token based security logins between the device and cloud API, or any combination of the securities listed together. API security is important if data is transmitted for inference across a network, such as a public internet, to create secure transmission of data between the device and the cloud APIs. 
Block 220 defines a step in which the user supplied algorithm, a generated by method 200 API, determined software dependencies, and an interface between the user supplied algorithm or model is created and is then packaged into a software container that defines a standalone run-time for the algorithm as a software application. This software container can be versioned, tested by passing an input API data type defined by the user or determined by the system through code analysis and stored for deployment in a variety of platforms for the user defined requirements for algorithm deployment. As a part of creating containers, multiple container types may be created suited for different hardware or software platforms like the ones described in 120, 125, and 130, or devices described in 105. Containers provide a standard way to package configurations, application code, and dependencies into a single software object." [0046] " the Device APIs may sense the device type or version and if the device has the computing ability to perform the user supplied algorithm function. If the device has the ability (as indicated in Device 405 but not in 400), the Device or Device API may download an algorithm model onto the device from the cloud API contained in platform 200, either in the same algorithm model form as the algorithm form in the Cloud API or specific to device, device version, device model, or the device Algorithm Acceleration hardware model." Also see [0047]). Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. McMullen et al. (US20190272179A1): discloses isolation techniques for applications. Liu et al. (US11429902B2): discloses deploying ML applications across different compute platforms. Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMIR DARWISH whose telephone number is (571)272-4779. The examiner can normally be reached 7:30-5:30 M-Thurs. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emerson Puente can be reached on 571-272-3652. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /A.E.D./Examiner, Art Unit 2187 /LEWIS A BULLOCK JR/Supervisory Patent Examiner, Art Unit 2199

Prosecution Timeline

Apr 18, 2024
Application Filed
Jan 22, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12475391
METHOD AND SYSTEM FOR EVALUATION OF SYSTEM FAULTS AND FAILURES OF A GREEN ENERGY WELL SYSTEM USING PHYSICS AND MACHINE LEARNING MODELS
2y 5m to grant Granted Nov 18, 2025
Study what changed to get past this examiner. Based on the 1 most recent grant.


Prosecution Projections

1-2
Expected OA Rounds
60%
Grant Probability
99%
With Interview (+66.7%)
4y 0m
Median Time to Grant
Low
PTA Risk
Based on 5 resolved cases by this examiner. Grant probability derived from career allow rate.
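One plausible reading of how the displayed projection figures fit together (an assumption about this page's methodology, not a documented formula): the base grant probability is the examiner's career allow rate (3 granted of 5 resolved), and the with-interview figure applies the stated interview lift, capped at 99%.

```python
# Hypothetical reconstruction of the projection arithmetic shown above;
# the variable names and the 99% cap are assumptions, not the tool's spec.
resolved, granted = 5, 3
allow_rate = granted / resolved                     # 3 / 5 -> "60% Grant Probability"

interview_lift = 2 / 3                              # "+66.7% Interview Lift"
with_interview = min(0.99, allow_rate * (1 + interview_lift))

print(f"{allow_rate:.0%} base, {with_interview:.0%} with interview")
# 60% base, 99% with interview
```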
