DETAILED ACTION
Claims 1-18 rejected under 35 USC § 112(b), as indefinite.
Claims 1, 3-4, 9-10, 12, and 17-18 rejected under 35 USC § 102.
Claims 2, 5-8, 11, and 13-16 rejected under 35 USC § 103.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1-18
Applicant’s claim language is indefinite because it is confusing. Examiner cannot determine Applicant’s intent in reciting “at least a recently used trained model,” “a plurality of least used models,” and “the at least recently used trained model” in claims 1, 10, and 18. Claim 3 then recites that the selectively loading of “the at least recently used trained model” utilizes a “least recently used (LRU) technique.” Using the words “least” and “used” in different ways to describe different (or the same?) models renders the claims indefinite.
For purposes of examination, Examiner interprets:
“a plurality of least used models” as the models in memory used the least.
“at least a recently used trained model” and “the at least recently used trained model” as any model in memory that is not considered a “least used model” (i.e., recently used models).
Further, Applicant’s recitation of “at least a recently used trained model” is indefinite. The term “recently used” is representative of a subjective term, because defining what is “recent” depends solely on the unrestrained, subjective opinion of a particular individual practicing the invention. The specification does not provide any standard for measuring the scope of the term, nor any restrictions on the exercise of subjective judgment. See MPEP 2173.05(b)(IV).
Accordingly, claims 1, 10, and 18 are indefinite; claims 2-9 and 11-17 are rejected under 35 U.S.C. 112(b) as depending from an indefinite claim.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claims 1, 3-4, 9-10, 12, and 17-18 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Adibowo, U.S. PG-Publication No. 2022/0215008 A1.
Claim 1
Adibowo discloses a system (110) for resource prediction, the system (110) comprising: a processor (202) operatively coupled with a memory (204), wherein said memory (204) stores instructions which when executed by the processor (202) causes the processor (202) to: receive one or more requests from one or more computing devices (104), wherein one or more users (102) operate the one or more computing devices (104). Adibowo discloses “a cloud-based machine learning (ML) inference platform for large scale, multi-model ML inference services.” The platform implements a method comprising the step of “receiving, by an API server of a plurality of API servers, a prediction request from a client system.” Adibowo, ¶¶ 5-6.
Adibowo discloses wherein the received one or more requests are based on a training of one or more models via a machine learning (ML) engine (108) operatively coupled to the processor (202). ML models are “trained with different data or trained for different purposes or clients” and “loaded into memory when receiving a prediction request corresponding to the ML model.” ML models “are loaded … to the memory based on a management mechanism.” Id. at ¶ 28.
Adibowo discloses determine at least a recently used trained model from the one or more trained models and unload a plurality of least used models from the one or more trained models based on the received one or more requests. API servers 232 “can each execute a custom node selection to determine which … model server 234 is chosen to perform inference service in response to an inference request.” The model servers 234 “execute a least-recently-used (LRU) replacement algorithm to manage replacement of ML models,” wherein “the LRU replacement algorithm is executed to replace an in-use ML model that has not been used for the longest period of time relative to other in-use ML models.” Id. at ¶ 34.
Adibowo discloses utilize a caching mechanism to optimize a memory space associated with the memory (204) by selectively loading the at least recently used trained model. The LRU replacement algorithm “optimizes the server system 230 in providing ML inference services using heterogeneous ML models to heterogenous clients.” Id. at ¶ 35. If the memory at model server 234 is full, the server “can unload one or more in-use ML models that are not recently used based on the LRU replacement algorithm.” If the selected ML model corresponding to the inference request “is not loaded in memory, a cache miss is recorded (316), the ML model is retrieved (e.g., from the model storage 240) and is loaded into memory (318), and inference is executed (322)” (i.e., selectively loading the recently used trained model). Id. at ¶ 47; FIG. 3.
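For illustration only, and not as part of the Adibowo disclosure, the LRU replacement behavior described above — loading a requested model on a cache miss and evicting the least-recently-used model when memory is full — may be sketched in Python as follows (all identifiers are hypothetical):

```python
from collections import OrderedDict

class LRUModelCache:
    """Hypothetical sketch of LRU replacement for in-memory ML models."""

    def __init__(self, capacity, loader):
        self.capacity = capacity      # max number of models held in memory
        self.loader = loader          # callable: model_id -> model object
        self.models = OrderedDict()   # insertion order tracks recency of use
        self.hits = 0
        self.misses = 0

    def get(self, model_id):
        if model_id in self.models:
            self.hits += 1                        # cache hit: already loaded
            self.models.move_to_end(model_id)     # mark as most recently used
            return self.models[model_id]
        self.misses += 1                          # cache miss: fetch from storage
        if len(self.models) >= self.capacity:
            self.models.popitem(last=False)       # evict least-recently-used model
        self.models[model_id] = self.loader(model_id)
        return self.models[model_id]
```

With a capacity of two, requesting models “a,” “b,” “a,” then “c” evicts “b,” since “a” was touched more recently — the replacement of “an in-use ML model that has not been used for the longest period of time relative to other in-use ML models.”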
Adibowo discloses predict, via the at least recently used trained model from the one or more trained models, resource data based on the optimized memory space; and enable the resource prediction based on the predicted resource data. The platform implements a method comprising steps of “determining, by the model server, that the ML model is loaded in memory, and in response, incrementing a cache hit; actions further include determining, by the model server, that the ML model is not loaded in memory, and in response incrementing a cache miss, retrieving the ML model from a model storage, and loading the ML model to the memory of the model server.” The cache misses and cache hits (i.e., resource data based on the optimized memory space) are used for “periodically calculating a ratio … and adjusting a number of model servers in the plurality of model servers based on the ratio; and at least one of a maximum number of model servers and a minimum number of model servers is determined based on the ratio, and the number of model servers is adjusted based on the maximum number of model servers and the minimum number of model servers” (adjusted number of servers → predicted resource data).
Claim 3
Adibowo discloses wherein the processor (202) is configured to utilize a least recently used (LRU) technique as the caching mechanism to optimize the memory space. API servers 232 “can each execute a custom node selection to determine which … model server 234 is chosen to perform inference service in response to an inference request.” The model servers 234 “execute a least-recently-used (LRU) replacement algorithm to manage replacement of ML models,” wherein “the LRU replacement algorithm is executed to replace an in-use ML model that has not been used for the longest period of time relative to other in-use ML models.” Adibowo, ¶ 34. The LRU replacement algorithm “optimizes the server system 230 in providing ML inference services using heterogeneous ML models to heterogenous clients.” Id. at ¶ 35.
Claim 4
Adibowo discloses wherein the memory (204) is a random access memory (RAM) for storing the one or more trained models. Instructions and data are “received from … a random access memory.” Adibowo, ¶ 64; see also ¶ 25 (“main memory includes random access memory”).
Claim 9
Adibowo discloses wherein the processor (202) is configured with a conditional lock functionality to process, in a successive order, at least a model from the one or more trained models based on the received one or more requests. The method “provides for selection of the same model node for inference that calls for a particular ML model in combination with LRU caching within each node,” wherein the routing “is achieved by hashing a combination of the node’s identity … with the ML model’s identity and then ordering the hashed combination” (ordering the hashed node/model combination → process in a successive order). The “ordering is fixed as long as the pool of nodes is made constant,” and “this fixed set of ordering makes effective use of the LRU cache within a node.” In embodiments, “the selection can be based on hashing a combination of model server identifier and the ML model identifier and ordering the model servers based on respective hash values,” such that “the inference request for a particular ML model would more frequently be sent to the same model server 234.” Adibowo, ¶¶ 48-49.
Claims 10, 12, and 17
Claims 10, 12, and 17 are rejected utilizing the aforementioned rationale for Claims 1, 3, and 9; the claims are directed to a method performed by the system.
Claim 18
Claim 18 is rejected utilizing the aforementioned rationale for Claim 1; the claim is directed to a “user equipment” comprising the same elements of the system.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Adibowo, U.S. PG-Publication No. 2022/0215008 A1, in view of Braz et al., U.S. PG-Publication No. 2020/0175387 A1.
Claim 2
Braz discloses wherein the processor (202) is configured to utilize a versioning logic mechanism to determine the at least recently used trained model from the one or more trained models and process the received one or more requests. Braz discloses a method “for managing and deploying AI models, including storing at least one artificial intelligence (AI) model in a model store memory in a plurality of different versions, each different version having a different level of fidelity; receiving a prediction request to process the AI model; determining … which version of the AI model to use for processing the received prediction request … using the determined version of the AI model; and responding to the received prediction request with a result of the processing … using the determined AI model version.” Braz, ¶ 5. Further, this versioning method uses “eviction/loading policies for which AI model to evict that is currently residing in the memory used for storing models available for immediate execution, when a determination is made to move another AI model into that memory for execution,” including a “least-recently-used (LRU) policy” or “a policy that considers a potential gain in confidence level or fidelity level between different versions of AI models.” Id. at ¶ 39.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the multi-model inference services of Adibowo to incorporate multi-model versioning services as taught by Braz. One of ordinary skill in the art would be motivated to integrate multi-model versioning services into Adibowo, with a reasonable expectation of success, in order to reduce latency in responding to inference requests, by allowing “for serving a large number of models with low delay at a temporary cost of performance, as well as a mechanism that permits accuracy to improve with subsequent requests from [a] client.” See Braz, ¶¶ 24-25.
Claim 11
Claim 11 is rejected utilizing the aforementioned rationale for Claim 2; the claim is directed to a method performed by the system.
Claims 5-8 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Adibowo, U.S. PG-Publication No. 2022/0215008 A1, in view of Brand et al., U.S. PG-Publication No. 2016/0071027 A1.
Claim 5
Brand discloses wherein the processor (202) comprises a base manager to store one or more trained models, and enable one or more parallel processes to utilize the one or more trained models. Brand discloses “methods … used to perform real-time analysis and modeling of large streams of events.” The event stream is partitioned “among multiple local modelers that perform the same set of operations,” so that “processing of the event stream can be performed in parallel, increasing throughput.” Brand, ¶¶ 35-36. The method “initializes each central modeler, e.g., registers each local modeler in communication with the central modeler, obtains machine learning models identified in the configuration file, and so on,” wherein “a central modeler can provide the machine learning model to local modelers in communication with the central modeler” (central modeler → base manager). The method provides “a machine learning model to local modelers by executing a central modeler function to provide the machine learning model, and executing a local modeler function to receive and store the machine learning model” (i.e., enable one or more parallel processes to utilize the trained models). Id. at ¶ 94; see also ¶ 92 (“multiple local modelers can execute in parallel to increase throughput”), ¶ 99 (“each local modeler performing the same operations in parallel on received events”).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the multi-model inference services of Adibowo to incorporate parallel processing using trained models as taught by Brand. One of ordinary skill in the art would be motivated to integrate parallel processing using trained models into Adibowo, with a reasonable expectation of success, in order to improve performance by using “multiple local modelers that perform the same set of operations,” enabling “processing of the event stream … in parallel, increasing throughput.” See Brand, ¶ 36; see also ¶ 91 (“multiple local modelers can execute in parallel to increase throughput”).
Claim 6
Brand discloses wherein the processor (202) is configured with a common loading functionality for loading the one or more trained models. Figure 6 illustrates system 600 “for processing an event stream by an example routing strategy using context data,” wherein system 600 “includes multiple local modelers 610a-n of a stream processing system in communication with a central modeler 620.” System 600 further comprises a “routing node 604” that “receives an event stream and routes each event in the event stream according to the routing strategy, e.g., to a particular local modeler that stores context data related to the processing of the event” (routing strategy using context data → common loading functionality). Brand, ¶ 111. System 600 “can partition context data so that particular context data related to a particular event is likely to be located on a same local modeler as other context data related to the particular event, e.g., context data needed for processing the event.” Id. at ¶ 113.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to modify the multi-model inference services of Adibowo to incorporate parallel processing using trained models as taught by Brand. One of ordinary skill in the art would be motivated to integrate parallel processing using trained models into Adibowo, with a reasonable expectation of success, in order to improve performance by using “multiple local modelers that perform the same set of operations,” enabling “processing of the event stream … in parallel, increasing throughput.” See Brand, ¶ 36; see also ¶ 91 (“multiple local modelers can execute in parallel to increase throughput”).
Claim 7
Brand discloses wherein the base manager is configured to use a race condition avoidance solution to prevent a read or write operation of the one or more parallel processes in a directory. Brand discloses that since “the context data is maintained in operational memory, the system can quickly obtain the requested context data, avoid data locking issues and race conditions, and avoid having to call and obtain context data from an outside database.” The system then performs an operation using the context data. Brand, ¶ 129; see also ¶ 23 (system partitions context data into local memories of local modelers of the stream processing system to “reduce latency due to data locking issues, and race conditions”).
Claim 8
Brand discloses wherein the common loading functionality is configured with a multi-processing lock functionality to prevent the one or more parallel processes from utilizing the one or more trained models simultaneously. Brand discloses that since “the context data is maintained in operational memory, the system can quickly obtain the requested context data, avoid data locking issues and race conditions, and avoid having to call and obtain context data from an outside database.” The system then performs an operation using the context data. Brand, ¶ 129; see also ¶ 23 (system partitions context data into local memories of local modelers of the stream processing system to “reduce latency due to data locking issues, and race conditions”).
Claims 13-16
Claims 13-16 are rejected utilizing the aforementioned rationale for Claims 5-8; the claims are directed to a method performed by the system.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See Kozhaya et al., U.S. PG-Publication No. 2019/0391956 (abstract describing system for selecting machine learning models for service use).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FRANK D MILLS whose telephone number is (571)270-3172. The examiner can normally be reached M-F 10-6 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KEVIN YOUNG can be reached at (571)270-3180. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/FRANK D MILLS/Primary Examiner, Art Unit 2194 December 5, 2025