Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR
1.17(e), was filed in this application after final rejection. Since this application is eligible for
continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely
paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.
Applicant's submission filed on 09/12/2025 has been entered.
Response to Arguments
Applicant's arguments filed 09/12/2025 ("Arguments/Remarks") have been fully considered but they are not persuasive.
Argument – 1, page 10: regarding the 35 U.S.C. 112 rejection, applicant contends: “Claims 7, 9, 18 and 20 are rejected under 35 U.S.C. 112(b) as allegedly being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventors or applicant regards as the invention. The Applicant respectfully disagrees. Nevertheless, in an effort to move prosecution forwards Claims 7, 9, 18 and 20 are amended to address the Examiner’s comments. In particular, the Examiner alleges that “global feasible performance region,” “balance,” and “first feasible performance region” are unsupported by the Applicant’s specification. Applicant submits that “global feasible performance region,” “balance,” and “first feasible performance region” are fully supported at least at Paras. [0003], [0007]-[0008], [0031]-[0033], [0042]-[0045] of the Applicant’s specification.”
Regarding argument – 1, the Examiner notes that claim 7 and its analogue, claim 18, recite determining a “global feasible performance region.” The claims further state that the global feasible performance region is associated with “balanced values of the resource level.” However, the term “balanced values” has not been previously introduced or defined in the claims, nor is there a clear and definite description in the specification of how such “balanced values” are determined. While the specification may reference allocation or efficiency concepts, it does not provide sufficient guidance to inform one of ordinary skill in the art with reasonable certainty as to the meaning of “balanced values” in the context of the claims. Moreover, although “the global feasible performance region” could represent the set of all achievable performance outcomes, one of ordinary skill in the art would not be able to ascertain the intended scope. Further clarification is needed.
Similarly, claim 9 and analogous claim 20 recite a “first feasible performance region.” One of ordinary skill in the art would not be able to ascertain the intended scope of this term because it is not a term of art and the criterion for determining it as claimed is unclear from the claim language. Further clarification is needed.
Argument – 2, Page: 12 applicant contends: “Qin fails to cure the deficiencies of Guodong and Zhang because Qin also does not teach or suggest “a first set of resourcing levels corresponding to a first ratio or value per action or cost per action associated with the first set of models.” At best Qin describes an advertising setup where, for example, a company (e.g., a shoe manufacturer) that buys advertisements on a website (e.g., search engine and advertisement company) pays the website when a potential customer performs an action such as completing a sale (e.g., buys the shoes). This is referred to as a “cost per action” because in this setup the company pays the advertiser (i.e., incurs a cost) when the customer takes an action. As such, Qin’s cost per action is entirely different from the claimed “the first model including a first plurality of submodels trained at differing resource levels corresponding to a first ratio or value per action or cost per action associated with the first plurality of submodels, wherein the first set of resourcing levels comprises an allocation of one or more resources to the first sub-organization.” Accordingly, no combination of Qin, Guodong, and Qin teaches or suggests all of the elements of amended Claim 1 and so Claim 1 is fully patentable over the cited prior art. Claims 12 and 21 are similarly amended and also allowable over the cited prior art.”
Regarding argument – 2, Qin ¶[0004] describes a cost-per-action model in which advertiser resources are expended only when a measurable action occurs. In particular, Qin explains that advertisers submit ads with associated actions, ad auctions are conducted, and advertisers are charged a fee for each reported action attributed to a winning ad. Thus, Qin expressly discloses a system that relates resources consumed (advertiser spend) to actions obtained (user conversions).
Applicant’s specification ¶[0047] describes the use of “cost per action” and “value per action” in the context of allocating resources across different functions, such as pay-per-click and direct marketing. Qin ¶[0135] states: “… A linear regression model is then used to estimate the action rate for the cost-per-action ads data” (i.e.: a model is used to estimate the action rate for cost-per-action ads, so that the cost per action is associated with the model). In both Qin and Applicant’s disclosure, the fundamental principle is the same: determining the efficiency of resource use by relating the cost expended to the number or value of actions achieved, and then using this relationship to guide allocation decisions.
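For illustration only, the principle stated above (relating cost expended to actions achieved, then using that relationship to guide allocation) can be sketched as follows. The channel names, the figures, and the proportional allocation rule are hypothetical assumptions; they are not taken from Qin or from Applicant’s specification.

```python
from typing import Dict, Tuple

# Illustrative only: every name and number here is hypothetical.


def cost_per_action(spend: float, actions: int) -> float:
    """Resources expended divided by the number of actions obtained."""
    if actions <= 0:
        raise ValueError("at least one action is required")
    return spend / actions


def value_per_action(value: float, actions: int) -> float:
    """Value realized divided by the number of actions obtained."""
    if actions <= 0:
        raise ValueError("at least one action is required")
    return value / actions


def allocate(budget: float,
             channels: Dict[str, Tuple[float, float, int]]) -> Dict[str, float]:
    """Split a budget in proportion to each channel's value-to-cost ratio.

    `channels` maps a channel name to (spend, value, actions).
    The proportional rule is an assumption chosen for simplicity.
    """
    ratios = {
        name: value_per_action(value, actions) / cost_per_action(spend, actions)
        for name, (spend, value, actions) in channels.items()
    }
    total = sum(ratios.values())
    return {name: budget * ratio / total for name, ratio in ratios.items()}


channels = {
    "pay_per_click": (1000.0, 3000.0, 50),     # cost/action 20, value/action 60
    "direct_marketing": (2000.0, 2000.0, 40),  # cost/action 50, value/action 50
}
shares = allocate(800.0, channels)
```

Under these hypothetical figures, the channel with the higher value-to-cost ratio receives the larger share of the budget.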
Claim Rejections - 35 USC § 112: New Matter
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and
of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and
process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claim(s) 7 and 18 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim(s) 7 and 18 recite “… wherein the first and second set of resourcing levels are compatible;” which is considered new matter because the original disclosure does not appear to support this limitation: the specification does not describe the first and second sets of resourcing levels as being compatible. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim(s) 7, 9, 18 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant regards as the invention.
Claims 7 and 18 recite the limitations “a global feasible performance region”, “balanced values” and “wherein the first and second set of resourcing levels are compatible”. There is insufficient antecedent basis for these limitations in the claims, which arises from the ambiguity of reference.
“a global feasible performance region” is indefinite because the specification does not explain what constitutes the boundaries of this “region” or how it is determined.
“balanced values” is indefinite because the term has not been previously introduced or defined in the claims, nor is there a clear and definite description in the specification of how such “balanced values” are determined.
“wherein the first and second set of resourcing levels are compatible” is indefinite because the specification does not define “compatible” or provide any guidance on how to determine compatibility.
One of ordinary skill in the art would not be able to ascertain the intended scope of these terms because they are not terms of art, lack a clear definition in the specification, and provide no objective boundaries.
Claims 7 and 18 also recite the limitation “wherein the global performance region comprises a plot of a range of performance indices as a function of constraints for the organization”. There is insufficient antecedent basis for the limitation in the claims, which arises from the ambiguity of reference. Specifically, the term “the global performance region” lacks clarity because it is not clearly defined or linked to a prior element in the claim. For a more precise understanding, the claims should explicitly define or identify terms before referring to them. For examination purposes, the examiner will interpret “the global performance region” as the global feasible performance region.
Claim(s) 9 and 20 recite the limitation “a first feasible performance region”. There is insufficient antecedent basis for the limitation in the claims, which arises from the ambiguity of reference. One of ordinary skill in the art would not be able to ascertain the intended scope of the term because it is not a term of art and the criterion for determining it as claimed is unclear from the claim language. For examination purposes, the examiner interprets a performance region of a model as falling within the scope of the claim limitation.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1 – 3, 5, 12 – 14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Guodong, et al. "Federated learning for open banking", (hereafter Guodong) in view of Zhang et al., "Personalized federated learning with first order model optimization", (hereafter Zhang), Qin et al., Pub. No.: US20130246167A1, (hereafter Qin) and Achin et al., Pub. No.: US9489630B2, (hereafter Achin).
Regarding claim 1, Guodong teaches: A method comprising:
receiving data characterizing a first output of one or more of a first set of models associated with a first sub-organization of an organization,
(Guodong, page: 6, “4 Statistical heterogeneity in federated learning One challenge of federated learning is statistical heterogeneity in which users have different data distribution. Statistical heterogeneity is an inherent characteristic of a user’s behavior. It is also identified as a non-IID problem. Conventional machine learning is built upon the IID assumption of a uniform dataset. The stochastic gradient descent (SGD) optimization used in vanilla federated learning is not specifically designed and optimized for tackling non-IID data. As described in Fig. 1, each participant’s data are generated from different distributions [receiving data characterizing a first output of one or more of a first set of models associated with a first sub-organization of an organization]. Each local model should then be initialized by the global model that represents a particular distribution.”)
the one or more of the first set of models trained on a first dataset;
training one or more of a second set of models associated with a second organization based on a second dataset,
(Guodong, page: 9, “4.2 Personalized modelling: When a service provider wants to provide a service that is the best for each individual customer [a second organization], the model trained in the central server needs to be personalized or customized. The simplest solution is to treat the global model as a pre-trained model [the one or more of the first set of models trained on a first dataset] (i.e.: the global model is trained with global dataset before distributed to local models), and then use local data [based on a second dataset] to fine-tune the global model, which will derive a personalized model [training one or more of a second set of models associated with a second organization]. However, in most cases, each participant just has a limited number of instances, and the fine-tuning operation will cause over-fitting or increase the generalization error. Another solution is to treat each customer as a target task and the pre-trained global model as a source task, then to apply transfer learning [28] or domain adaptation [3] methods to fine-tune each personalized model. These methods will further leverage the global information to improve the fine-tuning process for each participant. [25] discusses two approaches, namely Data Interpolation and Model Interpolation, to learn a personalized federated learning model by weighting two components between local and global servers in terms of data distributions or models respectively. Personalization layers In general, a model can be decomposed into two parts: a representation learning part and a decisive part. For example, CNN is composed of convolution layers for representation extraction and fully-connected layers for classification decision. In a federated learning setting, heterogeneity could impact one of the two parts. [2] proposes to share representation layers across participants, and then keep decision layers as a personalized part. 
[24] thinks representation layers should be the personalized part, and then the decision layers could be shared across participants.”)
global constraints, and the first output;
(Guodong, page: 6, “3 Problem formulation: The learning process of federated learning is decomposed into two parts that occur in different places: server (coordinator) and nodes (participants). These two parts are linked to each other via a specifically designed mechanism. In particular, the participant i can train a local model hi using its own dataset Di = {(xi· , yi·)}. The model hi is initialized by a globally shared model parameter W [global constraints, and the first output] (i.e.: initial parameters set by a global model and shared to participants models) which is then fine-tuned to a new local model with parameters Wi using the data from node i. It is proposed that the coordinator in federated learning can learn a global model controlled by W that could be shared with all participants on distributed nodes. Through a few rounds of communication, the global model has been gradually improved to better suit all participants, and the final global model is an optimal solution that could directly be deployed on each participant for further use.”)
retraining the first set of models or a subset thereof.
(Guodong, page: 12, “In an open banking data marketplace, the use of participants’ data may be charged by times. For example, in a federated learning process, if the coordinator asks the participant to train the local model three times, they should be charged by three times as well. Moreover, the participant’s data is a dynamically changing profile including its banking-related activities. To capture the dynamic changes of the customers, the federated learning may take an incremental or lifelong learning strategy to regularly use a participant’s data to refine the global model [retraining the first set of models or a subset thereof] (i.e.: the global model is updated and retrained by data updates from participants model training output). This will bring a new challenge for continuously sharing data in a long-term model training framework.”)
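For illustration only, the coordinator and participant loop described in the Guodong passages cited above (a globally shared parameter W initializes each local model, each participant fine-tunes on its own data, and the global model is refined over several rounds) can be sketched as follows. The one-parameter model, the plain averaging step, the learning rate, and the toy datasets are assumptions made for this sketch and are not quoted from Guodong.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held by one participant


def local_fine_tune(w: float, data: Dataset,
                    lr: float = 0.1, steps: int = 20) -> float:
    """Fine-tune the shared parameter on one participant's data.

    The local model is y = w * x, trained by gradient descent on squared error.
    """
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w: float, participants: List[Dataset]) -> float:
    """One communication round: initialize, fine-tune locally, then aggregate."""
    local_ws = [local_fine_tune(global_w, data) for data in participants]
    return sum(local_ws) / len(local_ws)  # simple averaging (an assumption)


# Two hypothetical participants whose data follow different distributions.
participants = [
    [(1.0, 2.0), (2.0, 4.0)],  # best local fit: w = 2
    [(1.0, 4.0), (2.0, 8.0)],  # best local fit: w = 4
]

global_w = 0.0
for _ in range(5):
    global_w = federated_round(global_w, participants)
```

In this toy setting the global parameter settles between the two local optima, which is the statistical heterogeneity that the cited personalization techniques address.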
Guodong does not teach:
assessing, based on a second output of the one or more of the second set of models, performance of the one or more of second set of models;
a first set of resourcing levels corresponding to a first ratio or value per action or cost per action associated with the first set of models
wherein the first set of resourcing levels comprises an allocation of one or more resources to the first sub-organization;
Zhang teaches: assessing, based on a second output of the one or more of the second set of models, performance of the one or more of second set of models; and
(Zhang, page: 2, “Key here is that after each federating round, we maintain the client-uploaded parameters individually, allowing clients in the next round to download these copies independently of each other. Each federated update is then a two-step process: given a local objective, clients (1) evaluate [performance of the one or more of second set of models;] how well their received models perform on their target task [assessing, based on a second output of the one or more of the second set of models] and (2) use these respective performances to weight each model’s parameters in a personalized update. We show that this intuitive process can be thought of as a particularly coarse version of popular iterative optimization algorithms such as SGD, where instead of directly accessing other clients’ data points and iteratively training our model with the granularity of gradient decent, we limit ourselves to working with their uploaded models. We hence propose an efficient method to calculate these optimal combinations for each client, calling it FedFomo, as (1) each client’s federated update is calculated with a simple first-order model optimization approximating a personalized gradient step, and (2) it draws inspiration from the “fear of missing out”, every client no longer necessarily factoring in contributions from all active clients during each federation round. In other words, curiosity can kill the cat. Each model’s personalized performance can be saved however by restricting unhelpful models from each federated update.”)
Zhang and Guodong are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Zhang with teachings of Guodong to enhance customizability and adaptability of a machine learning model. Moreover, it facilitates transfer learning outside of local data distributions, ensuring improved model performance while maintaining privacy (Zhang, Abstract).
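For illustration only, the two-step update quoted from Zhang (each client evaluates how well received models perform on its own target task, then uses those performances to weight each model’s parameters in a personalized update) can be sketched as follows. The specific weighting rule used here, loss improvement divided by parameter distance and clipped at zero, is an assumption for this sketch, as are the function and variable names.

```python
import math
from typing import Callable, List

Params = List[float]


def personalized_update(current: Params,
                        received: List[Params],
                        loss: Callable[[Params], float]) -> Params:
    """Weight received models by how much they improve the client's own loss."""
    base_loss = loss(current)

    # Step 1: evaluate each received model on the client's target task.
    weights = []
    for params in received:
        distance = math.dist(params, current)
        gain = base_loss - loss(params)
        # Assumed rule: models that do not help get zero weight.
        weights.append(max(0.0, gain / distance) if distance > 0 else 0.0)

    total = sum(weights)
    if total == 0:
        return list(current)  # no received model helps; keep the current one
    weights = [w / total for w in weights]

    # Step 2: move toward the helpful models in proportion to their weights.
    return [
        c + sum(w * (p[i] - c) for w, p in zip(weights, received))
        for i, c in enumerate(current)
    ]


def client_loss(params: Params) -> float:
    """Hypothetical local objective: squared distance to this client's optimum."""
    target = [1.0, 1.0]
    return sum((p - t) ** 2 for p, t in zip(params, target))


updated = personalized_update(
    current=[0.0, 0.0],
    received=[[1.0, 1.0], [-1.0, -1.0]],  # one helpful model, one unhelpful
    loss=client_loss,
)
```

Here the unhelpful model receives zero weight, consistent with the quoted statement that unhelpful models are restricted from each federated update.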
Guodong in view of Zhang does not teach:
a first set of resourcing levels corresponding to a first ratio or value per action or cost per action associated with the first set of models;
wherein the first set of resourcing levels comprises an allocation of one or more resources to the first sub-organization;
Qin teaches:
a first set of resourcing levels corresponding to a first ratio or value per action or cost per action associated with the first set of models;
(Qin, “[0135] The process described with reference to FIG. 10 relies on the search engine provider's ability to estimate the actual action rate associated with a particular ad. A variety of techniques may be employed to estimate the action rate. In an example implementation, various features associated with the cost-per-action ads are extracted [a first set of resourcing levels corresponding to a first ratio or value per action or cost per action] (i.e.: various features related to (cost-per-action) CPA ads are extracted (e.g., historical engagement metrics, ad attributes, contextual factors)). A linear regression model is then used to estimate the action rate for the cost-per-action ads data [associated with the first set of models]. The linear regression model may first be trained using a set of training for which the action rates are known. For instance, in one example implementation, given a training set of n ads for which action rates are known, for ad i, yi represents the action rate and xi represents the ad's feature vector. Using the training data, a weight vector β* is learned, which can be used to predict yi using xi according to:”)
Qin, Guodong and Zhang are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Qin with the teachings of Guodong and Zhang to enable a cost-per-action (CPA) model that ensures advertisers pay based on actual user engagement. Contextual ad auctions optimize placements, while periodic reports verify actions, enhancing transparency and ROI (Qin, Abstract).
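For illustration only, the action-rate estimation quoted from Qin ¶[0135] (a linear regression model trained on ads whose action rates are known, then used to predict the action rate of other ads) can be sketched as follows. Qin describes a feature vector and a learned weight vector; the prediction formula is not reproduced in the quotation above. This sketch therefore assumes a single scalar feature and ordinary least squares, and the feature values and rates are hypothetical.

```python
from typing import List, Tuple


def fit_action_rate_model(features: List[float],
                          rates: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, rates))
    if sxx == 0:
        raise ValueError("feature values must not all be identical")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_action_rate(model: Tuple[float, float], feature: float) -> float:
    """Predict the action rate of an ad from its (single) feature value."""
    slope, intercept = model
    return slope * feature + intercept


# Hypothetical training set: one feature per ad, with known action rates.
train_features = [0.1, 0.2, 0.3, 0.4]
train_rates = [0.01, 0.02, 0.03, 0.04]

model = fit_action_rate_model(train_features, train_rates)
estimate = predict_action_rate(model, 0.5)
```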
Guodong in view of Zhang and Qin does not teach:
wherein the first set of resourcing levels comprises an allocation of one or more resources to the first sub-organization;
Achin teaches:
wherein the first set of resourcing levels comprises an allocation of one or more resources to the first sub-organization;
(Achin, col. 20, lines 45 – 55, “The allocated processing resources may include temporal resources (e.g., execution cycles of one or more processing nodes, execution time on one or more processing nodes, etc.), physical resources (e.g., a number of processing nodes, an amount of machine-readable storage (e.g., memory and/or secondary storage), etc.), and/or other allocable processing resources. In some embodiments, the allocated processing resources may be processing resources of a distributed computing system and/or a cloud-based computing system. In some embodiments, costs may be incurred when processing resources are allocated [wherein the first set of resourcing levels comprises an allocation] and/or used (e.g., fees may be collected by an operator of a data center in exchange for using the data center's resources) [of one or more resources to the first sub-organization].”)
Achin, Guodong, Zhang and Qin are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Achin with the teachings of Guodong, Zhang and Qin to allocate resources more efficiently and improve accuracy in forecasting advertiser actions and costs (Achin, Abstract).
Regarding claim 2, Guodong in view of Zhang, Qin and Achin teach the method of claim 1.
Guodong further teaches: further comprising providing information associated with the assessment and/or the second output to the first set of models.
(Guodong, Fig. 1, annotated by the examiner with the labels [further comprising providing information associated with the assessment] and [the second output to the first set of models])
Claim 13 recites analogous limitations as claim 2, so is rejected under similar rationale.
Regarding claim 3, Guodong in view of Zhang, Qin and Achin teach the method of claim 1.
Guodong further teaches: wherein the received data characterizing the first output of the one or more of the first set of models includes the global constraints and/or a first set of resourcing levels.
(Guodong, page: 6, “3 Problem formulation: The learning process of federated learning is decomposed into two parts that occur in different places: server (coordinator) and nodes (participants). These two parts are linked to each other via a specifically designed mechanism. In particular, the participant i can train a local model hi using its own dataset Di = {(xi· , yi·)}. The model hi is initialized [wherein the received data characterizing the first output of the one or more of the first set of models includes] (i.e.: first set of models (global models) initialize participants models (second sets of models)) by a globally shared model parameter W [includes the global constraints] (i.e.: initial parameters set by a global model and shared to participants models) which is then fine-tuned to a new local model with parameters Wi using the data from node i. It is proposed that the coordinator in federated learning can learn a global model controlled by W that could be shared with all participants on distributed nodes. Through a few rounds of communication, the global model has been gradually improved to better suit all participants, and the final global model is an optimal solution that could directly be deployed on each participant for further use.”)
Claim 14 recites analogous limitations as claim 3, so is rejected under similar rationale.
Regarding claim 5, Guodong in view of Zhang, Qin and Achin teach the method of claim 3.
Guodong further teaches: further comprising: training one or more of the first set of models, wherein the training is based on one or more of the global constraint, the second output of the one or more of the second set of models,
(Guodong, page: 6, “3 Problem formulation: The learning process of federated learning is decomposed into two parts that occur in different places: server (coordinator) and nodes (participants). These two parts are linked to each other via a specifically designed mechanism. In particular, the participant i can train a local model hi using its own dataset Di = {(xi· , yi·)}. The model hi is initialized by a globally shared model parameter W which is then fine-tuned to a new local model with parameters Wi using the data from node i. It is proposed that the coordinator in federated learning can learn a global model [training one or more of the first set of models] controlled by W [wherein the training is based on one or more of the global constraint] that could be shared with all participants on distributed nodes. Through a few rounds of communication, the global model has been gradually improved [the second output of the one or more of the second set of models] (i.e.: the output from each participant is sent back to the global model for update) to better suit all participants, and the final global model is an optimal solution that could directly be deployed on each participant for further use.”)
the first set of resource levels, and training data associated with the first set of models.
(Guodong, Fig. 1)
Claim 16 recites analogous limitations as claim 5, so is rejected under similar rationale.
Regarding claim 12, Guodong teaches: A system comprising:
at least one data processor; and
memory storing computer executable instructions which, when executed by the at least one data processor causes the at least one data processor to perform operations comprising:
(Guodong, page: 4, “2.3 Federated learning for open banking Federated learning is a decentralized machine learning framework that can train a model [at least one data processor; and memory storing computer executable instructions] (i.e.: a federated learning framework involves components such as processors and storage units to facilitate decentralized training and data management) without direct access to users’ private data. The model coordinator and user/participant exchange model parameters that can avoid sending user data. However, the exchanging of model parameters, or gradients in machine learning terminology, may cause data leakage [12]. Therefore, differential privacy [11] technology is essential for federated learning to protect privacy from gradient based cyber-attack [1].”)
The remaining limitations are analogous to those of claim 1 and are rejected under similar rationale.
Claim(s) 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Guodong in view of Zhang, Qin, Achin and in further view of Sui et al., "Feded: Federated learning via ensemble distillation for medical relation extraction", (hereafter Sui).
Regarding claim 4, Guodong in view of Zhang, Qin and Achin teach the method of claim 3.
Guodong in view of Zhang, Qin and Achin do not teach: wherein a second set of resourcing levels are determined based on the first set of resourcing levels.
Sui teaches: wherein a second set of resourcing levels are determined based on the first set of resourcing levels.
(Sui, page: 2, “In this paper, we introduce a privacy-preserving medical relation extraction model, named FedED. To prevent private information leakage, we leverage federated learning without sharing raw privacy sensitive medical texts. To overcome the communication bottleneck in federated relation extraction, we focus on reducing the size of transmitted messages at each communication round. To this end, we formulate the central aggregation process in federated learning as learning a compact central model (student) from the ensemble (Dietterich, 2000; Breiman, 2001) of multiple local models (teacher). From this perspective, only the predicted labels on a small dataset need to be uploaded to the central server [wherein a second set of resourcing levels] (i.e.: determination of the second set of resource levels, which involves uploading a small dataset to the central server is influenced by the characteristics of the first set of resource levels, which entail learning from a "teacher" model), because learning from a “teacher” model only requires the behavior of the “teacher” rather than the entire “teacher” network [are determined based on the first set of resourcing levels] (i.e.: the decision to upload only the predicted labels on a small dataset is based on the understanding that learning from a "teacher" model necessitates only the behavior of the "teacher" rather than the entire "teacher" network.) (Hinton et al., 2015). Besides, the ensemble model (teacher) is powerful, which defines the upper extreme of aggregating when limited to a single communication in federated learning (Yurochkin et al., 2019). To transfer the knowledge in the ensemble model to the central model, we leverage a strategy based on knowledge distillation (Hinton et al., 2015), which trains the central model by forcing it to have a similar prediction with the ensemble model.”)
Sui, Guodong, Zhang, Qin and Achin are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Sui with teachings of Guodong, Zhang, Qin and Achin to mitigate privacy risks in relation to data extraction systems. By training local models with private data stored at individual platforms and aggregating model parameters centrally, the system ensures data privacy without uploading or storing sensitive information on a central server (Sui, Page: 2, “Previous relation extraction methods require centralizing the underlying training data from different medical platforms, such as hospitals and healthcare centers, on one server for training, while holding the centralized privacy-sensitive data puts patients’ privacy at risk. This is one of the reasons that hinder the use of relation extraction in clinical practice. As a possible solution, federated learning (McMahan et al., 2016) is proposed to make full use of privacy-sensitive data. Training local models with private data at local platforms and aggregating local models in the central server compose the federated learning process. In the framework of federated learning, no single piece of private data is uploaded to or stored on the central server, and only local models’ parameters are sent to the server for updating the central model.”)
Claim 15 recites limitations analogous to those of claim 4 and is therefore rejected under a similar rationale.
Claim(s) 6 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Guodong in view of Zhang, Qin, Achin and in further view of Rajkumar et al., Pub. No.: US20200311616A1, (hereafter Rajkumar).
Regarding claim 6, Guodong in view of Zhang, Qin and Achin teach the method of claim 5.
Guodong in view of Zhang, Qin and Achin do not teach: receiving a user input from a user associated with the second set of models, the input indicative of user constraints on the first output of the one or more of the first set of models; and
training, the one or more of the first set of models, based on the user input.
Rajkumar teaches: receiving a user input from a user associated with the second set of models, the input indicative of user constraints on the first output of the one or more of the first set of models; and
training, the one or more of the first set of models, based on the user input.
(Rajkumar, “[0003] In some implementations, robots can perform processes to learn new information and new abilities from situations they encounter. Techniques disclosed herein can enable a robot to incorporate new information so that the robot learns almost instantaneously from user feedback [receiving a user input from a user associated with the second set of models] (i.e.: when a user provides a robot with feedback for a previously unknown object, the robot may store the information and apply it to current and future tasks) or other sources of information. For example, a robot can have a local cache where certain types of learned information is stored. When the robot acquires new information, such as the classification for a previously unknown object, the robot can store a representation of the new information in the cache to make the information immediately available for the robot to use. This technique can allow near-instantaneous learning by a robot since the computational demands for incorporating new information, e.g., generating the representation and saving it in a cache accessed by robot systems, are extremely low. For example, the representation may have already been computed as part of the continuous onboard inference process of the robot. As a result, when a user provides a robot with a classification [the input indicative of user constraints on the first output of the one or more of the first set of models] (i.e.: the user's classification serves as a constraint or specification on how the robot (which can be considered a model) should interpret or output its understanding of the previously unknown object. The classification provided by the user dictates how the robot should adjust its behavior or output when it encounters that object in the future) for a previously unknown object, the robot may store the information and apply it to current and future tasks with minimal delay [training, the one or more of the first set of models, based on the user input] (i.e.: the robot incorporates the new information (user input) almost instantaneously and uses it for current and future tasks by training the models based on the user input, as the robot updates its knowledge base and improves its functionality based on the new information provided by the user).”)
Rajkumar, Guodong, Zhang, Qin and Achin are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Rajkumar with teachings of Guodong, Zhang, Qin and Achin to add continuous and incremental learning to the machine learning model. By periodically updating the machine learning model and frequently providing new learned information, the system enables individual units to fine-tune their behavior regularly (Rajkumar, “[0006] In some implementations, the server can transmit the updated machine learning model to the fleet at a weekly or a fortnightly basis. The server can also provide update learned information more frequently, as embeddings or in other representations, each time a new set of robot learning is received by the server. Robots in the fleet can store this learned information in a cache and use it alongside the most recent version of the machine learning model. The receipt of new information for the cache can fine-tune the behavior of an overall perception system for each robot in the fleet on a daily basis or each time a new set of robot learning is provided to the robot. Thus, robots can use the updated machine learning model to incorporate the learning from the fleet, and can clear their local caches since the cached information has been incorporated into the model.”)
Claim 17 recites limitations analogous to those of claim 6 and is therefore rejected under a similar rationale.
Claim(s) 7 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Guodong in view of Zhang, Qin, Achin and in further view of Zhu et al., "Comparison of individual, ensemble and integrated ensemble machine learning methods to predict China’s SME credit risk in supply chain finance" (hereafter Zhu), Featonby et al., Pub. No.: US20200310876A1 (hereafter Featonby), and Kaushik et al., Pub. No.: US10809936B1 (hereafter Kaushik).
Regarding claim 7, Guodong in view of Zhang, Qin and Achin teach the method of claim 5.
Guodong in view of Zhang, Qin and Achin do not teach: assessing a combined performance of the first set of models and the second set of models; determining, using the combined performance, a global feasible performance region, wherein the global feasible performance region is associated with balanced values of the first and a second set of resourcing levels; and
displaying the global feasible performance region.
wherein the first and second set of resourcing levels are compatible;
wherein the global performance region comprises a plot of a range of performance indices as a function of constraints for the organization.
Zhu teaches: assessing a combined performance of the first set of models and the second set of models; determining, using the combined performance, a global feasible performance region, wherein the global feasible performance region is associated with balanced values of the first and a second set of resourcing levels; and
(Zhu, page: 6, “3.3 Experimental procedure: The experiments are performed on a PC with a 3.19 GHz Intel Core i3 CPU and 1.92 GB RAM, using Windows XP operating system. Data mining toolkit Waikato Environment for Knowledge Analysis (WEKA) version 3.6.12 is sued for experiment. WEKA is a popular and free available suite of machine learning and data mining software which is written with Java and developed at the University of Waikato, New Zealand.
We firstly compare average accuracies of IEML methods (i.e., multi-boosting and RS–boosting) with other three EML methods (i.e., bagging, boosting and RS) and an IML method (i.e., DT), for predicting SMEs’ credit risk in SCF [assessing a combined performance of the first set of models and the second set of models] (i.e.: combine these performance metrics into a single composite score for each model. For instance, a weighted average or another aggregation method could be used to synthesize these metrics into one). Secondly, we compare type I and II errors of IEML methods with that of other three EML methods and an IML method. Finally, we compare area under ROC curves of IEML methods (i.e., multi-boosting and RS–boosting) with that of their base EML methods (i.e., bagging, boosting and RS) and IML (i.e., DT) method, respectively [determining, using the combined performance, a global feasible performance region, wherein the global feasible performance region is associated with balanced values of the first and a second set of resourcing levels] (i.e.: define a performance region where all models have acceptable combined scores. This region can be represented in a multidimensional space where each axis corresponds to a different performance metric or resource constraint). For implementation of bagging, boosting, RS and DT, we choose WEKA bagging module, i.e., WEKA ADBoostM1 module, WEKA random subspace module and WEKA J48, respectively. For implementation of RS–boosting, we use WEKA Package, i.e., WEKA.JAR and implement in Eclipse according to Wang and Ma [8]. For implementation of multi-boosting, we use WEKA MultiBoostAB. Meanwhile, we employ DT as base classifier of multi-boosting and RS–boosting according to Maclin and Opitze [20], Fu et al. [21] and Wang and Ma [8].”)
displaying the global feasible performance region.
(Zhu, page: 6, “4 Empirical results: In this section, we firstly evaluate two IEML methods on 377 data sets using both NN (i.e., Multilayer Perceptron) and DT (i.e., C4.5) as the base classifier for choosing the appropriate one. Then, we show [displaying the global feasible] that IEML methods compete quite outstanding against EML methods and IML method by analyzing prediction evaluation criteria [performance region] (i.e., average accuracy, type I error, II error, ‘precision’ rate, ‘recall’ rate, ‘F-Measure’ rate and ROC curve), which are excellent methods for predicting SMEs credit risk in SCF. Meanwhile, we compare the prediction evaluation criteria of multi-boosting with RS–boosting in order to find the better IEML method in predicting SMEs’ credit risk.”)
Zhu, Guodong, Zhang, Qin and Achin are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Zhu with teachings of Guodong, Zhang, Qin and Achin to include individual, ensemble, and integrated ensemble techniques, to accurately predict risks and enhance reliability and decision-making, supporting smoother and more secure operations (Zhu, Abstract).
Guodong, Zhang, Qin, Achin and Zhu do not teach:
wherein the first and second set of resourcing levels are compatible;
wherein the global performance region comprises a plot of a range of performance indices as a function of constraints for the organization.
Featonby teaches:
wherein the first and second set of resourcing levels are compatible;
(Featonby, “[0206] The optimization component 126 may then determine that the computational biases of the corresponding VM instances 114(1) and 114(2), and/or the workload categories 220(1) and 220(2), are computationally complimentary such that it is advantageous to have a same computing device 112 host the two VM instances 114(1) and 114(2). For example, and as illustrated, the compute-dimension utilizations 2004 for each of the resource-utilization models 224(1) and 224(2) may be compatible [wherein the first and second set of resourcing levels are compatible] such that the dimensions of compute utilized by workloads in the two workload categories 220(1) and 220(2) combine well to maximize the use of the resources provided by the computing device 112. As shown, five dimensions of compute 2006, 2008, 2010, 2012, 2014 may complement each other such that one dimension of compute 2006(1) for the resource-utilization model 224(1) may be relatively high, but the same dimension of compute 2006(2) for the resource-utilization model 224(2) may be relatively low.”)
Featonby, Guodong, Zhang, Qin, Achin and Zhu are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Featonby with teachings of Guodong, Zhang, Qin, Achin and Zhu to maintain efficient performance while minimizing resource and operation cost (Featonby, Abstract).
Guodong, Zhang, Qin, Achin, Zhu and Featonby do not teach:
wherein the global performance region comprises a plot of a range of performance indices as a function of constraints for the organization
Kaushik teaches:
wherein the global performance region comprises a plot of a range of performance indices as a function of constraints for the organization
(Kaushik, () “Storage systems may include storage processors, storage devices, front-end and back-end adapters, communication busses, etc. The storage processors receive and process IO requests from hosts (e.g., client devices 104) that write data to and read data from the storage devices of the storage systems. Storage systems may be monitored during operation to display various performance characteristics. The output may be in the form of graphs plotting changes in one or more performance characteristics over time [wherein the global performance region comprises a plot of a range of performance indices as a function of constraints for the organization], or other visualizations of the monitored performance characteristics. The performance characteristics may include, by way of example, IOPS, latency over time, read percentage, etc. While a human operator can view the displayed output to try to detect trends corresponding to performance-impacting events, it is difficult to train a human operator to spot, diagnose and correct such issues. Further, some performance-impacting events may be difficult to discern via simple viewing of the displayed output.”)
Kaushik, Guodong, Zhang, Qin, Achin, Zhu and Featonby are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Kaushik with teachings of Guodong, Zhang, Qin, Achin, Zhu and Featonby to monitor workloads and classify performance-impacting events to help maintain efficiency (Kaushik, Abstract).
Claim 18 recites limitations analogous to those of claim 7 and is therefore rejected under a similar rationale.
Claim(s) 8, 11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Guodong in view of Zhang, Qin, Achin and in further view of Rajkumar and Sui.
Regarding claim 8, Guodong in view of Zhang, Qin and Achin teach the method of claim 1.
Guodong in view of Zhang, Qin and Achin do not teach: further comprising: determining a first set of resourcing levels;
Sui teaches: further comprising: determining a first set of resourcing levels;
(Sui, page: 2, “In this paper, we introduce a privacy-preserving medical relation extraction model, named FedED. To prevent private information leakage, we leverage federated learning without sharing raw privacy sensitive medical texts. To overcome the communication bottleneck in federated relation extraction, we focus on reducing the size of transmitted messages at each communication round. To this end, we formulate the central aggregation process in federated learning as learning a compact central model (student) from the ensemble (Dietterich, 2000; Breiman, 2001) of multiple local models (teacher). From this perspective, only the predicted labels on a small dataset need to be uploaded to the central server [wherein a second set of resourcing levels] (i.e.: determination of the second set of resource levels, which involves uploading a small dataset to the central server is influenced by the characteristics of the first set of resource levels, which entail learning from a "teacher" model), because learning from a “teacher” model only requires the behavior of the “teacher” rather than the entire “teacher” network [determining a first set of resourcing levels] (Hinton et al., 2015). Besides, the ensemble model (teacher) is powerful, which defines the upper extreme of aggregating when limited to a single communication in federated learning (Yurochkin et al., 2019). To transfer the knowledge in the ensemble model to the central model, we leverage a strategy based on knowledge distillation (Hinton et al., 2015), which trains the central model by forcing it to have a similar prediction with the ensemble model.”)
Sui, Guodong, Zhang, Qin and Achin are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Sui with teachings of Guodong, Zhang, Qin and Achin to mitigate privacy risks in relation to data extraction systems. By training local models with private data stored at individual platforms and aggregating model parameters centrally, the system ensures data privacy without uploading or storing sensitive information on a central server (Sui, Page: 2, “Previous relation extraction methods require centralizing the underlying training data from different medical platforms, such as hospitals and healthcare centers, on one server for training, while holding the centralized privacy-sensitive data puts patients’ privacy at risk. This is one of the reasons that hinder the use of relation extraction in clinical practice. As a possible solution, federated learning (McMahan et al., 2016) is proposed to make full use of privacy-sensitive data. Training local models with private data at local platforms and aggregating local models in the central server compose the federated learning process. In the framework of federated learning, no single piece of private data is uploaded to or stored on the central server, and only local models’ parameters are sent to the server for updating the central model.”)
Guodong in view of Zhang, Qin, Achin and Sui do not teach:
receiving user input from a user associated with the second set of models, the input indicative of a second set of resource levels;
selecting or training the first set of models using the first set of resourcing levels; and
selecting or training the second set of models using the second set of resourcing levels.
Rajkumar teaches:
receiving user input from a user associated with the second set of models, the input indicative of a second set of resource levels;
selecting or training the first set of models using the first set of resourcing levels; and
(Rajkumar, “[0003] In some implementations, robots can perform processes to learn new information and new abilities from situations they encounter. Techniques disclosed herein can enable a robot to incorporate new information so that the robot learns almost instantaneously from user feedback [receiving a user input from a user associated with the second set of models] (i.e.: when a user provides a robot with feedback for a previously unknown object, the robot may store the information and apply it to current and future tasks) or other sources of information. For example, a robot can have a local cache where certain types of learned information is stored. When the robot acquires new information, such as the classification for a previously unknown object, the robot can store a representation of the new information in the cache to make the information immediately available for the robot to use. This technique can allow near-instantaneous learning by a robot since the computational demands for incorporating new information, e.g., generating the representation and saving it in a cache accessed by robot systems, are extremely low. For example, the representation may have already been computed as part of the continuous onboard inference process of the robot. As a result, when a user provides a robot with a classification [the input indicative of a second set of resource levels] (i.e.: the user's classification serves as a constraint or specification on how the robot (which can be considered a model) should interpret or output its understanding of the previously unknown object. The classification provided by the user dictates how the robot should adjust its behavior or output when it encounters that object in the future) for a previously unknown object, the robot may store the information and apply it to current and future tasks with minimal delay [training the first set of models using the first set of resourcing levels] (i.e.: the robot incorporates the new information (user input) almost instantaneously and uses it for current and future tasks by training the models based on the user input, as the robot updates its knowledge base and improves its functionality based on the new information provided by the user).”)
selecting or training the second set of models using the second set of resourcing levels.
(Rajkumar, “[0005] The learned information collected from multiple robots can be used to re-train or update machine learning models [training the second set of models using the second set of resourcing levels] (i.e.: the learned information (second sets of resource levels) collected from multiple robots (sets of models) can be used to retrain or update machine learning models (second sets of models), which can then be distributed to each of the robots), which can then be distributed to each of the robots. For example, the server can periodically update a machine learning model used by the robots, at an interval significantly longer than the interval for sharing learning representations among robots. For example, if representations are shared daily among robots, the machine learning model may be updated each week, every two weeks, or each month. The updated machine learning model can incorporate the combined set of robot learning that occurred across the fleet over the previous interval, e.g., the last week or month.”)
Rajkumar, Guodong, Zhang, Qin, Achin and Sui are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Rajkumar with teachings of Guodong, Zhang, Qin, Achin and Sui to add continuous and incremental learning to the machine learning model. By periodically updating the machine learning model and frequently providing new learned information, the system enables individual units to fine-tune their behavior regularly (Rajkumar, ¶[0006]).
Claim 19 recites limitations analogous to those of claim 8 and is therefore rejected under a similar rationale.
Regarding claim 11, Guodong in view of Zhang, Qin and Achin teach the method of claim 1.
Guodong in view of Zhang, Qin and Achin do not teach:
further comprising: receiving data characterizing user input specifying a training objective; wherein the first set of models is trained based at least on the training objective.
Rajkumar teaches:
further comprising: receiving data characterizing user input specifying a training objective; wherein the first set of models is trained based at least on the training objective.
(Rajkumar, “[0003] In some implementations, robots can perform processes to learn new information and new abilities from situations they encounter. Techniques disclosed herein can enable a robot to incorporate new information so that the robot learns almost instantaneously from user feedback [receiving data characterizing user input specifying a training objective] (i.e.: feedback for a previously unknown object used to retrain or update the machine learning model) or other sources of information. For example, a robot can have a local cache where certain types of learned information is stored. When the robot acquires new information, such as the classification for a previously unknown object, the robot can store a representation of the new information in the cache to make the information immediately available for the robot to use. This technique can allow near-instantaneous learning by a robot since the computational demands for incorporating new information, e.g., generating the representation and saving it in a cache accessed by robot systems, are extremely low. For example, the representation may have already been computed as part of the continuous onboard inference process of the robot. As a result, when a user provides a robot with a classification for a previously unknown object, the robot may store the information and apply it to current and future tasks with minimal delay [wherein the first set of models is trained based at least on the training objective] (i.e.: the robot incorporates the new information (user input) almost instantaneously and uses it for current and future tasks by training the models based on the user input, as the robot updates its knowledge base and improves its functionality based on the new information provided by the user).”)
Rajkumar, Guodong, Zhang, Qin and Achin are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Rajkumar with teachings of Guodong, Zhang, Qin and Achin to add continuous and incremental learning to the machine learning model. By periodically updating the machine learning model and frequently providing new learned information, the system enables individual units to fine-tune their behavior regularly (Rajkumar, ¶[0006]).
Claim(s) 9 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Guodong in view of Zhang, Qin, Achin and in further view of Zhu and Mao et al., Pub. No.: US20230034384A1, (hereafter Mao).
Regarding claim 9, Guodong in view of Zhang, Qin and Achin teach the method of claim 1.
Guodong in view of Zhang, Qin and Achin do not teach: further comprising
training the first set of models, the training comprising: receiving data characterizing the first set of models trained on the first dataset using a first set of resourcing levels, the first set of resourcing levels specifying a condition on outputs of the first set of models;
assessing, using the first set of resourcing levels, performance of the first set of models;
determining, using the assessment, a first feasible performance region, the first feasible performance region associating each resourcing level in the first set of resourcing levels with a model in the first set of models; and
Mao teaches: further comprising
receiving data characterizing the first set of models trained on the first dataset using a first set of resourcing levels, the first set of resourcing levels specifying a condition on outputs of the first set of models;
assessing, using the first set of resourcing levels, performance of the first set of models;
determining, using the assessment, a first feasible performance region, the first feasible performance region associating each resourcing level in the first set of resourcing levels with a model in the first set of models; and
(Mao, “[0007] In some examples, the methods further include evaluating a performance of the first machine learning model and training the second machine learning model using data determined in evaluating the performance of the first machine learning model. In these examples, evaluating the performance of the first machine learning model [assessing, performance of the first set of models] includes, for each of the multiple user profiles [using the first set of resourcing levels], determining a predicted label for the user profile [determining, using the assessment] (i.e.: determining performance (through residue values) for user profiles) and determining a residue value for the user profile indicating an error in the predicted label [a first feasible performance region, the first feasible performance region associating each resourcing level in the first set of resourcing levels with a model in the first set of models]. Also, in these examples, determining the predicted label for the user profile includes determining, by the first computing system, a first share of a predicted label for the user profile based at least in part on (i) a first share of the user profile, (ii) the first machine learning model, and (iii) one or more of the true labels for the user profiles, receiving, by the first computing system and from the second computing system, data indicating a second share of the predicted label for the user profile [receiving data characterizing the first set of models trained on the first dataset using a first set of resourcing levels, the first set of resourcing levels specifying a condition on outputs of the first set of models] (i.e.: receiving data about predicted labels) determined by the second computing system based at least in part on a second share of the user profile and the first set of one or more machine learning models maintained by the second computing system, and determining the predicted label for the user profile based at least in part on the first and second shares of the predicted label. Additionally, in such examples, determining the residue value for the user profile includes determining, by the first computing system, a first share of the residue value for the user profile based at least in part on the predicted label determined for the user profile and a first share of a true label for the user profile included in the true labels, receiving, by the first computing system and from the second computing system, data indicating a second share of the residue value for the user profile determined by the second computing system based at least in part on the predicted label determined for the user profile and a second share of the true label for the user profile, and determining the residue value for the user profile based at least in part on the first and second shares of the residue value. In the aforementioned examples, training the second machine learning model using data determined in evaluating the performance of the first machine learning model includes training the second machine learning model using data indicating the residue values determined for the user profiles in evaluating the performance of the first machine learning model.”)
Mao, Guodong, Zhang, Qin and Achin are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Mao with teachings of Guodong, Zhang, Qin and Achin to enhance the accuracy of the prediction (Mao, “[0008] In some of the aforementioned examples, before evaluating the performance of the first machine learning model, the methods further include deriving a set of parameters of a function and configuring the first machine learning model to, given a user profile as input, generate an initial predicted label for the user profile and apply the function, as defined based on the derived set of parameters, to the initial predicted label for the user profile to generate, as output, a first share of a predicted label for the user profile. In at least some of these examples, deriving the set of parameters of the function includes (i) deriving, by the first computing system, a first share of the set of parameters of the function based at least in part on a first share of each of the multiple true labels, (ii) receiving, by the first computing system and from the second computing system, data indicating a second share of the set of parameters of the function derived by the second computing system based at least in part on a second share of each of the multiple true labels, and (iii) deriving the set of parameters of the function based at least in part on the first and second shares of the set of parameters of the function. In at least some of the aforementioned examples, the function is a second degree polynomial function.”)
Guodong in view of Zhang, Qin, Achin and Mao do not teach: displaying the first feasible performance region.
Zhu teaches: displaying the first feasible performance region.
(Zhu, page: 6, “4 Empirical results: In this section, we firstly evaluate two IEML methods on 377 data sets using both NN (i.e., Multilayer Perceptron) and DT (i.e., C4.5) as the base classifier for choosing the appropriate one. Then, we show [displaying the first feasible performance region] that IEML methods compete quite outstanding against EML methods and IML method by analyzing prediction evaluation criteria [performance region] (i.e., average accuracy, type I error, II error, ‘precision’ rate, ‘recall’ rate, ‘F-Measure’ rate and ROC curve), which are excellent methods for predicting SMEs credit risk in SCF. Meanwhile, we compare the prediction evaluation criteria of multi-boosting with RS–boosting in order to find the better IEML method in predicting SMEs’ credit risk.”)
Zhu, Guodong, Zhang, Qin, Achin and Mao are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Zhu with the teachings of Guodong, Zhang, Qin, Achin and Mao to include individual, ensemble, and integrated ensemble techniques, in order to accurately predict risks and enhance reliability and decision-making, supporting smoother and more secure operations (Zhu, Abstract).
Claim 20 recites limitations analogous to those of claim 9 and is therefore rejected under a similar rationale.
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Guodong in view of Zhang, Qin, Achin and in further view of Aryafar et al., "An ensemble-based approach to click-through rate prediction for promoted listings at Etsy", (hereafter Aryafar).
Guodong in view of Zhang, Qin and Achin teach the method of claim 1.
Guodong in view of Zhang, Qin and Achin do not teach: further comprising:
determining a first set of resourcing levels corresponding to a first ratio of value per action or cost per action associated with the first set of models;
determining a second set of resourcing levels such that a second ratio of value per action or cost per action associated with the second set of models; wherein the first ratio and the second ratio are equal.
Aryafar teaches: further comprising:
determining a first set of resourcing levels corresponding to a first ratio of value per action or cost per action associated with the first set of models;
determining a second set of resourcing levels such that a second ratio of value per action or cost per action associated with the second set of models; wherein the first ratio and the second ratio are equal.
(Aryafar, page: 2, “For both aspects, Etsy as a platform would operate on seller’s behalf. Etsy uses Cost-Per-Click (CPC) model [corresponding to a first ratio of value per action or cost per action associated with the first set of models], meaning that the site charges sellers budget when a buyer clicks on the promoted listing [determining a first set of resourcing levels] (i.e.: setting an initial resourcing level based on a fixed strategy to establish the first ratio of value per action (CPC)). A similar program exists in eBay5 while it uses Cost-Per-Action (CPA) model [such that a second ratio of value per action or cost per action associated with the second set of models]. As Etsy is using CPC model to operate promoted listings, in order to optimize the platform’s revenue [determining a second set of resourcing levels] (i.e.: adjusting the resourcing levels (bids and click rates) to achieve a higher ratio of value per action, reflecting the second set of models with optimized revenue goals), it implies that we need more clicks for each promoted listing and each such clicked listing pays more. In other words, we would like to have higher bl,q and θl,q for each clicked listing l to the query q. In this paper, we discuss the methodologies and systems to drive CTR θ, given a fixed bidding strategy which computes b [wherein the first ratio and the second ratio are equal] (i.e.: using a fixed bidding strategy (“b”) to manage promoted listings, “given a fixed bidding strategy which computes b” to maintain a consistent approach of determining bids, ensuring that the cost per click (CPC) does not change as click volume increases. Therefore, the “first ratio” of value per action (initial CPC) and the “second ratio” of value per action (CPC after optimization) are equal, to ensure that while the number of clicks increases, the cost per action remains stable, maintaining equal ratios). 
To our knowledge, our paper is the first study to systematically discuss how a promoted listings system can be built with practical considerations.”)
Aryafar, Guodong, Zhang, Qin and Achin are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Aryafar with the teachings of Guodong, Zhang, Qin and Achin to improve predictive accuracy using an ensemble learning model: by combining historical and behavioral data with content-based features, the system can enhance the prediction of key metrics such as Click-Through Rate (CTR) (Aryafar, Abstract).
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Guodong in view of Qin, Achin and in further view of Zhang et al., "Slaq: quality-driven scheduling for distributed machine learning.", (hereafter Zhang H.)
Regarding claim 21, Guodong teaches: A system comprising: at least one data processor; and memory storing instructions which, when executed by the at least one data processor, causes the at least one data processor to perform operations comprising:
(Guodong, page: 1, “As stated by McKinsey & Company [6], open banking could bring benefits to banks in various ways, including better customer experience, increased revenue streams, and a sustainable service model for under-served markets. Open banking will form a new ecosystem for financial services by sharing banking data across organizations [a first organization] [a second organization] and providing new services. However, there are inherent risks in sharing banking data, which is sensitive, privacy-concerned, and valuable. It is critical to developing processes and governance underpinning the technical connections. Moreover, the European Union’s General Data Protection Regulation (GDPR) [12] enforces organizations to pay great attention when sharing and using customers’ data.”)
training a first model associated with a first sub-organization of an organization based on a first dataset, the first model including a first plurality of submodels trained at differing resource levels;
[Figure: Guodong, Fig. 1, annotated with: training a first model associated with a first sub-organization of an organization based on a first dataset, the first model including a first plurality of submodels trained at differing resource levels] (Guodong, Fig. 1)
training a second model associated with a second sub-organization of an organization based on a second dataset, the second model including a second plurality of submodels trained at the differing resource levels;
[Figure: Guodong, Fig. 1, annotated with: training a second model associated with a second sub-organization of an organization based on a second dataset, the second model including a second plurality of submodels trained at the differing resource levels] (Guodong, Fig. 1)
Guodong does not teach:
determining a resource allocation between the first organization and the second organization such that a first level of resource is provided to the first organization and a second level of resource is provided to the second organization.
corresponding to a first ratio or value per action or cost per action associated with the first plurality of submodels;
wherein the first set of resourcing levels comprises an allocation of one or more resources to the first sub-organization;
Zhang H. teaches:
determining a resource allocation between the first organization and the second organization such that a first level of resource is provided to the first organization and a second level of resource is provided to the second organization.
(Zhang H. page: 5,“SLAQ is a cluster management framework that hosts multi-tenant approximate ML training jobs running on shared resources. A centralized SLAQ scheduler coordinates the resource allocation of multiple ML training jobs [determining a resource allocation between the first organization and the second organization such that a first level of resource is provided to the first organization and a second level of resource is provided to the second organization]. As shown in Figure 4(a), each job is composed of a set of tasks. Each task processes data based on the ML algorithm on a small partition of the dataset, and can be scheduled to run on any node. The driver program contains the iterative training logic, generates tasks for each iteration, and tracks the overall progress of the job. In the case of training ML models, a task generates an update to the model parameters based on a partition of the training dataset. The duration of a task typically ranges from tens of milliseconds to a few seconds. When the tasks finish processing the data, the updates from all tasks are aggregated and sent back to the job driver program to update the primary copy of the model.”)
selecting a first subgroup from the first model that corresponds to the first resource level; and
selecting a second subgroup from the second model that corresponds to the second resource level.
[Figure: Zhang H., Fig. 4, annotated with: selecting a first subgroup from the first model that corresponds to the first resource level; selecting a second subgroup from the second model that corresponds to the second resource level] (Zhang H., Fig. 4)
Zhang H. and Guodong are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Zhang H. with the teachings of Guodong to ensure faster and more efficient model improvements (Zhang H., Abstract).
Guodong in view of Zhang H. does not teach:
corresponding to a first ratio or value per action or cost per action associated with the first plurality of submodels;
wherein the first set of resourcing levels comprises an allocation of one or more resources to the first sub-organization;
Qin teaches:
corresponding to a first ratio or value per action or cost per action associated with the first plurality of submodels;
(Qin, “[0031] In an example scenario, an advertiser 110 sends an ad submission 112 to search engine provider 102. For example, ad submission 112 [a first set of resourcing levels] (i.e.: the ad submission process involves assigning a value to an action (e.g., a user clicking an ad or completing a purchase)) may include an ad along with an associated bid value, and an associated keyword. If the advertiser will be tracking different types of actions, the ad submission 112 may also include an associated action indicator to distinguish which type of action the ad is to be associated with. Ad submission 112 may also include a redirection URL that specifies a landing page on the advertiser's website to which a user is to be redirected if the user selects the ad. According to the cost-per-action model [corresponding to a first ratio or value per action or cost per action associated with the first plurality of submodels], the advertiser will only be charged if the ad is displayed and the specified action is reported by the advertiser.”)
Qin, Guodong and Zhang H. are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Qin with the teachings of Guodong and Zhang H. to enable a cost-per-action (CPA) model, ensuring advertisers pay based on actual user engagement. Contextual ad auctions optimize placements, while periodic reports verify actions, enhancing transparency and ROI (Qin, Abstract).
Guodong in view of Zhang H. and Qin does not teach:
wherein the first set of resourcing levels comprises an allocation of one or more resources to the first sub-organization;
Achin teaches:
wherein the first set of resourcing levels comprises an allocation of one or more resources to the first sub-organization
(Achin, col. 5, lines 33-44 to col. 6, lines 1-6, “In some embodiments, the selected modeling procedures comprise first and second modeling procedures determined to have first and second suitabilities for the predicted problem, respectively, the first suitability of the first modeling procedure being greater than the second suitability of the second modeling procedure, and wherein the resource allocation schedule allocates resources [wherein the first set of resourcing levels comprises an allocation of one or more resources to the first sub-organization] of the processing nodes to the first and second modeling procedures based, at least in part, on the first and second suitabilities.”)
Achin, Guodong, Zhang H. and Qin are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Achin with the teachings of Guodong, Zhang H. and Qin to allocate resources more efficiently and improve accuracy in forecasting advertiser actions and costs (Achin, Abstract).
Claims 22 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Guodong in view of Zhang H., Qin, Achin and in further view of Sun et al., "Application of machine learning in wireless networks: Key techniques and open issues.", (hereafter Sun).
Regarding claim 22, Guodong in view of Zhang H., Qin and Achin teaches the system of claim 21.
Guodong in view of Zhang H., Qin and Achin does not teach: wherein determining the resource allocation includes determining an optimal allocation of resources between the first organization and the second organization and based at least on a global constraint.
Sun teaches: wherein determining the resource allocation includes determining an optimal allocation of resources between the first organization and the second organization and based at least on a global constraint.
(Sun, page: 15, “Machine Learning Based Beamforming Considering the ever-increasing QoS requirements and the need for real-time processing in practical systems, authors in [86] propose a supervised learning based resource allocation framework to quickly output the optimal or a near optimal resource allocation [determining the resource allocation includes] solution for the current scenario. Specifically, the data related to historical scenarios is collected and the feature vector is extracted for each scenario. Then, the optimal or near optimal resource allocation [determining an optimal allocation of resources between the first organization and the second organization and based at least on a global constraint] (i.e.: the use of historical data, cloud computing, and transforming the problem into a classification problem implies that the solution takes into account overarching factors and constraints to find the optimal allocation) plan can be searched off-line by taking the advantage of cloud computing. After that, those feature vectors with the same resource allocation solution are labeled with the same class index. Up to now, the remaining task to determine resource allocation for a new scenario is to identify the class of its corresponding feature vector, and that is the resource allocation problem is transformed into a multi-class classification problem, which can be handled by supervised learning. To make the application of the proposal”)
Sun, Guodong, Zhang H., Qin and Achin are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Sun with teachings of Guodong, Zhang H., Qin and Achin to autonomously allocate resources (Sun, “Faced with the above issues, machine learning techniques including model-free reinforcement learning and NNs can be employed. Specifically, reinforcement learning can learn a good resource management policy based on only the reward/cost fed back by the environment, and quick decisions can be made for a dynamic network once a policy is learned. In addition, owing to the superior approximation capabilities of deep NNs, some high complexity resource management algorithms can be approximated, and similar network performance can be achieved but with much lower complexity. Moreover, NNs can be utilized to learn the content popularity, which helps fully make use of limited cache resource, and distributed Q-learning can endow each node with autonomous decision capability for resource allocation. In the following, the applications of machine learning in power control, spectrum management, backhaul management, beamformer design, computation resource management and cache management will be introduced”.)
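For illustration only, the classification framing in the passage quoted from Sun (historical feature vectors labeled with the class index of their off-line allocation plan, and a new scenario assigned to a class) can be sketched as follows. This is a minimal sketch and not Sun's implementation or that of its reference [86]: the feature values, the two allocation plans, and the choice of a nearest-centroid classifier are all hypothetical assumptions made for brevity.

```python
from math import dist

# Hypothetical historical scenarios: (feature vector, class index of the
# allocation plan found off-line). All values are invented for illustration.
HISTORY = [
    ((0.9, 0.1), 0),
    ((0.8, 0.2), 0),
    ((0.2, 0.9), 1),
    ((0.1, 0.8), 1),
]

# Hypothetical allocation plans, one per class index, expressed as the share
# of a resource given to a first and a second party.
PLANS = {0: (0.7, 0.3), 1: (0.3, 0.7)}


def centroids(history):
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums, counts = {}, {}
    for features, label in history:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}


def allocate(features, history=HISTORY, plans=PLANS):
    """Classify a new scenario and return the plan stored for its class."""
    cents = centroids(history)
    label = min(cents, key=lambda c: dist(features, cents[c]))
    return plans[label]
```

Under these assumed values, a new scenario with features close to the first group is mapped to the first plan; any other supervised classifier could stand in for the nearest-centroid step.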
Regarding claim 24, Guodong in view of Zhang H., Qin and Achin teaches the system of claim 21.
Sun further teaches: wherein determining the resource allocation includes determining an optimal allocation of resources between the first organization and the second organization based at least on an organizational objective.
(Sun, page: 15, “Machine Learning Based Beamforming Considering the ever-increasing QoS requirements [based at least on an organizational objective] and the need for real-time processing in practical systems, authors in [86] propose a supervised learning based resource allocation framework to quickly output the optimal or a near optimal resource allocation [determining the resource allocation includes] solution for the current scenario. Specifically, the data related to historical scenarios is collected and the feature vector is extracted for each scenario. Then, the optimal or near optimal resource allocation [determining the resource allocation includes determining an optimal allocation of resources between the first organization and the second organization] (i.e.: the use of historical data, cloud computing, and transforming the problem into a classification problem implies that the solution takes into account overarching factors and constraints to find the optimal allocation) plan can be searched off-line by taking the advantage of cloud computing. After that, those feature vectors with the same resource allocation solution are labeled with the same class index. Up to now, the remaining task to determine resource allocation for a new scenario is to identify the class of its corresponding feature vector, and that is the resource allocation problem is transformed into a multi-class classification problem, which can be handled by supervised learning. To make the application of the proposal”.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Sun with the teachings of Guodong, Zhang H., Qin and Achin for the same reasons disclosed for claim 22.
Claim 23 is rejected under 35 U.S.C. 103 as being unpatentable over Guodong in view of Zhang H., Qin, Achin, Sun and in further view of Feng et al., Pub. No.: US11748666B2, (hereafter Feng).
Guodong in view of Zhang H., Qin, Achin and Sun teaches the system of claim 22.
Guodong in view of Zhang H., Qin, Achin and Sun does not teach: the operations further comprising: receiving data characterizing a change to the global constraint or a new global constraint.
Feng teaches: the operations further comprising: receiving data characterizing a change to the global constraint or a new global constraint.
(Feng, col.8 42:55ff, “Global parameters memory 408 a is a shared memory from which CPU 403 a-1 and CPU 403 a-2 are both able to read the global parameters received [receiving data characterizing] from global parameter server 401. Likewise, global parameters memory 408 b is a shared memory from which CPU 403 b-3 and CPU 403 b-4 are both able to read the global parameters [a change to the global constraint or a new global constraint] received from global parameter server 401. Thus, the bandwidth consumption between global parameter server 401 and machine 451-A and between global parameter server 401 and machine 451-B is greatly reduced, since CPU 403 a-1 and CPU 403 a-2 are able to share the same parameters found in global parameters memory 408 a (and since CPU 403 b-3 and CPU 403 b-4 are able to share the same parameters found in global parameters memory 408 b).”)
determining a second resource allocation between the first organization and the second organization such that a third level of resource is provided to the first organization and a fourth level of resource is provided to the second organization, wherein the determining the second resource allocation is based at least on the change to the global constraint or the new global constraint;
(Feng, col.11 2:18ff, “In an embodiment of the present invention, a second machine (e.g., machine 451-B) also receives the first set of global parameters [between the first organization] from the global parameter server, executes the algorithm using the first set of global parameters and a second mini-batch of data [and the second organization] [determining a second resource allocation] known to describe the entity type, in order to generate a second consolidated set of gradients that describe the accuracy of the algorithm in modeling the entity type when using the first set of global parameters. The second machine then transmits the second consolidated set of gradients to the global parameter server, which creates a third set of global parameters [such that a third level of resource is provided to the first organization and a fourth level of resource is provided to the second organization], which are received by and used by the first machine and the second machine [wherein the determining the second resource allocation]. This third set of global parameters is a modification of the first set of global parameters [is based at least on the change to the global constraint or the new global constraint] based on the first consolidated set of gradients and the second consolidated set of gradients.”)
selecting a third subgroup from the first model that corresponds to the third resource level; and selecting a fourth subgroup from the second model that corresponds to the fourth resource level.
(Feng, col.11 2:18ff, “In an embodiment of the present invention, a second machine [from the second model that corresponds to the fourth resource level] (e.g., machine 451-B) also receives the first set of global parameters [selecting a fourth subgroup] from the global parameter server, executes the algorithm using the first set of global parameters and a second mini-batch of data known to describe the entity type, in order to generate a second consolidated set of gradients that describe the accuracy of the algorithm in modeling the entity type when using the first set of global parameters. The second machine then transmits the second consolidated set of gradients to the global parameter server, which creates a third set of global parameters [selecting a third subgroup], which are received by and used by the first machine [from the first model that corresponds to the third resource level] and the second machine. This third set of global parameters is a modification of the first set of global parameters based on the first consolidated set of gradients and the second consolidated set of gradients.”)
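For illustration only, the parameter-server update described in the passages quoted from Feng can be sketched as follows. The quoted text states only that the third set of global parameters is a modification of the first set based on the first and second consolidated sets of gradients; the averaging rule, the learning rate, and the function name `update_global_parameters` are hypothetical assumptions, not Feng's disclosed method.

```python
def update_global_parameters(params, gradient_sets, learning_rate=0.1):
    """Return a new parameter list from the current one and per-machine gradients.

    Assumed rule (not taken from Feng): average the consolidated gradients
    reported by each machine, then take one gradient-descent step.
    """
    if not gradient_sets:
        raise ValueError("at least one consolidated gradient set is required")
    n = len(gradient_sets)
    # Element-wise mean of the gradients reported by the machines.
    averaged = [sum(g[i] for g in gradient_sets) / n for i in range(len(params))]
    # One descent step; the input list is left unmodified.
    return [p - learning_rate * a for p, a in zip(params, averaged)]


# Hypothetical values: a first set of global parameters and the consolidated
# gradients from a first and a second machine.
first_params = [1.0, 2.0]
machine_a_grads = [2.0, 0.0]
machine_b_grads = [4.0, 2.0]
third_params = update_global_parameters(
    first_params, [machine_a_grads, machine_b_grads], learning_rate=0.5
)
```

In this sketch both machines would then read `third_params` from the server for their next iteration, mirroring the shared use of the updated parameters in the quoted passage.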
Feng, Guodong, Zhang H., Qin, Achin and Sun are related to the same field of endeavor (i.e.: training machine learning models). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Feng with teachings of Guodong, Zhang H., Qin, Achin and Sun to promote faster convergence and collaborative learning, facilitating effective handling of large datasets and complex models (Feng, col.1 15:36ff, Summary).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Hubinette, et al. Pub. No.: US8572115B2, (2012).
Hubinette describes a system for finding negative keywords related to advertisements. It explains a method where keywords from search queries or web page content are identified. A function checks if these queries are irrelevant to the ads, scoring them accordingly. Based on these scores, exclusion keywords are created to avoid irrelevant content while still matching a few relevant queries. This helps in identifying negative keywords.
Datar, et al. Pub. No.: US8380563B2, (2008).
Datar describes a system that targets ads based on a user's current and past search queries. It can use the past query alongside the current one, or find it separately. The system checks if the past query is relevant to the current query. If it is, it combines both queries to identify relevant advertising keywords, which are then used to select ads to show the user along with their search results. This method can also apply to more than two search queries.
Any inquiry concerning this communication or earlier communications from the examiner
should be directed to MATIYAS T MARU whose telephone number is (571)270-0902. The examiner
can normally be reached Monday through Friday, 8:00am - 4:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a
USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to
use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor,
Michelle Bechtold, can be reached at (571)431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from
Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit
https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and
https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional
questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like
assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA)
or 571-272-1000.
/M.T.M./ Examiner, Art Unit 2148
/MICHELLE T BECHTOLD/Supervisory Patent Examiner, Art Unit 2148