DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This action is in response to the submission filed 10 September 2025 for application 17/366,249. Currently claims 1, 3, 10, 11, 16, and 17 have been amended. Claims 1-20 are pending and have been examined.
The §112(b) rejection of claims 3, 11, and 17 has been withdrawn in view of the amendments made.
Response to Arguments
With respect to the 35 U.S.C. §101 rejections, Applicant argues in the remarks filed 10 September 2025 (see page 9) that the Office Action rejects claims 1-20 under 35 U.S.C. §101, stating that the claims are directed to the abstract ideas of mathematical processes and mental processes. Applicant notes that claim 1 includes the feature of "generating a prediction using machine learning of how a user associated with the first entity will respond to a communication from the first entity, the prediction associated with the first entity and being based on: the set of local samples; the set of non-local samples; the local sample weight; and the optimized non-local sample weight." Applicant contends that claim 1 is directed to improving the technical field of machine learning, specifically generating predictions of user reactions to communications such as emails and web ads, which is a technical field under MPEP 2106.04(d)(1), and that the claims are thus directed to an improvement in a technical field without monopolizing any judicial exception. Per MPEP 2106.04(d)(1), an improvement to a technical field shows that the claim integrates the judicial exception into a practical application.
Examiner’s response: Applicant’s arguments have been fully considered but are not persuasive. Examiner respectfully disagrees that claim 1 is directed to improving the technical field of machine learning. First, as stated in Applicant’s own argument, the claim is directed to generating predictions of user reactions to communications; the asserted improvement therefore lies not in the machine learning itself but in the "generating a prediction" limitation, which has been identified as an abstract idea in Step 2A, Prong 1 of the §101 analysis. An improvement to an abstract idea is still an abstract idea, and, as discussed in MPEP 2106.05(a), the judicial exception alone cannot provide the improvement. The limitation of "using machine learning" has been identified as an additional element in Step 2A, Prong 2 because it represents no more than mere instructions to apply the judicial exception of generating a prediction on a computer using machine learning. Second, no details of the actual machine learning steps are recited. Third, as discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer, using the computer as a tool to perform the abstract idea, are not indicative of integration into a practical application. Hence, the claims remain rejected.
With respect to the 35 U.S.C. §103 rejections, Applicant argues in the remarks filed 10 September 2025 (see pages 10 and 11), specifically on page 10, that the Office Action cites Li regarding the feature of "non-local samples" in the present claims. Applicant contends that, in the present claims, the "non-local samples" are samples that belong to a different entity than the first entity, to whom the "local samples" belong. In Li, "local information" in a data set includes "neighbor relations of each data sample," and "non-local information" in a data set is simply relations between data samples that are not neighbors, i.e., are more distant from each other within the data set. "Non-local information" in Li, and the data used to train the "non-local templates," do not come from other data sets, i.e., data sets belonging to a different entity. Applicant further notes that on page 3744 of Li, the description of the experiment involving yeast gene data sets does not use data that includes "non-local samples" as described in the present claims, as both the "local information" and "non-local information" come from the same data set. Li thus does not disclose or suggest using "local samples" belonging to a first entity and "non-local samples" that belong to a different entity, and are therefore from entirely different data sets than the "local samples."
For at least these reasons, claim 1 is allowable over Li, Wang, and Angermueller. Claims 11 and 17 are allowable for at least the same reasons as claim 1. All dependent claims are allowable for at least the same reasons as claims 1, 11, and 17.
Examiner’s response: Applicant’s arguments have been fully considered but are not persuasive. "First entity" and "a different entity" are broad terms, and, as explained in the previous rejection (dated 6/10/2025), the cited references teach local and non-local samples. For example, Li states in the Abstract: "One popular way to tackle this kind of data is training a local kernel machine or a mixture of several locally linear models. However, both of these approaches heavily relies on local information, such as neighbor relations of each data sample, to capture potential data distribution. In this paper, we show the non-local information is more efficient for data representation. With an implementation of a winner-take all autoencoder, several non-local templates are trained to trace the data distribution and to represent each sample in different subspaces with a suitable weight". Here, the local information is associated with a neighbor corresponding to a first entity, and the training of non-local templates to trace the data distribution and to represent each sample in different subspaces shows that the non-local samples may be associated with a number of entities other than the first entity. These non-local templates correspond to non-local samples because a non-local template represents samples in different subspaces, which correspond with a plurality of entities other than the first entity. Li further states, at page 3742, Column 1, Section II, Paragraph 1: "In this section, we summarize several previously proposed mixtures of locally linear models into one formulation. The reason a classifier is called to be local is that it commonly has a linear form with a multiplication of a function defining its influential local region". Here, the locally linear models are associated with an influential local region corresponding to a first entity.
Hence, the cited references are relevant and teach each and every element of independent claims 1, 10, and 16, and all other claims dependent therefrom, as shown in the detailed rejection below.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite mathematical concepts and mental processes. This judicial exception is not integrated into a practical application, and the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Regarding Claims 1-9:
According to the first step (Step 1) of the 101 analysis, Claims 1-9 recite a computer-implemented method comprising a series of steps. Thus, the claims are to a process, which is one of the four statutory categories of inventions.
Regarding Claim 1:
In the next step (Step 2A, Prong 1) of the analysis, the limitations of:
determining a hyperparameter for influencing non-local samples,
identifying a set of local samples associated with a first entity;
identifying a set of non-local samples comprising samples associated with a plurality of entities other than the first entity;
assigning a local sample weight to one or more samples of the set of local samples;
determining a range of non-local sample weights;
determining a range of hyperparameters based on the range of non-local sample weights;
determining an optimized hyperparameter based on the range of hyperparameters;
assigning an optimized non-local sample weight to one or more samples of the set of non-local samples, the optimized non-local sample weight based on the optimized hyperparameter;
generating a prediction of how a user associated with the first entity will respond to a communication from the first entity, the prediction associated with the first entity and being based on:
the set of local samples;
the set of non-local samples;
the local sample weight; and
the optimized non-local sample weight.
Under the broadest reasonable interpretation, the above limitations are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the mind or with the aid of pencil and paper but for the recitation of a generic computer component. If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.
In the next step (Step 2A, Prong 2) of the analysis, the limitation of using machine learning is considered to be an additional element, and it does not integrate the abstract idea into a practical application because the additional element is recited so generically (no details whatsoever are provided other than that it is a method using machine learning) that it represents no more than mere instructions to apply the judicial exception on a computer. As discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer, using the computer as a tool to perform the abstract idea, are not indicative of integration into a practical application.
In the last step (Step 2B) of the analysis, the additional element of using machine learning does not amount to significantly more than the judicial exceptions. As explained with respect to Step 2A, Prong 2, the method using machine learning is at best the equivalent of merely adding the words “apply it” to the judicial exception. See MPEP 2106.05(f). Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception. The claim is not patent eligible.
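For illustration only, the interplay of the recited identifying, weighting, and optimizing steps can be sketched as follows. All names in this sketch are hypothetical, the hyperparameter mapping follows the ratio recited in claim 5, and the search follows the grid search of claim 7; this is not the Applicant's disclosed implementation.

```python
# Illustrative sketch of the recited weighting scheme (hypothetical names).

def weight_range(n_local, n_non_local):
    """Claim 3: non-local sample weights range from n_L/(n_L + n_N) to 1."""
    return n_local / (n_local + n_non_local), 1.0

def hyperparameter_for_weight(n_local, n_non_local, w_non_local, w_local=1.0):
    """One reading of the claim 5 ratio:
    lambda = (n_L * w_L) / ((n_N - n_L) * w_N + n_L * w_L)
    """
    return (n_local * w_local) / (
        (n_non_local - n_local) * w_non_local + n_local * w_local
    )

def optimize_hyperparameter(n_local, n_non_local, score_fn, steps=10):
    """Claim 7-style grid search: sweep candidate non-local weights over the
    claim 3 range, map each to a hyperparameter, and keep the best-scoring
    (hyperparameter, non-local weight) pair."""
    lo, hi = weight_range(n_local, n_non_local)
    candidates = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    scored = [
        (hyperparameter_for_weight(n_local, n_non_local, w), w)
        for w in candidates
    ]
    return max(scored, key=lambda pair: score_fn(*pair))
```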
Regarding Claim 2:
In (Step 2A, Prong 1), the limitation of:
wherein the local sample weight is 1,
Under the broadest reasonable interpretation, the above limitations are process steps that recite mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, Prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that would do so.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
Regarding Claim 3:
In (Step 2A, Prong 1), the limitations of:
wherein the range of non-local sample weights is between:
a total number of samples in the set of local samples over a total number of samples in the set of local samples and the set of non-local samples; and
an integer value 1.
Under the broadest reasonable interpretation, the above limitations are process steps that recite mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, Prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that would do so.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
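For clarity only, writing $n_L$ for the number of samples in the set of local samples and $n_N$ for the number of samples in the set of non-local samples (notation introduced here, not recited in the claim), the bound recited in claim 3 can be expressed as:

```latex
\frac{n_L}{n_L + n_N} \;\le\; w_N \;\le\; 1
```

where $w_N$ denotes a non-local sample weight.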
Regarding Claim 4:
In step (Step 2A, Prong 1) the limitation of:
wherein determining a range of hyperparameters based on the range of non-local sample weights comprises, for any one non-local sample weight, determining an associated hyperparameter to be a ratio of a total number of samples in the set of local samples to a difference between a total number of samples in the set of non-local samples and the total number of samples in the set of local samples multiplied by the one non-local sample weight plus the total number of samples in the set of local samples.
Under the broadest reasonable interpretation, the above limitations are process steps that recite mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, Prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that would do so.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
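One reading of the ratio recited in claim 4, using illustrative notation ($n_L$ for the local sample count, $n_N$ for the non-local sample count, and $w_N$ for the one non-local sample weight), is:

```latex
\lambda = \frac{n_L}{(n_N - n_L)\,w_N + n_L}
```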
Regarding Claim 5:
In step (Step 2A, Prong 1), the limitations of:
wherein determining a range of hyperparameters based on the range of non-local sample weights comprises, for any one non-local sample weight, determining an associated hyperparameter to be a ratio of a total number of samples in the set of local samples multiplied by the local sample weight to a difference between a total number of samples in the set of non-local samples and the total number of samples in the set of local samples multiplied by the one non-local sample weight plus the total number of samples in the set of local samples multiplied by the local sample weight.
Under the broadest reasonable interpretation, the above limitations are process steps that recite mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, Prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that would do so.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
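One reading of the ratio recited in claim 5, which differs from the claim 4 ratio in scaling the local sample count by the local sample weight $w_L$ (illustrative notation, with $n_L$, $n_N$, and $w_N$ as the local sample count, non-local sample count, and the one non-local sample weight), is:

```latex
\lambda = \frac{n_L\,w_L}{(n_N - n_L)\,w_N + n_L\,w_L}
```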
Regarding Claim 6:
In step (Step 2A, Prong 1), the limitations of:
wherein determining a range of hyperparameters based on the range of non-local sample weights comprises, for any one non-local sample weight, determining an associated hyperparameter to be a ratio of a sum of local sample weights assigned to the one or more samples of the set of local samples to a sum of non-local sample weights assigned to the one or more samples of the set of non-local samples plus the sum of the local sample weights assigned to the one or more samples of the set of local samples.
Under the broadest reasonable interpretation, the above limitations are process steps that recite mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, Prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that would do so.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
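One reading of the ratio recited in claim 6, with $w_{L,i}$ and $w_{N,j}$ (illustrative notation) denoting the weights assigned to the individual local and non-local samples, is:

```latex
\lambda = \frac{\sum_i w_{L,i}}{\sum_j w_{N,j} + \sum_i w_{L,i}}
```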
Regarding Claim 7:
In step (Step 2A, Prong 2), the limitation of:
wherein determining an optimized hyperparameter based on the range of hyperparameters comprises performing a grid search
is considered to be an additional element, and it does not integrate the abstract idea into a practical application because the additional element is recited so generically (no details whatsoever are provided other than that determining an optimized hyperparameter based on the range of hyperparameters comprises performing a grid search) that it represents no more than mere instructions to apply the judicial exception on a computer. As discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer, using the computer as a tool to perform the abstract idea, are not indicative of integration into a practical application.
In the last step (Step 2B) of the analysis, the additional element does not amount to significantly more than the judicial exceptions. As explained with respect to Step 2A, Prong 2, determining an optimized hyperparameter based on the range of hyperparameters by performing a grid search is at best the equivalent of merely adding the words “apply it” to the judicial exception. See MPEP 2106.05(f). Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception. The claim is not patent eligible.
Regarding Claim 8:
In step (Step 2A, Prong 1) the limitation of:
wherein determining an optimized hyperparameter based on the range of hyperparameters comprises utilizing a Bayesian optimization.
Under the broadest reasonable interpretation, the above limitations are process steps that recite mathematical relationships and calculations but for the recitation of generic computer components. If a claim, under its broadest reasonable interpretation covers mathematical concepts but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, Prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that would do so.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
Regarding Claim 9:
In step (Step 2A, Prong 1) the limitation of:
wherein the prediction is a prediction of an action to be taken by one or more individuals associated with the first entity.
Under the broadest reasonable interpretation, the above limitations are process steps that cover mental processes including an observation, evaluation, judgment or opinion that could be performed in the mind or with the aid of pencil and paper but for the recitation of a generic computer component. If a claim, under its broadest reasonable interpretation, covers a mental process but for the recitation of generic computer components, then it falls within the “Mental Process” grouping of abstract ideas.
In the next step (Step 2A, Prong 2) of the analysis, the claim does not integrate the abstract idea into a practical application because it does not add any additional elements that would do so.
In the last step (Step 2B) of the analysis, it does not add any additional elements that amount to significantly more than the abstract idea and thus fails to add an inventive concept. The claim is not patent eligible.
Regarding claims 10-15:
According to the first step (Step 1) of the 101 analysis, Claims 10-15 recite a non-transitory machine-readable storage medium that provides instructions that, if executed by a processor, are configurable to cause the processor to perform operations. Thus, the claim is to a manufacture, which is one of the four statutory categories of inventions.
Regarding Claim 10:
The claim recites substantially similar limitations to claim 1 and is therefore rejected on the same basis.
Regarding Claim 11:
The claim recites substantially similar limitations to claim 3 and is therefore rejected on the same basis.
Regarding Claim 12:
The claim recites substantially similar limitations to claim 4 and is therefore rejected on the same basis.
Regarding Claim 13:
The claim recites substantially similar limitations to claim 7 and is therefore rejected on the same basis.
Regarding Claim 14:
The claim recites substantially similar limitations to claim 8 and is therefore rejected on the same basis.
Regarding Claim 15:
The claim recites substantially similar limitations to claim 9 and is therefore rejected on the same basis.
Regarding claims 16-20:
According to the first step (Step 1) of the 101 analysis, claims 16-20 are directed to an apparatus comprising: a processor; and a non-transitory machine-readable storage medium that provides instructions that, if executed by a processor, are configurable to cause the processor to perform operations (machine), and thus fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Regarding Claim 16:
The claim recites substantially similar limitations to claim 1 and is therefore rejected on the same basis.
Regarding Claim 17:
The claim recites substantially similar limitations to claim 3 and is therefore rejected on the same basis.
Regarding Claim 18:
The claim recites substantially similar limitations to claim 4 and is therefore rejected on the same basis.
Regarding Claim 19:
The claim recites substantially similar limitations to claim 7 and is therefore rejected on the same basis.
Regarding Claim 20:
The claim recites substantially similar limitations to claim 8 and is therefore rejected on the same basis.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1 and 3-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (Non-Local Information for a Mixture of Multiple Linear Classifiers) in view of Wang et al. (Non-local Neural Networks) and Angermueller et al. (Population-Based Black-Box Optimization for Sequence Design) and further in view of Kursun (US 10992765 B2).
Regarding Claim 1:
Li et al teaches: A computer-implemented method for determining a hyperparameter for influencing non-local samples in machine learning, the method comprising ([Abstract] Page 3741, Column 1, Lines 1-17, “For many problems in machine learning fields, the data are nonlinearly distributed. One popular way to tackle this kind of data is training a local kernel machine or a mixture of several locally linear models. However, both of these approaches heavily relies on local information, such as neighbor relations of each data sample, to capture potential data distribution. In this paper, we show the non-local information is more efficient for data representation. With an implementation of a winner-take-all autoencoder, several non-local templates are trained to trace the data distribution and to represent each sample in different subspaces with a suitable weight. By training a linear model for each subspace in a divide and conquer manner, one single support vector machine can be formulated to solve nonlinear classification problems. Experimental results demonstrate that a mixture of multiple linear classifiers from non-local information performs better than or is at least competitive with state-of-the art mixtures of locally linear models”; [Page 3741, Column 2, Paragraph 2] Thus, it is very sensitive to the selection of hyper-parameters to decide a reasonable evaluation metric of locality. Note: the hyperparameters influence the non-local templates before the training process begins):
identifying a set of local samples associated with a first entity ([Page 3742, Column 1, Section II, Paragraph 1] In this section, we summarize several previously proposed mixtures of locally linear models into one formulation. The reason a classifier is called to be local is that it commonly has a linear form with a multiplication of a function defining its influential local region. Note: Local region corresponds to the first entity);
identifying a set of non-local samples comprising samples associated with a plurality of entities other than the first entity ([Abstract] One popular way to tackle this kind of data is training a local kernel machine or a mixture of several locally linear models. However, both of these approaches heavily relies on local information, such as neighbor relations of each data sample, to capture potential data distribution. In this paper, we show the non-local information is more efficient for data representation. With an implementation of a winner-take all autoencoder, several non-local templates are trained to trace the data distribution and to represent each sample in different subspaces with a suitable weight);
assigning a local sample weight to one or more samples of the set of local samples ([Page 3741, Column 2, Paragraph 2] Thus, it is very sensitive to the selection of hyper-parameters to decide a reasonable evaluation metric of locality. For a mixture of locally linear classifiers, most of the related methods also use Euclidean distance to describe locality. [Page 3744, Column 2, Paragraph 6] For the LLSVM [1], the number of anchor points generated by a k-means algorithm is selected in grid {10, 15, 20} and 100 as its paper originally suggests, while its coefficients of the local coding are obtained using inverse Euclidian distance based weighting [13] and the nearest neighbors for local coding are chosen in grid {6, 8, 10});
and generating a prediction using machine learning, the prediction associated with the first entity and being based on: the set of local samples ([Page 3741, Abstract] For many problems in machine learning fields, the data are nonlinearly distributed. One popular way to tackle this kind of data is training a local kernel machine or a mixture of several locally linear models. [Page 3742, Column 1, Section II, Paragraph 1] In this section, we summarize several previously proposed mixtures of locally linear models into one formulation. The reason a classifier is called to be local is that it commonly has a linear form with a multiplication of a function defining its influential local region. Note: Local region corresponds to the first entity. [Page 3744, Section A. Evaluation Metrics, Paragraph 1] To give a fair and clear comparison, four classical evaluation metrics are used in our following experiments. Specifically, Accuracy is used in Section V-C to evaluate performance gap between our method and other locally linear models. [Page 3744, Section A. Evaluation Metrics, Paragraph 2] Three other widely used evaluation metrics precision, recall and F-score are implemented to further evaluate the effectiveness of each method. They are particularly defined for a binary classification task with certain levels of imbalance. Precision is the proportion of positive predictions that are correct, and recall is the proportion of positive samples that are correctly predicted to be positive);
the local sample weight ([Page 3741, Column 2, Paragraph 2] Thus, it is very sensitive to the selection of hyper-parameters to decide a reasonable evaluation metric of locality. For a mixture of locally linear classifiers, most of the related methods also use Euclidean distance to describe locality. [Page 3744, Column 2, Paragraph 6] For the LLSVM [1], the number of anchor points generated by a k-means algorithm is selected in grid {10, 15, 20} and 100 as its paper originally suggests, while its coefficients of the local coding are obtained using inverse Euclidian distance based weighting [13] and the nearest neighbors for local coding are chosen in grid {6, 8, 10}).
However, Li does not explicitly disclose: determining a range of non-local sample weights; determining a range of hyperparameters based on the range of non-local sample weights; determining an optimized hyperparameter based on the range of hyperparameters; assigning an optimized non-local sample weight to one or more samples of the set of non-local samples, the optimized non-local sample weight based on the optimized hyperparameter; generating a prediction of how a user associated with the first entity will respond to a communication from the first entity, the set of non-local samples; and the optimized non-local sample weight.
Wang teaches, in an analogous system: determining a range of non-local sample weights (Section 1, Pg. 7794, Column 1 and 2, Lines 18-23 and 1-4, “In this paper, we present non-local operations as an efficient, simple, and generic component for capturing long-range dependencies with deep neural networks. Our proposed non-local operation is a generalization of the classical non-local mean operation [4] in computer vision. Intuitively, a non-local operation computes the response at a position as a weighted sum of the features at all positions in the input feature maps (Figure 1). The set of positions can be in space, time, or spacetime, implying that our operations are applicable for image, sequence, and video problems”);
determining a range of hyperparameters based on the range of non-local sample weights ([Section 3.1, Pg. 7796, Column 1, Lines 3-6] “The non-local operation is also different from a fully connected (fc) layer. Eq.(1) computes responses based on relationships between different locations, whereas fc uses learned weights”; EN: the equation represents the sample weights).
the set of non-local samples ([Page 7794, Column 2, Paragraph 4] A single non-local block, which is our basic unit, can directly capture these spacetime dependencies in a feedforward fashion. With a few non-local blocks, our architectures called non-local neural networks are more accurate for video classification than 2D and 3D convolutional networks [48] (including the inflated variant). [Page 7798, Figure 3 legend] These visualizations show how the model finds related clues to support its prediction);
and the optimized non-local sample weight ([Page 7794, Column 2, Figure 1] A spacetime non-local operation in our network trained for video classification in Kinetics. A position xi’s response is computed by the weighted average of the features of all positions xj (only the highest weighted ones are shown here). [Page 7798, Figure 3 legend] These visualizations show how the model finds related clues to support its prediction).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computer-implemented method of Li to incorporate the teachings of Wang to determine a range of non-local sample weights; determine a range of hyperparameters based on the range of non-local sample weights; and provide the set of non-local samples and the optimized non-local sample weight. One would have been motivated to make this modification because doing so would give the benefit that, with a few non-local blocks, architectures called non-local neural networks are more accurate for video classification than 2D and 3D convolutional networks, as taught by Wang [Page 7794, Column 2, Paragraph 4].
Angermueller teaches, in an analogous system: determining an optimized hyperparameter based on the range of hyperparameters ([Section 1, Page 2, Column 1, Lines 5-11] “We employ a heterogenous population, consisting of global model-based optimizers based on discriminative and generative models along evolutionary strategies. Finally, we further improve P3BO by introducing a variant, Adaptive-P3BO, which adapts the hyper-parameters of the algorithms themselves on the fly using evolutionary search”);
assigning an optimized non-local sample weight to one or more samples of the set of non-local samples, the optimized non-local sample weight based on the optimized hyperparameter ([Section 6.3, Pg. 7, Columns 1 and 2, Lines 18-23 and 1] “We evaluate sample-efficiency for a particular optimization problem by comparing the cumulative maximum of f(x) (Max reward) depending on the number of samples proposed. We further use the area under the Max reward curve to summarize sample-efficiency in a single number and comparing methods across optimization problems. We repeat each experiment 20 times with different random seeds”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Li and Wang to incorporate the teachings of Angermueller to determine an optimized hyperparameter based on the range of hyperparameters and assign an optimized non-local sample weight to one or more samples of the set of non-local samples, the optimized non-local sample weight based on the optimized hyperparameter. One would have been motivated to make this modification because doing so would give the benefit of adapting the hyper-parameters of the algorithms themselves on the fly using evolutionary search, as taught by Angermueller [Section 1, Page 2, Column 1, Lines 5-11].
Kursun teaches, in an analogous system: generating a prediction of how a user associated with the first entity will respond to a communication from the first entity ([Column 1, Lines 21-25] Embodiments of the present invention address the above needs and/or achieve other advantages by providing apparatuses (e.g., a system, computer program product and/or other devices) and methods for preemptive user interactions for predictive exposure alerting. [Column 1, Lines 56-63] wherein the information comprises an output that is generated by one or more machine learning models, generate exposure characteristics for the interaction based on the output associated with the resource entity and user data associated with the user, wherein the exposure characteristics are unique to the interaction and the user, and in response to generating the exposure characteristics, transmit the exposure characteristics to a user device. [Column 13, Lines 23-52] As shown in block 730, the system based on the user agreement, prompts the user to authorize transfer of anonymized user data to the resource entity to receive the one or more supplemental resources. The user may choose to be notified before transferring any user data to the resource entities in the user agreement. Based upon the user agreement, transmits a notification to the user and prompt the user to authorize the transmission of the data. As shown in block 735, the system transmits the anonymized user data to the resource entity. As shown in block 740, the system in response to transmitting the anonymized user data to the resource entity, receives the one or more supplemental resources from the resource entity. The system may receive one or more supplemental resources that are specific to the user (e.g., products of interest) based on the user data transmitted to the resource entity by the system. As shown in block 745, the system transmits the one or more supplemental resources to the user device. 
In addition, the system may provide a real-time communication platform, where the real-time communication platform allows anonymized communication between the one or more user and the one or more resource entities. The real-time communication platform allows the user and the resource entities to negotiate deals, price of the products, or the like. During negotiation, if the user provides any personal information via the real-time communication platform, the system automatically identifies such information and anonymizes the personal information before displaying the information to the resource entity).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Li, Wang, and Angermueller to incorporate the teachings of Kursun to generate a prediction of how a user associated with the first entity will respond to a communication from the first entity. One would have been motivated to do this modification because doing so would give the benefit of continuously tracking and monitoring user activity associated with a user as taught by Kursun [Column 1, Lines 47-48].
Regarding Claim 3:
The system of Li, Wang, Angermueller, and Kursun teaches: The computer-implemented method of claim 1 (as shown above).
However, Li fails to explicitly teach: wherein the range of non-local sample weight is between: a total number of samples in the set of local samples over a total number of samples in the set of local samples and the set of non-local samples; and the integer value 1.
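For clarity of the record, the claimed range may be restated algebraically; the symbols below (nL for the total number of samples in the set of local samples, nN for the total number of samples in the set of non-local samples, and w for the non-local sample weight) are introduced here purely for illustration and do not appear in the claim:

```latex
% Claim 3 (illustrative notation only): the non-local sample weight w
% lies between the local-sample fraction and the integer value 1.
\frac{n_L}{\,n_L + n_N\,} \;\le\; w \;\le\; 1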
Wang further teaches, in an analogous system: wherein the range of non-local sample weight is between: a total number of samples in the set of local samples over a total number of samples in the set of local samples and the set of non-local samples; and the integer value 1 ([Section 1, Pg. 7794, Columns 1 and 2, Lines 18-23 and 1-4] “In this paper, we present non-local operations as an efficient, simple, and generic component for capturing long-range dependencies with deep neural networks. Our proposed non-local operation is a generalization of the classical non-local mean operation [4] in computer vision. Intuitively, a non-local operation computes the response at a position as a weighted sum of the features at all positions in the input feature maps (Figure 1). The set of positions can be in space, time, or spacetime, implying that our operations are applicable for image, sequence, and video problems”. [Section 6, Pg. 7801, Column 2, Lines 17-24] “Table 6 shows the results on COCO. On a strong baseline of R101, adding 4 non-local blocks to the key point head leads to a ~1 point increase of key point AP. If we add one extra non-local block to the backbone as done for object detection, we observe an in total 1.4 points increase of key point AP over the baseline. In particular, we see that the stricter criterion of AP75 is boosted by 2.4 points, suggesting a stronger localization performance”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computer-implemented method of Li to incorporate the teachings of Wang wherein the range of non-local sample weight is between: a total number of samples in the set of local samples over a total number of samples in the set of local samples and the set of non-local samples; and the integer value 1. One would have been motivated to make this modification because doing so would give the benefit that, with a few non-local blocks, architectures called non-local neural networks are more accurate for video classification than 2D and 3D convolutional networks, as taught by Wang [Page 7794, Column 2, Paragraph 4].
Regarding Claim 4:
The system of Li, Wang, Angermueller, and Kursun teaches: The computer-implemented method of claim 1 (as shown above).
However, Li fails to explicitly teach: wherein determining a range of hyperparameters based on the range of non-local sample weights comprises, for any one non-local sample weight, determining an associated hyperparameter to be a ratio of a total number of samples in the set of local samples to a difference between a total number of samples in the set of non-local samples and the total number of samples in the set of local samples multiplied by the one non-local sample weight plus the total number of samples in the set of local samples.
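Reading this claim limitation as a formula (again using illustrative symbols that do not appear in the claim: nL and nN for the local and non-local sample counts, and w for the one non-local sample weight), the associated hyperparameter would be:

```latex
% Claim 4 (illustrative notation only): hyperparameter associated
% with a given non-local sample weight w.
\lambda(w) \;=\; \frac{n_L}{\,(n_N - n_L)\,w \;+\; n_L\,}
```

Note that at w = 1 the denominator reduces to nN, so the associated hyperparameter becomes the ratio nL/nN.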
Wang further teaches, in an analogous system: wherein determining a range of hyperparameters based on the range of non-local sample weights comprises, for any one non-local sample weight, determining an associated hyperparameter to be a ratio of a total number of samples in the set of local samples to a difference between a total number of samples in the set of non-local samples and the total number of samples in the set of local samples multiplied by the one non-local sample weight plus the total number of samples in the set of local samples ([Section 3.1, Pg. 7796, Column 1, Lines 3-6] “The non-local operation is also different from a fully connected (fc) layer. Eq.(1) computes responses based on relationships between different locations, whereas fc uses learned weights”, [Section 4.1, Pg. 7798, Column 1, Lines 8-16] “We train on an 8-GPU machine and each GPU has 8 clips in a mini-batch (so in total with a mini-batch size of 64 clips). We train our models for 400k iterations in total, starting with a learning rate of 0.01 and reducing it by a factor of 10 at every 150k iterations (see also Figure 4). We use a momentum of 0.9 and a weight decay of 0.0001. We adopt dropout after the global pooling layer, with a dropout ratio of 0.5. We fine-tune our models with BatchNorm (BN) enabled when it is applied”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computer-implemented method of Li to incorporate the teachings of Wang wherein determining a range of hyperparameters based on the range of non-local sample weights comprises, for any one non-local sample weight, determining an associated hyperparameter to be a ratio of a total number of samples in the set of local samples to a difference between a total number of samples in the set of non-local samples and the total number of samples in the set of local samples multiplied by the one non-local sample weight plus the total number of samples in the set of local samples. One would have been motivated to make this modification because doing so would give the benefit that, with a few non-local blocks, architectures called non-local neural networks are more accurate for video classification than 2D and 3D convolutional networks, as taught by Wang [Page 7794, Column 2, Paragraph 4].
Regarding Claim 5:
The system of Li, Wang, Angermueller, and Kursun teaches: The computer-implemented method of claim 1 (as shown above).
However, Li fails to explicitly teach: wherein determining a range of hyperparameters based on the range of non-local sample weights comprises, for any one non-local sample weight, determining an associated hyperparameter to be a ratio of a total number of samples in the set of local samples multiplied by the local sample weight to a difference between a total number of samples in the set of non-local samples and the total number of samples in the set of local samples multiplied by the one non-local sample weight plus the total number of samples in the set of local samples multiplied by the local sample weight.
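Restated algebraically (illustrative symbols only, as above, with wL denoting the local sample weight), this limitation differs from the claim 4 formula in that the local sample count is scaled by the local sample weight:

```latex
% Claim 5 (illustrative notation only): hyperparameter with the
% local sample count n_L scaled by the local sample weight w_L.
\lambda(w) \;=\; \frac{n_L\, w_L}{\,(n_N - n_L)\,w \;+\; n_L\, w_L\,}
```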
Wang further teaches, in an analogous system: wherein determining a range of hyperparameters based on the range of non-local sample weights comprises, for any one non-local sample weight, determining an associated hyperparameter to be a ratio of a total number of samples in the set of local samples multiplied by the local sample weight to a difference between a total number of samples in the set of non-local samples and the total number of samples in the set of local samples multiplied by the one non-local sample weight plus the total number of samples in the set of local samples multiplied by the local sample weight ([Section 3.1, Pg. 7796, Column 1, Lines 3-6] “The non-local operation is also different from a fully connected (fc) layer. Eq.(1) computes responses based on relationships between different locations, whereas fc uses learned weights”. [Section 4.1, Pg. 7798, Column 1, Lines 8-16] “We train on an 8-GPU machine and each GPU has 8 clips in a mini-batch (so in total with a mini-batch size of 64 clips). We train our models for 400k iterations in total, starting with a learning rate of 0.01 and reducing it by a factor of 10 at every 150k iterations (see also Figure 4). We use a momentum of 0.9 and a weight decay of 0.0001. We adopt dropout after the global pooling layer, with a dropout ratio of 0.5. We fine-tune our models with BatchNorm (BN) enabled when it is applied”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computer-implemented method of Li to incorporate the teachings of Wang wherein determining a range of hyperparameters based on the range of non-local sample weights comprises, for any one non-local sample weight, determining an associated hyperparameter to be a ratio of a total number of samples in the set of local samples multiplied by the local sample weight to a difference between a total number of samples in the set of non-local samples and the total number of samples in the set of local samples multiplied by the one non-local sample weight plus the total number of samples in the set of local samples multiplied by the local sample weight. One would have been motivated to make this modification because doing so would give the benefit that, with a few non-local blocks, architectures called non-local neural networks are more accurate for video classification than 2D and 3D convolutional networks, as taught by Wang [Page 7794, Column 2, Paragraph 4].
Regarding Claim 6:
The system of Li, Wang, Angermueller, and Kursun teaches: The computer-implemented method of claim 1 (as shown above).
However, Li fails to explicitly teach: wherein determining a range of hyperparameters based on the range of non-local sample weights comprises, for any one non-local sample weight, determining an associated hyperparameter to be a ratio of a sum of loc