DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 9/23/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Notice to Applicant
Claims 1-19 are presently amended.
Claims 1-20 are pending.
Response to Amendment
Applicant’s amendments are acknowledged.
Response to Arguments
Applicant's arguments filed 9/17/2025 have been fully considered in view of further consideration of statutory law, Office policy, precedential common law, and the cited prior art as necessitated by the amendments to the claims, and are not persuasive for the reasons set forth below.
35 USC § 101 Rejections
First, Applicant argues that “a first technical improvement is captured by the features "selecting an execution alternative, using an execution cost model that predicts an execution cost for each execution alternative" recited in the Applicant's independent claim 1. Per this recited optimization methodology, the system dynamically evaluates multiple execution strategies for computing a feature and selects the most efficient one based on predicted resource usage.
Specifically, the specification describes a cost model that evaluates execution alternatives based on various factors. See, e.g., Applicant's Specification, paragraph [0032]. The cost estimator uses a weighted sum to determine the most computationally cost-effective execution plan. Id.
The above-described practice of selecting an execution alternative based on execution cost reduces CPU and memory overhead during feature computation, which is a marked technical improvement over prior systems that statically compute features without reuse or cost-based planning. See also id. at paragraph [0034].
Because the claimed features of Claim 1 are directed to steps performed to realize the distinct technical improvements discussed above, claim 1 is directed to a practical application and is patent-eligible under § 101” [Arguments, pages 9-10].
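For illustration only, the weighted-sum cost estimation that Applicant cites at Specification paragraph [0032] might be sketched as follows. This is a hypothetical sketch, not part of the record: the factor names, weights, and values are assumptions and do not appear in the claims or specification.

```python
# Illustrative sketch of a weighted-sum execution cost estimator.
# All cost factors, weights, and plan names below are hypothetical.

def estimate_cost(plan, weights={"cpu": 0.5, "io": 0.3, "memory": 0.2}):
    """Predict a scalar execution cost as a weighted sum of plan factors."""
    return sum(weights[factor] * plan[factor] for factor in weights)

def select_cheapest(plans):
    """Select the execution alternative with the lowest predicted cost."""
    return min(plans, key=lambda name_plan: estimate_cost(name_plan[1]))[0]

plans = [
    ("reuse_alternative", {"cpu": 2.0, "io": 1.0, "memory": 1.5}),
    ("compute_from_scratch", {"cpu": 5.0, "io": 4.0, "memory": 2.0}),
]
print(select_cheapest(plans))  # prints "reuse_alternative"
```

Under these assumed weights, the reuse alternative has predicted cost 1.6 versus 4.1 for computing from scratch, so it would be selected.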
In response, Applicant’s arguments are considered but are not persuasive. Examiner respectfully maintains that the present invention recites a judicial exception without significantly more. In particular, and with respect to the assertion that the “practice of selecting an execution alternative based on execution cost reduces CPU and memory overhead during feature computation, which is a marked technical improvement over prior systems that statically compute features without reuse or cost-based planning”, Examiner respectfully disagrees and maintains that reciting an execution cost estimator is not sufficient to demonstrate a technological improvement to any particular field of technology or to the functioning of computing systems. Specifically, Examiner observes that the present claims do not detail how the execution cost estimator operates, and the claims only briefly mention that it predicts an execution cost. Further, claim 1 only recites that the method is “computer-implemented”, without claiming any computer components. Independent claims 12 and 18 only recite generic computing components in the preamble of the claims and do not appear to implement them in any meaningful way. Thus, Examiner respectfully maintains that the additional elements of the claims are not recited at a level of specificity that could be considered to demonstrate a practical application or otherwise any technological improvement to a field of technology or to the functioning of computers. As such, Examiner remains unpersuaded.
35 USC § 103 Rejections
First, Applicant argues that “The combination of Bonaci and Chen does not teach or suggest generating an alternative feature definition based on a new feature definition and a matched feature definition that at least partially includes the new feature definition…
Firstly, Bonaci's teaching of the ability to reuse, during production, the training features upon which a feature generation model is trained does not teach or suggest determining that a new feature definition is at least partially included in a matched feature definition, as recited in the Applicant's independent claim 1. Also, Bonaci's teaching of reusing training features during production does not teach or suggest generating an alternative feature definition based on the new feature definition and the matched feature definition, as recited in the Applicant's independent claim 1.
Secondly, Bonaci's teaching of using a model that is trained on training features to generate a new feature does not teach or suggest determining that a new feature definition is at least partially included in a matched feature definition, as recited in the Applicant's independent claim 1. Bonaci is silent regarding whether any matching between a new feature definition and a matched (existing) feature definition occurs either during training or inference phases of its feature generation model. Accordingly, Bonaci's teaching of using a feature generation model that is trained on training features to generate a new feature also does not teach or suggest generating an alternative feature definition based on the new feature definition and the matched feature definition, as recited in the Applicant's independent claim 1, because Bonaci does not teach or suggest any matching of a new feature definition to a matched feature definition…
At most, the combination of the references teaches generating a feature using a model that is trained using training features (Bonaci), reusing the training features during production (Bonaci) and making features available to APIs (e.g., for model training) using wide tables (Chen), but does not teach or suggest generating an alternative feature definition based on the new feature definition and the matched feature definition, as recited in the Applicant's independent claim 1, and determining that a new feature definition is at least partially included in a matched feature definition, as recited in the Applicant's independent claim 1. The combination of the cited references does not teach or suggest any matching of a new feature definition with a matched feature definition that at least partially includes the new feature definition, nor does the combination teach or suggest determining an alternative feature definition based on the new feature definition and the matched feature definition” [Arguments, pages 10-12].
In response, Applicant’s arguments are considered but are not persuasive. Examiner respectfully disagrees and directs the Applicant to (Bonaci, ¶¶ 42-43, 61, and 120-121). In particular, Examiner observes that the machine learning model of Bonaci compares and matches newly generated features to those in a validation set (i.e., a matched feature definition). The method of Bonaci goes on to use the machine learning model to iterate with different feature sets and generate alternative feature definitions based on the new feature definitions from the training data and the matched feature definition from the validation set, in accordance with the amended claims of the present invention. Thus, Examiner respectfully maintains that Bonaci renders obvious the above-argued limitation of the independent claims. As such, Examiner remains unpersuaded.
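For illustration only, the disputed matching limitation of claim 1 might be sketched as follows. This is a hypothetical sketch outside the record: modeling feature definitions as sets of parameters, and the particular parameter names, are assumptions introduced solely for illustration and are not drawn from the claims or the cited references.

```python
# Hypothetical sketch: detecting partial inclusion of a new feature
# definition in a stored (matched) definition, then deriving an
# alternative definition from the two. Definitions are modeled as
# frozensets of parameter names purely for illustration.

def find_match(new_def, stored_defs):
    """Return a stored definition that at least partially includes new_def."""
    for stored in stored_defs:
        if new_def & stored:  # non-empty overlap: partial inclusion
            return stored
    return None

def alternative_definition(new_def, matched_def):
    """Derive an alternative definition: reuse the overlap, compute the rest."""
    return {"reuse": new_def & matched_def, "compute": new_def - matched_def}

new_def = frozenset({"user_id", "avg_spend_30d", "txn_count_7d"})
store = [frozenset({"user_id", "avg_spend_30d"}), frozenset({"zip_code"})]

matched = find_match(new_def, store)
alt = alternative_definition(new_def, matched)
print(sorted(alt["compute"]))  # prints ['txn_count_7d']
```

In this sketch, only the portion of the new definition not covered by the matched definition would need to be computed fresh; the overlap could be served from the feature store.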
Second, Applicant argues that “The combination of Bonaci and Chen does not teach or suggest generating an alternative feature definition based on a new feature definition and a matched feature definition that at least partially includes the new feature definition…
The Office Action mapped Chen's teaching of joining tables to generate features to the claimed feature of selecting an execution alternative from an execution of a PIT join using the alternative feature definition and a PIT join using the new feature definition. Office Action at 13. However, this mapping fails…
Chen's joins are part of a feature set definition and used to materialize wide tables for training and inference. Accordingly, Chen does not teach or suggest selecting an execution alternative from among candidate execution alternatives, using an execution cost model that predicts an execution cost for each candidate execution alternative, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator, as recited in the Applicant's independent claim 1.
Also, Chen does not teach any mechanism for selecting between multiple join strategies. In contrast, the claimed invention selects between two distinct executions of PIT joins: one using a new feature definition and one using an alternative feature definition, based on a cost model. Accordingly, Chen does not teach or suggest an execution cost model that predicts an execution cost for each candidate execution alternative, including (i) an execution…
However, Bonaci's use of a model to generate features, where the model is trained using training features, does not teach or suggest an execution cost model that predicts an execution cost for each candidate execution alternative, including (i) an execution…
At most, the combination of the references teaches generating features by joining tables (Chen), enabling the joins to be reused (Chen), generating a feature using a model trained on training features (Bonaci), and making the training features available for reuse (Bonaci), but does not teach or suggest selecting an execution alternative from among candidate execution alternatives, using an execution cost model that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution…” [Arguments, pages 12-14].
In response, Applicant’s arguments are considered but are not persuasive. Examiner respectfully disagrees and maintains that, through KSR Rationale D (See MPEP 2141(III)(D)), the combination of Bonaci and Chen discloses …selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator.
First, Bonaci discloses a data aggregation technique including an expense estimation and minimization technique which selects an alternative technique for minimal cost (Bonaci, ¶ 82, According to an aspect, feature computation layer 106 is configured to simultaneously compute more than one feature, such as a large number of features. When simultaneously computing many features, it is possible to compute each feature independently and then join the computed values based on the entity and time. However, this approach is inefficient for at least two major reasons. First, computing each feature may involve retrieving and processing the same input events multiple times. Second, once the features are computed, performing an N-way join is an expensive operation. FIG. 5A illustrates an example N-way join 500a, (discloses PIT join techniques) such as a 3-way join, being performed after multiple features are individually computed. Computing two or more of the three features shown in FIG. 5A may involve retrieving and processing the same input events multiple times. After these three features are individually computed, they may be joined and output by the system), (Id., ¶ 83, Rather than employing this inefficient and expensive technique for simultaneously computing multiple features, feature computation layer 106 (discloses execution cost estimator) may instead combine all of the aggregations into a single pass over events that computes (at each point in time and for each entity) the value of all aggregations. (discloses execution alternative selected for minimal cost) The description of this flattened operation is called the aggregation plan and the process for producing it is described in more detail below. This flattened aggregation plan allows for the simultaneous computation of the aggregations necessary for all requested features with a single pass over the input, and therefore eliminates the need for the N-way join. FIG.
5B illustrates an example simultaneous feature computation 500b without an N-way join. As depicted in FIG. 5B, all of the multiple features are simultaneously computed with a single pass over the input, eliminating the need to retrieve and process the same input events multiple times), (Id., ¶ 157, In embodiments, resume tokens are utilized to continually apply the results of a query to a separate (i.e., external) data store with minimal cost. This may be achieved by first running an initial query, writing the results to the separate data store, and receiving a resume token. A query may be periodically run to update the results in the external store. Each query uses the resume token returned by the previous response. The new results may reflect only those results which have changed).
[Image: media_image1.png (greyscale)]
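For illustration only, the contrast Bonaci draws between an N-way join of independently computed features and a single-pass aggregation plan (¶¶ 82-83) might be sketched as follows. This is a hypothetical sketch outside the record: the event fields and the particular aggregations are assumptions introduced solely for illustration.

```python
# Illustrative contrast with the N-way join approach Bonaci describes as
# inefficient: here, all aggregations for all features are computed in a
# single pass over the input events, so no N-way join is needed.
# Event fields and aggregations are hypothetical.
from collections import defaultdict

events = [
    {"entity": "a", "value": 10},
    {"entity": "a", "value": 20},
    {"entity": "b", "value": 5},
]

def single_pass(events):
    """Compute all aggregations for all features in one scan of the input."""
    agg = defaultdict(lambda: {"count": 0, "sum": 0, "max": float("-inf")})
    for e in events:  # the input events are read exactly once
        a = agg[e["entity"]]
        a["count"] += 1
        a["sum"] += e["value"]
        a["max"] = max(a["max"], e["value"])
    return dict(agg)

features = single_pass(events)
print(features["a"])  # prints {'count': 2, 'sum': 30, 'max': 20}
```

In this sketch, each input event is retrieved and processed once, rather than once per feature followed by an N-way join of the per-feature results.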
Further, Chen discloses time-sensitive PIT join techniques using new and alternative feature definitions (Chen, ¶ 23, For purposes of this document, features are joins between different tables on the same key (e.g., address, timestamp) with perhaps simple transformations applied to some fields (discloses selecting an optimal time-sensitive PIT join alternative) that will make it more suitable for machine learning training and/or service, such as the nullification of values that are obviously incorrect), (Id., ¶ 24, These two concepts allow for the separation of physical mechanisms for managing tables from the logical design of features, while allowing each layer to be iterated on separately. For example, supporting a new table format merely requires the user to implement a common interface, which could immediately be used by a feature layer without any knowledge of the physical details), (Id., ¶ 39, As described above, features are joins between tables. FIG. 3 is a diagram illustrating an example of features, in accordance with an example embodiment. Features are what someone building a model would care about. They define a configuration file that references all the fields needed from each of the tables defined.), (Id., ¶ 43, When iterating over a FeatureSet for model training, two commands may be defined: [0044] 1. build_feature_set_input_table—feature-set $FEATURE_SET—path $PATH which, given a FeatureSet definition, will perform all the joins necessary for the input table and can later be iterated on. [0045] 2. build_feature_set—feature-set $FEATURE_SET—input-table $PATH will build the feature set using the input table from 1. This command would fail if the input table does not contain the necessary columns. [0046] These two commands allow a user to iterate over feature set definitions without having to recalculate all joins every time).
[Image: media_image2.png (greyscale)]
One of ordinary skill in the art would have recognized that applying the known technique of Bonaci would have yielded predictable results and resulted in an improved system. It would have been recognized that applying the cost estimation technique of Bonaci to the feature-definition-based PIT join teachings of Chen would have yielded predictable results because the level of ordinary skill in the art demonstrated by the applied references shows the ability to incorporate such data aggregation features into similar systems. Further, applying Bonaci's cost-minimization-based alternative selection to Chen's feature definitions would have been recognized by those of ordinary skill in the art as resulting in an improved system that allows more cost-effective data aggregation according to specific feature definitions.
Thus, through KSR Rationale D (See MPEP 2141(III)(D)), the combination of Bonaci and Chen discloses …selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator.
As such, Examiner remains unpersuaded.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
Step 1: Claims 1-20 are directed to statutory categories, namely a process (claims 1-11), an article of manufacture (claims 12-17) and a machine (claims 18-20).
Step 2A, Prong 1: Claims 1, 12 and 18 in part, recite the following abstract idea:
…A … method, comprising: receiving a new feature definition specifying parameters of a feature; comparing the new feature definition with a plurality of computed feature definitions stored in a feature store; in response to determining that the new feature definition is at least partially included in a matched feature definition of the plurality of computed feature definitions, generating an alternative feature definition based on the new feature definition and the matched feature definition; selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a first point-in-time PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator; and executing the selected execution alternative using a compute engine to generate the feature [Claim 1],
…receiving a new feature definition specifying parameters of a feature; comparing the new feature definition with a plurality of computed feature definitions stored in a feature store; in response to determining that the new feature definition is at least partially included in a matched feature definition of the plurality of computed feature definitions, generating an alternative feature definition based on the new feature definition and the matched feature definition; selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a first point-in-time PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator, wherein selecting the execution alternative further comprises evaluating, using a feature selection criterion one or more of the alternative feature definition and the new feature definition; and executing the selected execution alternative using a compute engine to generate the feature [Claim 12],
…receiving a new feature definition specifying parameters of a feature; comparing the new feature definition with a plurality of computed feature definitions stored in a feature store; in response to determining that the new feature definition is at least partially included in a matched feature definition of the plurality of computed feature definitions, generating an alternative feature definition based on the new feature definition and the matched feature definition; selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a first point-in-time PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator, wherein selecting the execution alternative further comprises evaluating, using a feature selection criterion one or more of the alternative feature definition and the new feature definition; and executing the selected execution alternative using a compute engine to generate the feature [Claim 18].
These concepts are not meaningfully different than the following concepts identified by the MPEP:
Concepts relating to certain methods of organizing human activity. The aforementioned limitations describe steps for managing personal behavior or relationships or interactions between people, including social activities, teaching, and following rules or instructions. Specifically, selecting an execution alternative from an execution of a PIT join using the alternative feature definition and an execution of a PIT join using the new feature definition is considered to describe steps for following rules or instructions.
Mental Processes. The aforementioned limitations describe steps for concepts performed in the human mind which includes an observation, evaluation, judgment, or an opinion. Specifically, selecting an execution alternative from an execution of a PIT join using the alternative feature definition and an execution of a PIT join using the new feature definition is considered to describe steps for an evaluation.
As such, claims 1, 12 and 18 recite concepts identified as abstract ideas.
The dependent claims recite limitations relative to the independent claims, including, for example:
further comprising: receiving a plurality of candidate source data layouts that are based on current feature computation pipelines and current source data layout; determining a plurality of candidate source data layouts; and selecting a new data source layout from the plurality of candidate source data layouts that are based on current feature computation pipelines and current source data layout. [Claim 2],
wherein selecting a new data source layout further comprising evaluating the plurality of candidate source data layouts and the current source data layout based on a layout selection criterion, wherein the layout selection criterion comprises selection of a minimum cost configuration of the new data source layout. [Claim 3],
wherein the selection of the minimum cost configuration is implemented using binary integer programming. [Claim 4],
wherein selecting the execution alternative further comprises evaluating, using a feature selection criterion one or more of the alternative feature definitions and the new feature definition. [Claim 5],
wherein the feature selection criterion comprises minimization of data to be scanned using one or more of the alternative feature definitions and the new feature definition. [Claim 6],
The limitations of these dependent claims are merely narrowing the abstract idea identified in the independent claims, and thus, the dependent claims also recite abstract ideas.
Step 2A, Prong 2: This judicial exception is not integrated into a practical application. In particular, claims 1, 12 and 18 only recite the following additional elements –
…computer implemented… [Claim 1],
One or more physically manufactured computer-readable storage media, encoding computer-executable instructions for executing on a computer system a computer process, the computer process comprising… [Claim 12],
A system comprising: memory; one or more processor units; a feature store data preparation optimization system stored in the memory and executable by the one or more processor units, the feature store data preparation optimization system encoding computer-executable instructions on the memory for executing on the one or more processor units a computer process, the computer process comprising… [Claim 18].
The system, processor, and executable instructions are recited at a high level of generality (see MPEP § 2106.05(a)), like the following MPEP example:
iii. Gathering and analyzing information using conventional techniques and displaying the result, TLI Communications, 823 F.3d at 612-13, 118 USPQ2d at 1747-48;
Furthermore, the computer implemented element is considered to amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)), like the following MPEP example:
i. A commonplace business method or mathematical algorithm being applied on a general purpose computer, Alice Corp. Pty. Ltd. v. CLS Bank Int’l, 573 U.S. 208, 223, 110 USPQ2d 1976, 1983 (2014); Gottschalk v. Benson, 409 U.S. 63, 64, 175 USPQ 673, 674 (1972); Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015);
Accordingly, these additional elements do not integrate the abstract idea into a practical application.
The remaining dependent claims do not recite any new additional elements, and thus do not integrate the abstract idea into a practical application.
Step 2B: Claims 1, 12 and 18 and their underlying limitations, steps, features and terms, considered both individually and as a whole, do not include additional elements that are sufficient to amount to significantly more than the judicial exception for the following reasons:
Independent claims 1, 12 and 18 only recite the following additional elements –
…computer implemented… [Claim 1],
One or more physically manufactured computer-readable storage media, encoding computer-executable instructions for executing on a computer system a computer process, the computer process comprising… [Claim 12],
A system comprising: memory; one or more processor units; a feature store data preparation optimization system stored in the memory and executable by the one or more processor units, the feature store data preparation optimization system encoding computer-executable instructions on the memory for executing on the one or more processor units a computer process, the computer process comprising… [Claim 18].
These elements do not amount to significantly more than the abstract idea for the reasons discussed in Step 2A, Prong 2 with regard to MPEP 2106.05(a) and MPEP 2106.05(f). Because the elements fail to integrate the abstract idea into a practical application there, they likewise fail to amount to an inventive concept that is significantly more than the abstract idea here, in Step 2B.
As such, both individually and in combination, these limitations do not add significantly more to the judicial exception.
The remaining dependent claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the dependent claims do not recite any new additional elements other than those mentioned in the independent claims, which amount to no more than mere instructions to apply the exception using a generic computer component (see MPEP 2106.05(f)). As such, these claims are not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3 and 5-20 are rejected under 35 U.S.C. 103 as being unpatentable over Bonaci et al., U.S. Publication No. 2022/0156254 [hereinafter Bonaci] in view of Chen, U.S. Publication No. 2020/0004733 [hereinafter Chen].
Regarding Claim 1, Bonaci discloses …A computer-implemented method, comprising: receiving a new feature definition specifying parameters of a feature (Bonaci, ¶ 32, In an embodiment, feature engineering system 100 may be configured to generate feature vectors and/or examples associated with a particular entity. As is discussed below in more detail, a user of system 100, such as a data scientist, may be responsible for instructing system 100 which entity or entities should be included in the feature vectors and/or examples. For example, if the user of system 100 wants to train a model to predict how much homes will sell for in Seattle, the user of system 100 may instruct system 100 to choose houses in Seattle as the entities that should be included in the feature vectors (discloses receiving a new feature definition specifying parameters of the feature) and/or examples. If the user instructed system 100 to choose, for example, houses in Los Angeles as the set of entities that should be included in the feature vectors and/or examples, the model may not be able to accurately predict selling prices for homes in Seattle), (Id., ¶ 120, Once the user has created and/or changed the feature definition and/or example selection, the feature engineering system can use this information to efficiently create the desired features and/or feature vectors and/or examples for the user. For example, the feature engineering system can use this information to create the desired features and/or feature vectors and/or examples for the user by re-using previous computations. After the desired features and/or feature vectors and/or examples have been generated, they may be exported to the user. At 704, the generated features and/or feature vectors and/or examples may be exported to the user. The user may use these exported features and/or feature vectors and/or examples to train and/or validate/evaluate the model. 
At 706, the user may train the model on any training examples generated by the feature engineering system. At 708, the user may validate and/or evaluate the model using any validation examples generated by the feature engineering system. If the user wants the feature engineering system to generate new or different features and/or feature vectors and/or examples, the user may easily change the dataset being used or experiment with a different dataset. For example, the user may want to try a new dataset to see if the model performs better after being trained with the new dataset. The method 700 may return to step 702, where the user may change the feature definition and/or update the example selection configuration. The user may continue to perform this iterative process until the model is generating results that satisfy the user);
comparing the new feature definition with a plurality of computed feature definitions stored in a feature store (Id., ¶ 61, In embodiments, feature computation layer 106 is configured to determine the features using the raw data and/or events stored to related event store 105. The feature computation layer 106 may be configured to determine the features by applying a variety of numerical processes to the data, such as arithmetic operations, aggregations, and various other techniques. In an embodiment, a user of the system 100 may determine useful features for a model by evaluating the features generated by feature computation layer 106 using both numerical methods and attempts to train a model using the examples generated from these features. By attempting to train the model using the generated examples, the user may see if the model trained using the features of interest has less error, such as by testing the model using a validation set, as compared to the model trained with different features), (Id., ¶ 92, In an embodiment, feature engine 103 includes a feature store 107. (discloses feature store) Feature computation layer 106 may store the determined features and/or generated feature vectors to feature store 107. Feature store 107 makes deployed features available for users. According to an aspect, feature computation layer 106 keeps feature store 107 up-to-date, such as by computing and updating values of features when new events are received and/or when a request is received from a user. Based on the features stored to feature store 107, feature computation layer 106 may avoid recomputing features using the same events. 
For example, if feature computation layer 106 has determined features using events up to arrival time x, feature computation layer 106 determines features using events up to arrival time x+n by only considering events that arrived after arrival time x and before arrival time x+n), (Id., ¶ 93, According to an aspect, feature computation layer 106 updates the features and/or save the new features to feature store 107. As a result, feature store 107 is configured to make up-to-date query results 113 available on-demand and computed features are readily available for quick model application. A user who wants to use a model trained on a particular exported dataset may efficiently retrieve stored pre-computed values);
in response to determining that the new feature definition is at least partially included in a matched feature definition of the plurality of computed feature definitions, generating an alternative feature definition based on the new feature definition and the matched feature definition (Id., ¶ 61, In embodiments, feature computation layer 106 is configured to determine the features using the raw data and/or events stored to related event store 105. The feature computation layer 106 may be configured to determine the features by applying a variety of numerical processes to the data, such as arithmetic operations, aggregations, and various other techniques. In an embodiment, a user of the system 100 may determine useful features for a model by evaluating the features generated by feature computation layer 106 using both numerical methods and attempts to train a model using the examples generated from these features. By attempting to train the model using the generated examples, the user may see if the model trained using the features of interest has less error, such as by testing the model using a validation set, as compared to the model trained with different features), (Id., ¶ 42, Feature engineering system 100 is configured to use the data from data sources 101,102 to efficiently provide and/or generate feature vectors, such as a predictor feature vector, for a user to use in the application stage. Applying the model may involve computing a feature vector using the same computations that were used in training of the model, but for an entity or time that may not have been part of the training or validation examples. Because feature engineering system 100 is also configured to generate feature vectors for the user to use in the training stage, the same feature (discloses generating new feature definitions) vector definitions that were used for training are automatically available during production. 
As discussed above, making the same feature vector definitions used for training automatically available during production allows for event-based models to be successfully used in production. For example, feature engineering system 100 may provide and/or generate predictor feature vectors for a user to use in the application stage, while the feature engineering system 100 may provide and/or generate predictor and label feature vectors for a user to use in the training and validation stage. Feature engineering system 100 may generate the feature vectors and/or validation examples in a similar manner as described above for training examples), (Id., ¶ 43, System 100 is configured to ingest event data from one or more sources 101, 102 of data. In some configurations, a data source includes historical data, e.g., from historical data source 101. In that case, the data includes data that was received and/or stored within a historic time period i.e. not real-time. The historical data is typically indicative of events that occurred within a previous time period. For example, the historic time period may be a prior year or a prior two years, e.g., relative to a current time, etc. Historical data source 101 may be stored in and/or retrieved from one or more files, one or more databases, an offline source, and the like or may be streamed from an external source. The historical data ingested by system 100 may be associated with a user of system 100, such as a data scientist, that wants to train and implement a model using features generated from the data. System 100 may ingest the data from one or more sources 101,102 and use it to compute features), (Id., ¶ 120, Once the user has created and/or changed the feature definition and/or example selection, the feature engineering system can use this information to efficiently create the desired features and/or feature vectors and/or examples for the user. 
For example, the feature engineering system can use this information to create the desired features and/or feature vectors and/or examples for the user by re-using previous computations. After the desired features and/or feature vectors and/or examples have been generated, they may be exported to the user. At 704, the generated features and/or feature vectors and/or examples may be exported to the user. The user may use these exported features and/or feature vectors and/or examples to train and/or validate/evaluate the model. At 706, the user may train the model on any training examples generated by the feature engineering system. At 708, the user may validate and/or evaluate the model using any validation examples generated by the feature engineering system. If the user wants the feature engineering system to generate new or different features and/or feature vectors and/or examples, the user may easily change the dataset being used or experiment with a different dataset. For example, the user may want to try a new dataset to see if the model performs better after being trained with the new dataset. The method 700 may return to step 702, where the user may change the feature definition and/or update the example selection configuration. The user may continue to perform this iterative process until the model is generating results that satisfy the user), (Id., ¶ 121, FIG. 8 shows an example network 800 for feature engineering. The network 800 includes a feature engineering system 802 and one or more clients 804. System 802 may be similar to and/or perform similar functions as those performed by system 100 and/or system 200 described above. System 802 includes an API Server 808, one or more compute nodes 814, metadata storage 810, event data storage 816, staged data storage 806, prepared data storage 812, and result data storage 818. 
The event data storage 816, the staged data storage 806, and/or the prepared data storage 812 may utilize an external storage system, such as Amazon S3 or any other external storage system. The compute nodes 814 may be, for example, a feature engine, such as one of the feature engines described above);
and executing the selected execution alternative using a compute engine to generate the feature (Id., ¶ 90, Feature engineering system 100 (discloses compute engine) may simplify collaboration in feature generation (discloses generating the feature) and/or selection. As discussed above, features are often defined by users, such as data scientists. A company may have multiple data scientists producing features for one or more models. The data scientists may need to use different tools to access different kinds of raw data and/or events, further complicating the process of producing features. Collaboration on features produced in ad-hoc and varied ways makes it difficult to share features between users and/or projects. In addition, the techniques for producing features may vary based on the data size and the need for producing the feature vectors “in a production environment.” This may lead to the need to implement features multiple times for different situations. However, feature engineering system 100 may address these shortcomings by ingesting and/or saving raw data and/or events from a variety of sources and making the features available to users in different locations and/or using different devices, such as via the feature studio described further herein), (Id., ¶ 91, In an embodiment, feature computation layer 106 is configured to compute feature vectors. A feature vector is a list of features of an entity. The feature computation layer 106 may be configured to compute and/or update feature vectors as events are ingested by the feature engine 103. The feature computation layer 106 may be configured to compute and/or update feature vectors in response to user queries).
While suggested in at least Fig. 2 and related text, Bonaci does not explicitly disclose …selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator;
However, through KSR Rationale D (See MPEP 2141(III)(D)), the combination of Bonaci and Chen discloses …selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator.
First, Bonaci discloses a data aggregation technique including an expense estimation and minimization technique which selects an alternative technique for minimal cost (Bonaci, ¶ 82, According to an aspect, feature computation layer 106 is configured to simultaneously compute more than one feature, such as a large number of features. When simultaneously computing many features, it is possible to compute each feature independently and then join the computed values based on the entity and time. However, this approach is inefficient for at least two major reasons. First, computing each feature may involve retrieving and processing the same input events multiple times. Second, once the features are computed, performing an N-way join is an expensive operation. FIG. 5A illustrates an example N-way join 500a, (discloses PIT join techniques) such as a 3-way join, being performed after multiple features are individually computed. Computing two or more of the three features shown in FIG. 5A may involve retrieving and processing the same input events multiple times. After these three features are individually computed, they may be joined and output by the system), (Id., ¶ 83, Rather than employing this inefficient and expensive technique for simultaneously computing multiple features, feature computation layer 106 (discloses execution cost estimator) may instead combine all of the aggregations into a single pass over events that computes (at each point in time and for each entity) the value of all aggregations. (discloses execution alternative selected for minimal cost) The description of this flattened operation is called the aggregation plan and the process for producing it is described in more detail below. This flattened aggregation plan allows for the simultaneous computation of the aggregations necessary for all requested features with a single pass over the input, and therefore eliminates the need for the N-way join. FIG.
5B illustrates an example simultaneous feature computation 500b without an N-way join. As depicted in FIG. 5B, all of the multiple features are simultaneously computed with a single pass over the input, eliminating the need to retrieve and process the same input events multiple times), (Id., ¶ 157, In embodiments, resume tokens are utilized to continually apply the results of a query to a separate (i.e., external) data store with minimal cost. This may be achieved by first running an initial query, writing the results to the separate data store, and receiving a resume token. A query may be periodically run to update the results in the external store. Each query uses the resume token returned by the previous response. The new results may reflect only those results which have changed).
[Image: media_image1.png, greyscale, 452 x 647]
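The single-pass aggregation quoted from Bonaci ¶¶ 82-83 can be illustrated with a brief sketch. The event tuples, feature names, and aggregation choices below are hypothetical and are not code from Bonaci; the sketch only shows how a flattened aggregation plan updates every requested aggregation in one pass over the events, avoiding the N-way join of independently computed features:

```python
from collections import defaultdict

# Hypothetical event stream: (entity, event_time, amount) tuples.
events = [
    ("house_a", 1, 100.0),
    ("house_b", 2, 250.0),
    ("house_a", 3, 50.0),
    ("house_a", 4, 75.0),
]

# Flattened "aggregation plan": every requested aggregation is updated in
# a single pass over the events, so no N-way join of independently
# computed features is needed.
state = defaultdict(lambda: {"sum": 0.0, "count": 0, "max": float("-inf")})

for entity, _event_time, amount in events:
    agg = state[entity]
    agg["sum"] += amount                  # aggregation 1: running sum
    agg["count"] += 1                     # aggregation 2: event count
    agg["max"] = max(agg["max"], amount)  # aggregation 3: running max
```

Each input event is read once, and all per-entity aggregations are maintained together, consistent with the single-pass computation depicted in Bonaci's FIG. 5B.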
Further, Chen discloses time-sensitive PIT join techniques using new and alternative feature definitions (Chen, ¶ 23, For purposes of this document, features are joins between different tables on the same key (e.g., address, timestamp) with perhaps simple transformations applied to some fields (discloses selecting an optimal time-sensitive PIT join alternative) that will make it more suitable for machine learning training and/or service, such as the nullification of values that are obviously incorrect), (Id., ¶ 24, These two concepts allow for the separation of physical mechanisms for managing tables from the logical design of features, while allowing each layer to be iterated on separately. For example, supporting a new table format merely requires the user to implement a common interface, which could immediately be used by a feature layer without any knowledge of the physical details), (Id., ¶ 39, As described above, features are joins between tables FIG. 3 is a diagram illustrating an example of features, in accordance with an example embodiment. Features are what someone building a model would care about. They define a configuration file that references all the fields needed from each of the tables defined.), (Id., ¶ 43, When iterating over a FeatureSet for model training, two commands may be defined: [0044] 1. build_feature_set_input_table—feature-set $FEATURE_SET—path $PATH which, given a FeatureSet definition, will perform all the joins necessary for the input table and can later be iterated on. [0045] 2. build_feature_set—feature-set $FEATURE_SET—input-table $PATH will build the feature set using the input table from 1. This command would fail if the input table does not contain the necessary columns. [0046] These two commands allow a user to iterate over feature set definitions without having to recalculate all joins every time).
[Image: media_image2.png, greyscale, 285 x 493]
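The point-in-time (PIT) join concept quoted above from Chen can be illustrated with a brief sketch. The table contents and the `pit_join` helper are hypothetical and are not code from Chen; the sketch only shows the temporal-correctness constraint, i.e., that a joined row may reflect only observations at or before the query time so that no future data leaks into a training example:

```python
# Hypothetical feature observations: (entity, observed_at, value).
observations = [
    ("addr_1", 10, 300_000),
    ("addr_1", 20, 320_000),
    ("addr_2", 15, 150_000),
]

def pit_join(entity, as_of):
    """Return the latest value for `entity` observed at or before `as_of`,
    so that no observation after the query time leaks into the result."""
    candidates = [(t, v) for e, t, v in observations
                  if e == entity and t <= as_of]
    # max() over (time, value) tuples picks the most recent observation.
    return max(candidates)[1] if candidates else None
```

Querying `("addr_1", 15)` under this sketch returns the value observed at time 10, not the later value observed at time 20, matching the leakage-avoidance discussion in Chen ¶ 63.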
One of ordinary skill in the art would have recognized that applying the known cost estimation technique of Bonaci to the feature-definition-based PIT join teachings of Chen would have yielded predictable results and resulted in an improved system, because the level of ordinary skill in the art demonstrated by the applied references shows the ability to incorporate such data aggregation features into similar systems. Further, applying cost-minimization-based alternative selection to Chen, with feature definitions considered accordingly, would have been recognized by those of ordinary skill in the art as resulting in an improved system allowing more cost-effective data aggregation according to specific feature definitions.
Thus, through KSR Rationale D (See MPEP 2141(III)(D)), the combination of Bonaci and Chen discloses …selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator.
It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the feature definition generation elements of Bonaci to include the PIT join elements of Chen in the analogous art of time-sensitive data stores.
The motivation for doing so would have been to implement an improved method “that produces wide tables containing features for machine learned models that allow for more efficient processing, improved sharing of information, and increased accuracy. These wide tables are made available for model training for multiple models and/or groups. These wide tables may be served on a serving database for fast access for API serving and lightweight access during interactive development. The solution decreases the time needed to add a new feature from several days to a couple of hours by enabling experimentation” (Chen, ¶ 21). Such improvements would benefit Bonaci’s method, which provides an “ability to maintain feature values in real time [which] may improve the accuracy of the model. For example, the model may be able to make more accurate predictions, or a larger percentage of the predictions that the model makes may be accurate. The accuracy of the model may be improved because predictions made with more recent feature values more accurately reflect the current interests/environments, etc. that the prediction is being made about” (Bonaci, ¶ 29).
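As mapped above for claim 1, the cost-based selection limitation amounts to predicting an execution cost for each candidate execution alternative and choosing the minimum; Applicant's specification (¶ 32, as characterized in the arguments) describes a weighted-sum cost estimator. A minimal illustrative sketch follows; the weights, cost terms, and candidate names are hypothetical and are drawn from neither reference nor the claims:

```python
# Hypothetical weighted-sum execution cost estimator; the weights and
# resource terms below are illustrative only.
WEIGHTS = {"partitions_scanned": 1.0, "rows_joined": 0.001}

def predicted_cost(alternative):
    """Predict an execution cost as a weighted sum of resource terms."""
    return sum(WEIGHTS[term] * alternative[term] for term in WEIGHTS)

# Candidate execution alternatives: a PIT join using the alternative
# feature definition versus a PIT join using the new feature definition.
candidates = {
    "pit_join_alternative_definition": {"partitions_scanned": 4,
                                        "rows_joined": 10_000},
    "pit_join_new_definition": {"partitions_scanned": 12,
                                "rows_joined": 10_000},
}

# Select the candidate with the minimum predicted execution cost.
selected = min(candidates, key=lambda name: predicted_cost(candidates[name]))
```

Under these illustrative weights, the alternative-definition PIT join scans fewer partitions and is therefore selected, paralleling the cost-minimizing selection attributed to the Bonaci and Chen combination above.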
Regarding Claim 2, the combination of Bonaci and Chen discloses …The computer-implemented method of claim 1…
Bonaci discloses …wherein the execution cost estimator predicts the execution cost for each candidate execution alternative… (Bonaci, ¶ 82, According to an aspect, feature computation layer 106 is configured to simultaneously compute more than one feature, such as a large number of features. When simultaneously computing many features, it is possible to compute each feature independently and then join the computed values based on the entity and time. However, this approach is inefficient for at least two major reasons. First, computing each feature may involve retrieving and processing the same input events multiple times. Second, once the features are computed, performing an N-way join is an expensive operation. FIG. 5A illustrates an example N-way join 500a, (discloses PIT join techniques) such as a 3-way join, being performed after multiple features are individually computed. Computing two or more of the three features shown in FIG. 5A may involve retrieving and processing the same input events multiple times. After these three features are individually computed, they may be joined and output by the system), (Id., ¶ 83, Rather than employing this inefficient and expensive technique for simultaneously computing multiple features, feature computation layer 106 (discloses execution cost estimator) may instead combine all of the aggregations into a single pass over events that computes (at each point in time and for each entity) the value of all aggregations. (discloses execution alternative selected for minimal cost) The description of this flattened operation is called the aggregation plan and the process for producing it is described in more detail below. This flattened aggregation plan allows for the simultaneous computation of the aggregations necessary for all requested features with a single pass over the input, and therefore eliminates the need for the N-way join. FIG. 5B illustrates an example simultaneous feature computation 500b without an N-way join. 
As depicted in FIG. 5B, all of the multiple features are simultaneously computed with a single pass over the input, eliminating the need to retrieve and process the same input events multiple times), (Id., ¶ 157, In embodiments, resume tokens are utilized to continually apply the results of a query to a separate (i.e., external) data store with minimal cost. This may be achieved by first running an initial query, writing the results to the separate data store, and receiving a resume token. A query may be periodically run to update the results in the external store. Each query uses the resume token returned by the previous response. The new results may reflect only those results which have changed).
While suggested in at least Fig. 2 and related text, Bonaci does not explicitly disclose …further comprising: receiving a plurality of candidate source data layouts that are based on current feature computation pipelines and current source data layout; determining a plurality of candidate source data layouts; and selecting a new data source layout from the plurality of candidate source data layouts that are based on current feature computation pipelines and current source data layout.
However, Chen discloses further comprising: receiving a plurality of candidate source data layouts that are based on current feature computation pipelines and current source data layout (Chen, ¶ 22, For purposes of this document, a table may be defined as a collection of rows held on a structured format with the same schema. It can take multiple physical representations, including, for example, a Postgres table, Parquet file, CSV file, or Big Query Table, and usually contains a key that uniquely identifies each row);
determining a plurality of candidate source data layouts (Chen, ¶ 64, In an example embodiment, in order to reduce the size of the wide table, the feature store stores a set of features pivoted by (address token, list date) for comparables. Thus, subject-comp pairs can be precomputed per distance and stored in a comparables table. In one example embodiment, the comparable data is stored in an array in a wide table. In the wide table, each row of the table is pivoted per (address token, list date) and contains all of the features (hence, why it is called wide). Specifically, it may be a column in a wide table and data can be stored in the format of [(comp_address_token, distance)], where distance is the distance between the comparable and a subject property. Alternatively, it may be stored as a separate table in a flat fashion. Each comparable has a pair of address tokens and a distance. If auxiliary information is stored about comps, instead of a flat pair, the pair may be ordered as (subject_address-token, comp_address_token, comp_information). Alternatively, it may be stored in both a wide table and as a separate table. (discloses determining a plurality of candidate source data layouts) In some example embodiments, the system may filter to the appropriate set based on heuristics, as well as perform the scoring as part of the job hierarchy);
and selecting a new data source layout from the plurality of candidate source data layouts that are based on current feature computation pipelines and current source data layout… that utilizes the selected new data source layout (Id., ¶ 38, It should be noted that the Parquet model was used in this example as the basis format to simplify interaction from the features part. Parquet was chosen as it is a good fit for model training and also has good support for parallelization, which than then be easily transformed to any serving store, such as Postgre, Cassandra, and Base), (Id., ¶ 22, For purposes of this document, a table may be defined as a collection of rows held on a structured format with the same schema. It can take multiple physical representations, including, for example, a Postgres table, Parquet file, CSV file, or Big Query Table, and usually contains a key that uniquely identifies each row).
It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the feature definition generation elements of Bonaci to include the format determination elements of Chen in the analogous art of time-sensitive data stores for the same reasons as stated for claim 1.
Regarding Claim 3, the combination of Bonaci and Chen discloses …The computer-implemented method of claim 2…
Through KSR Rationale C (See MPEP 2141(III)(C)), the combination of Bonaci and Chen discloses …wherein selecting a new data source layout further comprising evaluating the plurality of candidate source data layouts and the current source data layout based on a layout selection criterion, wherein the layout selection criterion comprises selection of a minimum cost configuration of the new data source layout.
First, Bonaci discloses considerations of cost minimization with respect to data structuring (Bonaci, ¶ 157, In embodiments, resume tokens are utilized to continually apply the results of a query to a separate (i.e., external) data store with minimal cost. This may be achieved by first running an initial query, writing the results to the separate data store, and receiving a resume token. A query may be periodically run to update the results in the external store. Each query uses the resume token returned by the previous response. The new results may reflect only those results which have changed), (Id., ¶ 159, In embodiments, the system 802 may be configured to perform temporally correct joins, such as with foreign entities. A value at a point in time is temporally correct if it includes all of the events up to (and including) that point in time and none of the events after that point in time. The result of any computation may thus be a sequence of values corresponding to the temporally correct value at each point in time. By contrast, many other data processing systems instead operate on all of the data (events) in the system. This may result in the correct values at a time after all of the events. However, due to delays that occur between when events happen and when they are added to the system, this may not result in a correct value at any given point in time), (Id., ¶ 160, Being able to compute values that are correct at historic points in time, as the system 802 is able to do, is critical to creating features that may be used to train predictive models without leakage. Rather than representing the value at every point in time, the system 802 may represent only those values that are observed, such as those values that are returned as part of the results, used in additional computations, etc. The system 802 may represent the value only at the points in time when it changes. For example, the computation “sum(Event.amount)” may only change when an event occurs).
Further, Chen discloses selecting a source data layout based on optimization criterion (Chen, ¶ 22, For purposes of this document, a table may be defined as a collection of rows held on a structured format with the same schema. It can take multiple physical representations, including, for example, a Postgres table, Parquet file, CSV file, or Big Query Table, and usually contains a key that uniquely identifies each row), (Id., ¶ 64, In an example embodiment, in order to reduce the size of the wide table, the feature store stores a set of features pivoted by (address token, list date) for comparables. Thus, subject-comp pairs can be precomputed per distance and stored in a comparables table. In one example embodiment, the comparable data is stored in an array in a wide table. In the wide table, each row of the table is pivoted per (address token, list date) and contains all of the features (hence, why it is called wide). Specifically, it may be a column in a wide table and data can be stored in the format of [(comp_address_token, distance)], where distance is the distance between the comparable and a subject property. Alternatively, it may be stored as a separate table in a flat fashion. Each comparable has a pair of address tokens and a distance. If auxiliary information is stored about comps, instead of a flat pair, the pair may be ordered as (subject_address-token, comp_address_token, comp_information). Alternatively, it may be stored in both a wide table and as a separate table. (discloses determining a plurality of candidate source data layouts) In some example embodiments, the system may filter to the appropriate set based on heuristics, as well as perform the scoring as part of the job hierarchy), (Id., ¶ 38, It should be noted that the Parquet model was used in this example as the basis format to simplify interaction from the features part. 
Parquet was chosen as it is a good fit for model training and also has good support for parallelization, which than then be easily transformed to any serving store, such as Postgre, Cassandra, and Base).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to have selected a source data layout based on a cost-minimization criterion, as in the improvement discussed in Bonaci, in the system executing the method of Chen. As in Chen, it is within the capabilities of one of ordinary skill in the art to provide temporally correct join operations to create data tables using Bonaci’s cost-minimization considerations, with the predicted result of providing accurate, useful, and timely information to the end user.
Thus, through KSR Rationale C (See MPEP 2141(III)(C)), the combination of Bonaci and Chen discloses …wherein selecting a new data source layout further comprising evaluating the plurality of candidate source data layouts and the current source data layout based on a layout selection criterion, wherein the layout selection criterion comprises selection of a minimum cost configuration of the new data source layout.
It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the feature definition generation elements of Bonaci to include the format determination elements of Chen in the analogous art of time-sensitive data stores for the same reasons as stated for claim 1.
Regarding Claim 5, the combination of Bonaci and Chen discloses …The computer-implemented method of claim 1…
While suggested in at least Fig. 2 and related text, Bonaci does not explicitly disclose …wherein selecting the execution alternative further comprises evaluating, using a feature selection criterion one or more of the alternative feature definition and the new feature definition.
However, Chen discloses …wherein selecting the execution alternative further comprises evaluating, using a feature selection criterion one or more of the alternative feature definition and the new feature definition (Chen, ¶ 44, 1. build_feature_set_input_table—feature-set $FEATURE_SET—path $PATH which, given a FeatureSet definition, will perform all the joins necessary for the input table and can later be iterated on), (Id., ¶ 46, These two commands allow a user to iterate over feature set definitions without having to recalculate all joins every time), (Id., ¶ 60, At 516, add calculated features is performed. The calculated features are any features that potentially depend on information that is not possessed by the system until a customer visits a corresponding web site. For example, while the system may have an estimate of home square footage, the seller may provide a more accurate estimate when requesting a price estimate from the system. Calculated features run at the time that a prediction of home value is made so they can included updated information provided during the year), (Id., ¶ 63, It should be noted that the same mechanisms described above, including the feature stores and wide table, can be performed for all comparable properties as well as just the subject property. With comparables, however, it is difficult to precompute these values since there is no time yet specified in a query to guarantee against future leakage. For example, if one wanted to produce an estimate for a subject property based on comparables that occurred before the last time the subject property was sold (for example, 2012), it is difficult to precompute features for those comparables before that time is known (e.g., until the user specifies 2012 in the query, the system does not know to limit the data from comparables to 2012 or earlier). 
As such, in an example embodiment, features for all possible comparables for all times, based solely on distance from a subject property, for each possible subject property, can be precomputed. While this greatly improves performance at query-time, the result is more data than will fit on a single machine, and this type of computation is difficult to perform in a distributed fashion).
It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the feature definition generation elements of Bonaci to include the selection criterion elements of Chen in the analogous art of time sensitive data stores for the same reasons as stated for claim 1.
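For illustration only, and without characterizing either reference's actual implementation, the criterion-based evaluation of feature definitions described in the passages above may be sketched as follows; all identifiers (select_definition, joins_remaining, the dictionary keys, etc.) are hypothetical names chosen for this sketch:

```python
# Hypothetical sketch: evaluate candidate feature definitions under a
# selection criterion and choose the best-scoring one (lower is better).

def select_definition(candidates, criterion):
    """Return (score, definition) for the candidate scoring best under
    `criterion`, where a lower score is preferred."""
    scored = [(criterion(c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0])
    return scored[0]

# Example criterion in the spirit of Chen ¶ 46: prefer the definition that
# reuses already-performed joins, approximated as joins still outstanding.
def joins_remaining(definition):
    return len(definition["joins"]) - len(definition["precomputed_joins"])

alternative = {"name": "alt", "joins": ["a", "b", "c"], "precomputed_joins": ["a", "b"]}
new = {"name": "new", "joins": ["a", "b", "c"], "precomputed_joins": []}

score, chosen = select_definition([alternative, new], joins_remaining)
```

Under this sketch the alternative definition is selected because only one join remains to be computed, consistent with iterating on feature set definitions "without having to recalculate all joins every time."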
Regarding Claim 6, the combination of Bonaci and Chen discloses …The computer-implemented method of claim 5…
Bonaci further discloses …wherein the feature selection criterion comprises minimization of data to be scanned using one or more of the alternative feature definition and the new feature definition (Bonaci, ¶ 149, In embodiments, system 802 may process all late data regardless of actual delay. Doing so in a resumable query may use any eligible intermediate state. An intermediate state is eligible if the latest event it includes is before the earliest new event. Resuming computation from such a state ensures events are processed in order, since no events later than any of the new events have yet been processed. The best eligible intermediate state may be the one that minimizes the number of events that need to be processed. The best eligible intermediate state may be determined by choosing the state with the maximum event time less than the latest new data point).
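For illustration only, the resume-state selection rule quoted from ¶ 149 (an eligible state ends before the earliest new event; the best eligible state is the latest such state) may be sketched as follows; the identifiers used here are hypothetical:

```python
# Hypothetical sketch of the "best eligible intermediate state" rule.

def best_eligible_state(states, new_event_times):
    """Pick the intermediate state to resume from: a state is eligible if
    its latest event precedes the earliest new event; among eligible
    states, the latest one minimizes events that must be reprocessed."""
    earliest_new = min(new_event_times)
    eligible = [s for s in states if s["latest_event_time"] < earliest_new]
    if not eligible:
        return None  # no usable state; reprocess from the beginning
    return max(eligible, key=lambda s: s["latest_event_time"])

states = [{"id": 1, "latest_event_time": 10},
          {"id": 2, "latest_event_time": 40},
          {"id": 3, "latest_event_time": 70}]
resume_from = best_eligible_state(states, new_event_times=[55, 90])
```

Here state 2 is chosen: state 3 is ineligible (its latest event, at time 70, follows the earliest new event at time 55), and state 2 leaves fewer events to reprocess than state 1.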
Regarding Claim 7, the combination of Bonaci and Chen discloses …The computer-implemented method of claim 6…
Bonaci further discloses …wherein the minimization of data to be scanned further comprises calculating a benefit based on a number of data partitions to be read by the execution of a third PIT join using the alternative feature definition and a number of data partitions to be read by the execution of the second PIT join using the new feature definition (Bonaci, ¶ 149, In embodiments, system 802 may process all late data regardless of actual delay. Doing so in a resumable query may use any eligible intermediate state. An intermediate state is eligible if the latest event it includes is before the earliest new event. Resuming computation from such a state ensures events are processed in order, since no events later than any of the new events have yet been processed. The best eligible intermediate state may be the one that minimizes the number of events that need to be processed. The best eligible intermediate state may be determined by choosing the state with the maximum event time less than the latest new data point), (Id., ¶ 152, Referring back to FIG. 8, the ability of the system 802 to handle late data while immediately producing results reflecting all received events and its ability to resume computations with minimal need to reprocess prior events are important for handling late data. As an example, many stream processing systems assume that late data may be bounded. Such stream processing systems may require users to configure a maximum expected delay and/or may only process events older than this maximum delay. They may discard any events that exceed the maximum lateness. All of these are undesirable features that the system 802 remedies), (Id., ¶ 161, A “temporally correct join” is a join that produces the correct value at every point in time. A lookup is one mechanism for performing a join. 
To be temporally correct, a lookup must use the temporally correct key to determine the foreign entity to lookup from and it must use the temporally correct value for the foreign entity. Performing a temporally correct join may require a temporal processing engine which can compute the correct values at specific points in time).
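For illustration only, a partition-count benefit of the kind recited in claim 7 may be sketched as follows; the function and variable names are hypothetical and do not appear in the cited references:

```python
# Hypothetical sketch: count the data partitions each PIT join execution
# must read, and express the benefit as partitions avoided.

def partitions_to_read(partition_bounds, query_start, query_end):
    """Partitions, given as (start, end) time bounds, that overlap the
    query's time range are the ones a PIT join must scan."""
    return [p for p in partition_bounds
            if p[0] < query_end and p[1] > query_start]

# Suppose a cached alternative definition already covers history up to
# t=90, so its join scans only later partitions, while the new definition
# must scan the full range.
parts = [(0, 30), (30, 60), (60, 90), (90, 120)]
read_alt = partitions_to_read(parts, query_start=90, query_end=120)
read_new = partitions_to_read(parts, query_start=0, query_end=120)
benefit = len(read_new) - len(read_alt)  # partitions the alternative avoids
```

In this sketch the alternative definition reads one partition instead of four, yielding a benefit of three partitions avoided.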
Regarding Claim 8, the combination of Bonaci and Chen discloses …The computer-implemented method of claim 6…
Bonaci further discloses …wherein the minimization of data to be scanned further comprises calculating a benefit based on a size of data not to be read by the execution of a third PIT join using the alternative feature definition and a size of data not to be read by the execution of the second PIT join using the new feature definition (Bonaci, ¶ 149, In embodiments, system 802 may process all late data regardless of actual delay. Doing so in a resumable query may use any eligible intermediate state. An intermediate state is eligible if the latest event it includes is before the earliest new event. Resuming computation from such a state ensures events are processed in order, since no events later than any of the new events have yet been processed. The best eligible intermediate state may be the one that minimizes the number of events that need to be processed. The best eligible intermediate state may be determined by choosing the state with the maximum event time less than the latest new data point), (Id., ¶ 152, Referring back to FIG. 8, the ability of the system 802 to handle late data while immediately producing results reflecting all received events and its ability to resume computations with minimal need to reprocess prior events are important for handling late data. As an example, many stream processing systems assume that late data may be bounded. Such stream processing systems may require users to configure a maximum expected delay and/or may only process events older than this maximum delay. They may discard any events that exceed the maximum lateness. All of these are undesirable features that the system 802 remedies), (Id., ¶ 136, In embodiments, data slices may be used to select a random or pseudo-random sample of the entities. This may be used when iterating on feature engineering to reduce the total data set size being queried. 
This is more ideal than a solution that just takes a random sample of the events, because each of the selected entities has a complete set of events. Because each of the selected entities has a complete set of events, the feature values computed for them would be the same for the sampled data slice and on the entire data set. The selection of a random sample may use computed values. For instance, a sample of 1000 entities that are representatively distributed by age group may be requested by configuring a data slice that is sampled proportionally to the age groups in the entire data set. If a given age group represents 20% of the data, then there would be 200 entities in the produced sample), (Id., ¶ 143, Queries for the results since a previous resume token may return significantly smaller sets of results than a complete query. Rows which were previously returned may be omitted. Rows with values that have not changed since they were previously returned may also be omitted. This smaller result size may be faster to load into a storage system for serving feature values. Queries for the results since a previous page token may additionally, or alternatively, require significantly less compute time. This may be accomplished by storing intermediate states from the previous computation reflecting some or all of the events previously processed. When a query with a resume token is received, the intermediate state(s) from an earlier query may be used instead of reprocessing the corresponding events. This may allow the query to process only the new input since the previous query, rather than all of the input. In long running systems, it may quickly be the case that all previously accumulated data is significantly larger than the data arriving in any time interval, so this will often significantly speed up the queries).
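For illustration only, the size-based benefit recited in claim 8 may be sketched as follows; all names (size_benefit, the partition labels, the byte counts) are hypothetical:

```python
# Hypothetical sketch: express the benefit as the extra bytes the
# alternative feature definition avoids reading, relative to the new one.

def size_benefit(partition_sizes, skipped_alt, skipped_new):
    """Difference between the data sizes each execution can skip; a
    positive value favors the alternative definition."""
    return (sum(partition_sizes[p] for p in skipped_alt)
            - sum(partition_sizes[p] for p in skipped_new))

sizes = {"2023-01": 4_000, "2023-02": 6_000, "2023-03": 2_000}
# The alternative reuses cached results covering January and February;
# the new definition can skip nothing.
benefit = size_benefit(sizes,
                       skipped_alt=["2023-01", "2023-02"],
                       skipped_new=[])
```

Here the alternative avoids reading 10,000 units of data that the new definition would have to scan.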
Regarding Claim 9, the combination of Bonaci and Chen discloses …The computer-implemented method of claim 2…
Bonaci further discloses …further comprising generating the plurality of candidate source data layouts further comprising: retrieving the plurality of computed feature definitions stored in a feature store (Id., ¶ 92, In an embodiment, feature engine 103 includes a feature store 107. Feature computation layer 106 may store the determined features and/or generated feature vectors to feature store 107. Feature store 107 makes deployed features available for users. According to an aspect, feature computation layer 106 keeps feature store 107 up-to-date, such as by computing and updating values of features when new events are received and/or when a request is received from a user. Based on the features stored to feature store 107, feature computation layer 106 may avoid recomputing features using the same events. For example, if feature computation layer 106 has determined features using events up to arrival time x, feature computation layer 106 determines features using events up to arrival time x+n by only considering events that arrived after arrival time x and before arrival time x+n);
extracting data sources used to compute the plurality of computed feature definitions stored in a feature store (Id., ¶ 93, According to an aspect, feature computation layer 106 updates the features and/or save the new features to feature store 107. As a result, feature store 107 is configured to make up-to-date query results 113 available on-demand and computed features are readily available for quick model application. A user who wants to use a model trained on a particular exported dataset may efficiently retrieve stored pre-computed values);
and partitioning each of the extracted data sources based on a predetermined granularity of time period (Id., ¶ 86, In an embodiment, it may be desirable for feature computation layer 106 to operate on a sample of data. If feature computation layer 106 can operate on a sample of data, quick, approximate answers can be provided in response to interactive queries. To make the sampling informative, complete information for a subset of entities is included, rather than a subset of events for every entity. Without lookups, this sampling can be accomplished by taking only those events related to a subset of the entities. If the events are partitioned by entity, this could be accomplished by considering only a subset of the partitions. With lookups it is necessary to make sure that all events referenced by the sampled primary entities are available. This can be done by computing the lookup keys that the primary entity sample will need (at the selected point(s) in time) and using that set of keys as the sample of foreign entity events. While generating this sample may require filtering events from all partitions, it may be reused as features are changed so long as the definition of the lookup key does not change. In practice, the lookup key tends to change less frequently than other parts of the feature definitions, so this kind of sampling is likely to improve the performance of interactive queries), (Id., ¶ 89, The techniques discussed above allow feature engineering system 100 to maintain live feature values. Specifically, the techniques discussed above allow feature engine 103 to compute feature values using a partitioned scan over historic events. This allows exporting feature vectors and/or examples computed over the historic data in an efficient manner. 
Once the feature vectors and/or examples have been produced, feature engine 103 may also be configured to maintain “live” feature values which may be retrieved for a time near the current time for use when applying the model. In an embodiment, this online maintenance is achieved by storing the final accumulator values produced during the export. At any point in time the “new” events may be treated as individual rows or a batch of rows and new accumulators (and feature values) may be produced), (Id., ¶ 37, Similarly, the user may configure the selection of corresponding label times used to generate the training examples for the event-based model in a variety of different ways. In an embodiment, the user may configure the label times to be selected at fixed times. The fixed time may be, for example, today, or on the 1st of a month, or any other fixed time. (discloses fixed time granularities) In another embodiment, the user may configure the label times to be selected at fixed offset times after the prediction times. For example, as discussed above, if an event-based model is to predict whether an individual will quit a subscription service within the next month, the user may configure the label times to be selected at the points-in-time that occur one month after the respective prediction time(s). In another embodiment, the user may configure the label times to be selected when a particular event occurs. For example, as discussed above, if an event-based model is to predict, when a house is listed for sale, how much that house will eventually sell for, then the user may configure the label times to be selected at those points-in-time at which houses eventually sell. In another embodiment, the user may configure the label times to be selected at computed times. 
For example, if an event-based model is to predict whether scheduled flights will depart on time, then the label times may be configured to be selected at points-in-time calculated to be the scheduled departure times. The user of system 100 understands its own data and the problem that needs to be solved, so the user of system 100 may be best equipped to define the manner in which the prediction time(s) and corresponding label time(s) should be selected by system 100).
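For illustration only, partitioning extracted data sources at a predetermined time granularity, as recited in claim 9, may be sketched as follows; partition_by_period and the sample events are hypothetical and not drawn from the references:

```python
from datetime import datetime
from collections import defaultdict

def partition_by_period(events, granularity):
    """Group (timestamp, payload) events into partitions keyed by a
    truncated timestamp; granularity is 'month', 'day', 'hour', or
    'minute'."""
    fmt = {"month": "%Y-%m", "day": "%Y-%m-%d",
           "hour": "%Y-%m-%d %H", "minute": "%Y-%m-%d %H:%M"}[granularity]
    partitions = defaultdict(list)
    for ts, payload in events:
        partitions[ts.strftime(fmt)].append(payload)
    return dict(partitions)

events = [(datetime(2024, 5, 1, 9, 30), "e1"),
          (datetime(2024, 5, 1, 17, 0), "e2"),
          (datetime(2024, 6, 2, 8, 15), "e3")]
by_day = partition_by_period(events, "day")
```

With day granularity the three sample events fall into two partitions; month granularity would likewise produce two partitions keyed by month.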
Regarding Claim 10, the combination of Bonaci and Chen discloses …The computer-implemented method of claim 9…
Bonaci further discloses …wherein the predetermined granularity of time period may be at least one of a month, a day, an hour, and a minute (Bonaci, ¶ 37, Similarly, the user may configure the selection of corresponding label times used to generate the training examples for the event-based model in a variety of different ways. In an embodiment, the user may configure the label times to be selected at fixed times. The fixed time may be, for example, today, or on the 1st of a month, or any other fixed time. (discloses fixed time granularity of one month) In another embodiment, the user may configure the label times to be selected at fixed offset times after the prediction times. For example, as discussed above, if an event-based model is to predict whether an individual will quit a subscription service within the next month, the user may configure the label times to be selected at the points-in-time that occur one month after the respective prediction time(s). In another embodiment, the user may configure the label times to be selected when a particular event occurs. For example, as discussed above, if an event-based model is to predict, when a house is listed for sale, how much that house will eventually sell for, then the user may configure the label times to be selected at those points-in-time at which houses eventually sell. In another embodiment, the user may configure the label times to be selected at computed times. For example, if an event-based model is to predict whether scheduled flights will depart on time, then the label times may be configured to be selected at points-in-time calculated to be the scheduled departure times. The user of system 100 understands its own data and the problem that needs to be solved, so the user of system 100 may be best equipped to define the manner in which the prediction time(s) and corresponding label time(s) should be selected by system 100).
Regarding Claim 11, the combination of Bonaci and Chen discloses …The computer-implemented method of claim 1…
Bonaci further discloses …wherein the plurality of computed feature definitions are determined using PIT joins (Bonaci, ¶ 72, In an embodiment, in addition to aggregations over related events, computing each feature includes zero or more lookups of values computed over other sets of events. For example, if the features are computed over events performed by user entities it may be useful to lookup properties computed from events relating to specific videos. In this case, the features computed from events related to users are “lookup” values computed from events related to videos. This “lookup” operation provides similar capabilities to a join operation), (Id., ¶ 73, If feature computation layer 106 is configured to operate over all of the input events for both the primary entity and the foreign entity, feature computation layer 106 could simultaneously compute all the necessary aggregations. While this is conceptually how temporal aggregations with lookups behave, feature computation layer 106 performs this in a partitioned and potentially distributed manner. Without lookups, temporal aggregations may be executed entirely partitioned by entity. When executing temporal joins (disclose point-in-time joins) across multiple partitions, any lookup may request data from any other entity, and therefore any other partition, thus requiring some mechanism for cross-partition communication), (Id., ¶ 66, According to an aspect, feature computation layer 106 is configured to compute features by performing aggregations across events associated with an entity. Computing features from large amounts of raw data is a technically complicated process, as it may involve computing aggregate properties across all of the raw data. In an embodiment, feature computation layer 106 is configured to compute event-based features by performing temporal aggregations across events associated with an entity. 
To perform temporal aggregations, feature computation layer 106 produces a feature value at every time, aggregating all of the events that happened up to that particular time. Feature computation layer 106 does not aggregate everything and produce a single value—this would prevent the feature computation layer 106 from determining how the feature value changed over time. It is important that feature vectors and/or examples reflect the real feature values that will be available when applying the model as closely as possible. For this reason, if the model is being applied to “live” feature values (computed over all the events up to that point in time), each feature vectors and/or example should also be computed over the events up to the point in time selected for that example).
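For illustration only, the temporally correct (point-in-time) join behavior described in ¶¶ 66 and 161, aggregating only events up to the selected point in time to avoid future leakage, may be sketched as follows; pit_join and the sample data are hypothetical:

```python
# Hypothetical sketch of a point-in-time (PIT) join: for each
# (entity, as_of) row, aggregate only events visible at that time.

def pit_join(feature_rows, events):
    """events is a list of (entity, timestamp, value). For each
    (entity, as_of) row, return the sum and count over that entity's
    events with timestamp <= as_of, so no future data leaks in."""
    out = []
    for entity, as_of in feature_rows:
        visible = [v for e, t, v in events if e == entity and t <= as_of]
        out.append((entity, as_of, sum(visible), len(visible)))
    return out

events = [("u1", 1, 10), ("u1", 5, 20), ("u1", 9, 30), ("u2", 2, 5)]
rows = pit_join([("u1", 5), ("u1", 9)], events)
```

The same entity yields different feature values at different as-of times, reflecting that the feature value "at every time" aggregates only the events that had happened by then.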
Regarding Claim 12, Bonaci discloses …One or more physically manufactured computer-readable storage media, encoding computer-executable instructions for executing on a computer system a computer process, the computer process comprising: receiving a new feature definition specifying parameters of the feature (Bonaci, ¶ 32, In an embodiment, feature engineering system 100 may be configured to generate feature vectors and/or examples associated with a particular entity. As is discussed below in more detail, a user of system 100, such as a data scientist, may be responsible for instructing system 100 which entity or entities should be included in the feature vectors and/or examples. For example, if the user of system 100 wants to train a model to predict how much homes will sell for in Seattle, the user of system 100 may instruct system 100 to choose houses in Seattle as the entities that should be included in the feature vectors (discloses receiving a new feature definition specifying parameters of the feature) and/or examples. If the user instructed system 100 to choose, for example, houses in Los Angeles as the set of entities that should be included in the feature vectors and/or examples, the model may not be able to accurately predict selling prices for homes in Seattle), (Id., ¶ 120, Once the user has created and/or changed the feature definition and/or example selection, the feature engineering system can use this information to efficiently create the desired features and/or feature vectors and/or examples for the user. For example, the feature engineering system can use this information to create the desired features and/or feature vectors and/or examples for the user by re-using previous computations. After the desired features and/or feature vectors and/or examples have been generated, they may be exported to the user. At 704, the generated features and/or feature vectors and/or examples may be exported to the user. 
The user may use these exported features and/or feature vectors and/or examples to train and/or validate/evaluate the model. At 706, the user may train the model on any training examples generated by the feature engineering system. At 708, the user may validate and/or evaluate the model using any validation examples generated by the feature engineering system. If the user wants the feature engineering system to generate new or different features and/or feature vectors and/or examples, the user may easily change the dataset being used or experiment with a different dataset. For example, the user may want to try a new dataset to see if the model performs better after being trained with the new dataset. The method 700 may return to step 702, where the user may change the feature definition and/or update the example selection configuration. The user may continue to perform this iterative process until the model is generating results that satisfy the user), (Id., ¶ 217, The system memory 1928 in FIG. 19 may include computer system readable media in the form of volatile memory, such as random access memory (‘RAM’) 1930 and/or cache memory 1932. Computing node 1900 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 1934 may be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk, e.g., a “floppy disk,” and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media may be provided. In such instances, each may be connected to bus 1918 by one or more data media interfaces. 
As will be further depicted and described below, memory 1928 may include at least one program product having a set, e.g., at least one, of program modules that are configured to carry out the functions of embodiments of the invention);
comparing the new feature definition with a plurality of computed feature definitions stored in a feature store (Id., ¶ 61, In embodiments, feature computation layer 106 is configured to determine the features using the raw data and/or events stored to related event store 105. The feature computation layer 106 may be configured to determine the features by applying a variety of numerical processes to the data, such as arithmetic operations, aggregations, and various other techniques. In an embodiment, a user of the system 100 may determine useful features for a model by evaluating the features generated by feature computation layer 106 using both numerical methods and attempts to train a model using the examples generated from these features. By attempting to train the model using the generated examples, the user may see if the model trained using the features of interest has less error, such as by testing the model using a validation set, as compared to the model trained with different features), (Id., ¶ 92, In an embodiment, feature engine 103 includes a feature store 107. (discloses feature store) Feature computation layer 106 may store the determined features and/or generated feature vectors to feature store 107. Feature store 107 makes deployed features available for users. According to an aspect, feature computation layer 106 keeps feature store 107 up-to-date, such as by computing and updating values of features when new events are received and/or when a request is received from a user. Based on the features stored to feature store 107, feature computation layer 106 may avoid recomputing features using the same events. 
For example, if feature computation layer 106 has determined features using events up to arrival time x, feature computation layer 106 determines features using events up to arrival time x+n by only considering events that arrived after arrival time x and before arrival time x+n), (Id., ¶ 93, According to an aspect, feature computation layer 106 updates the features and/or save the new features to feature store 107. As a result, feature store 107 is configured to make up-to-date query results 113 available on-demand and computed features are readily available for quick model application. A user who wants to use a model trained on a particular exported dataset may efficiently retrieve stored pre-computed values);
in response to determining that the new feature definition is at least partially included in a matched feature definition of the plurality of computed feature definitions, generating one or more alternative feature definitions based on the new feature definition and the matched feature definition (Id., ¶ 42, Feature engineering system 100 is configured to use the data from data sources 101,102 to efficiently provide and/or generate feature vectors, such as a predictor feature vector, for a user to use in the application stage. Applying the model may involve computing a feature vector using the same computations that were used in training of the model, but for an entity or time that may not have been part of the training or validation examples. Because feature engineering system 100 is also configured to generate feature vectors for the user to use in the training stage, the same feature (discloses generating new feature definitions) vector definitions that were used for training are automatically available during production. As discussed above, making the same feature vector definitions used for training automatically available during production allows for event-based models to be successfully used in production. For example, feature engineering system 100 may provide and/or generate predictor feature vectors for a user to use in the application stage, while the feature engineering system 100 may provide and/or generate predictor and label feature vectors for a user to use in the training and validation stage. Feature engineering system 100 may generate the feature vectors and/or validation examples in a similar manner as described above for training examples), (Id., ¶ 43, System 100 is configured to ingest event data from one or more sources 101, 102 of data. In some configurations, a data source includes historical data, e.g., from historical data source 101. In that case, the data includes data that was received and/or stored within a historic time period i.e. 
not real-time. The historical data is typically indicative of events that occurred within a previous time period. For example, the historic time period may be a prior year or a prior two years, e.g., relative to a current time, etc. Historical data source 101 may be stored in and/or retrieved from one or more files, one or more databases, an offline source, and the like or may be streamed from an external source. The historical data ingested by system 100 may be associated with a user of system 100, such as a data scientist, that wants to train and implement a model using features generated from the data. System 100 may ingest the data from one or more sources 101,102 and use it to compute features), (Id., ¶ 120, Once the user has created and/or changed the feature definition and/or example selection, the feature engineering system can use this information to efficiently create the desired features and/or feature vectors and/or examples for the user. For example, the feature engineering system can use this information to create the desired features and/or feature vectors and/or examples for the user by re-using previous computations. After the desired features and/or feature vectors and/or examples have been generated, they may be exported to the user. At 704, the generated features and/or feature vectors and/or examples may be exported to the user. The user may use these exported features and/or feature vectors and/or examples to train and/or validate/evaluate the model. At 706, the user may train the model on any training examples generated by the feature engineering system. At 708, the user may validate and/or evaluate the model using any validation examples generated by the feature engineering system. If the user wants the feature engineering system to generate new or different features and/or feature vectors and/or examples, the user may easily change the dataset being used or experiment with a different dataset. 
For example, the user may want to try a new dataset to see if the model performs better after being trained with the new dataset. The method 700 may return to step 702, where the user may change the feature definition and/or update the example selection configuration. The user may continue to perform this iterative process until the model is generating results that satisfy the user), (Id., ¶ 121, FIG. 8 shows an example network 800 for feature engineering. The network 800 includes a feature engineering system 802 and one or more clients 804. System 802 may be similar to and/or perform similar functions as those performed by system 100 and/or system 200 described above. System 802 includes an API Server 808, one or more compute nodes 814, metadata storage 810, event data storage 816, staged data storage 806, prepared data storage 812, and result data storage 818. The event data storage 816, the staged data storage 806, and/or the prepared data storage 812 may utilize an external storage system, such as Amazon S3 or any other external storage system. The compute nodes 814 may be, for example, a feature engine, such as one of the feature engines described above);
and executing the selected execution alternative using a compute engine to generate the feature (Id., ¶ 90, Feature engineering system 100 (discloses compute engine) may simplify collaboration in feature generation (discloses generating the feature) and/or selection. As discussed above, features are often defined by users, such as data scientists. A company may have multiple data scientists producing features for one or more models. The data scientists may need to use different tools to access different kinds of raw data and/or events, further complicating the process of producing features. Collaboration on features produced in ad-hoc and varied ways makes it difficult to share features between users and/or projects. In addition, the techniques for producing features may vary based on the data size and the need for producing the feature vectors “in a production environment.” This may lead to the need to implement features multiple times for different situations. However, feature engineering system 100 may address these shortcomings by ingesting and/or saving raw data and/or events from a variety of sources and making the features available to users in different locations and/or using different devices, such as via the feature studio described further herein), (Id., ¶ 91, In an embodiment, feature computation layer 106 is configured to compute feature vectors. A feature vector is a list of features of an entity. The feature computation layer 106 may be configured to compute and/or update feature vectors as events are ingested by the feature engine 103. The feature computation layer 106 may be configured to compute and/or update feature vectors in response to user queries).
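For illustration only, the comparison of a new feature definition against stored definitions and the generation of an alternative definition from a partial match, as recited above, may be sketched as follows; every identifier and the step names are hypothetical:

```python
# Hypothetical sketch: find the stored definition that partially includes
# the new one, then build an alternative that reuses the shared work.

def find_partial_match(new_def, stored_defs):
    """Return the stored definition whose computation steps overlap most
    with the new definition, plus the shared steps (empty if none)."""
    best, best_overlap = None, set()
    for d in stored_defs:
        overlap = set(d["steps"]) & set(new_def["steps"])
        if len(overlap) > len(best_overlap):
            best, best_overlap = d, overlap
    return best, best_overlap

def make_alternative(new_def, matched, overlap):
    """Alternative definition: reuse the matched definition's stored output
    for the shared steps and compute only the remainder."""
    return {"reuse": matched["name"],
            "steps": [s for s in new_def["steps"] if s not in overlap]}

stored = [{"name": "f_sum", "steps": ["filter", "sum"]},
          {"name": "f_avg", "steps": ["filter", "sum", "count", "divide"]}]
new = {"name": "f_rate", "steps": ["filter", "sum", "count", "ratio"]}
matched, shared = find_partial_match(new, stored)
alt = make_alternative(new, matched, shared)
```

In this sketch the new definition is partially included in the stored f_avg definition, so the generated alternative recomputes only the single non-shared step.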
While suggested in at least Fig. 2 and related text, Bonaci does not explicitly disclose …selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator;
However, through KSR Rationale D (See MPEP 2141(III)(D)), the combination of Bonaci and Chen discloses …selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator.
First, Bonaci discloses a data aggregation technique including a cost estimation and minimization technique that selects an alternative technique for minimal cost (Bonaci, ¶ 82, According to an aspect, feature computation layer 106 is configured to simultaneously compute more than one feature, such as a large number of features. When simultaneously computing many features, it is possible to compute each feature independently and then join the computed values based on the entity and time. However, this approach is inefficient for at least two major reasons. First, computing each feature may involve retrieving and processing the same input events multiple times. Second, once the features are computed, performing an N-way join is an expensive operation. FIG. 5A illustrates an example N-way join 500a, (discloses PIT join techniques) such as a 3-way join, being performed after multiple features are individually computed. Computing two or more of the three features shown in FIG. 5A may involve retrieving and processing the same input events multiple times. After these three features are individually computed, they may be joined and output by the system), (Id., ¶ 83, Rather than employing this inefficient and expensive technique for simultaneously computing multiple features, feature computation layer 106 (discloses execution cost estimator) may instead combine all of the aggregations into a single pass over events that computes (at each point in time and for each entity) the value of all aggregations. (discloses execution alternative selected for minimal cost) The description of this flattened operation is called the aggregation plan and the process for producing it is described in more detail below. This flattened aggregation plan allows for the simultaneous computation of the aggregations necessary for all requested features with a single pass over the input, and therefore eliminates the need for the N-way join. FIG.
5B illustrates an example simultaneous feature computation 500b without an N-way join. As depicted in FIG. 5B, all of the multiple features are simultaneously computed with a single pass over the input, eliminating the need to retrieve and process the same input events multiple times), (Id., ¶ 157, In embodiments, resume tokens are utilized to continually apply the results of a query to a separate (i.e., external) data store with minimal cost. This may be achieved by first running an initial query, writing the results to the separate data store, and receiving a resume token. A query may be periodically run to update the results in the external store. Each query uses the resume token returned by the previous response. The new results may reflect only those results which have changed).
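For illustration only, the resume-token technique quoted above (Bonaci, ¶ 157) can be sketched as follows. All names and the token representation are hypothetical and are not taken from Bonaci; the sketch only shows the pattern of an initial query followed by periodic refreshes that each use the token returned by the previous response, so that only changed results are applied to the external store.

```python
# Hypothetical sketch of resume-token-based incremental refresh.
# The "token" here is simply the index of the last event consumed;
# a real system would use an opaque token returned by the query engine.

def run_query(events, since_index):
    # Return (results changed since the token, new resume token).
    new_events = events[since_index:]
    return new_events, since_index + len(new_events)

external_store = []
event_log = ["e1", "e2", "e3"]

# Initial query: write all results to the external store, keep the token.
results, token = run_query(event_log, 0)
external_store.extend(results)

# New events arrive; the periodic refresh passes the previous token,
# so only the changed results are retrieved and applied.
event_log += ["e4", "e5"]
results, token = run_query(event_log, token)
external_store.extend(results)
```

Under this sketch, the second query retrieves only the two new events rather than re-reading the full log, which is the "minimal cost" property the quoted paragraph describes.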
[Image: media_image1.png (greyscale)]
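For illustration only, the mapped claim language (an execution cost estimator that predicts a cost for each candidate execution alternative and selects the minimum) can be sketched as follows. The class names, cost weights, and candidate plans are hypothetical and are not taken from Bonaci or Chen; the sketch merely shows cost-based selection between an N-way-join plan and a single-pass aggregation plan of the kind Bonaci ¶¶ 82-83 contrasts.

```python
# Hypothetical sketch of cost-based selection among execution alternatives.
from dataclasses import dataclass

@dataclass
class ExecutionAlternative:
    name: str            # label for the candidate execution plan
    events_scanned: int  # predicted number of input events retrieved/processed
    join_arity: int      # 1 = single-pass plan (no N-way join); N = N-way join

def predicted_cost(alt: ExecutionAlternative) -> float:
    # Simple additive cost model (weights are illustrative): scanning events
    # is linear, and an N-way join adds a penalty per joined feature stream.
    scan_cost = 1.0 * alt.events_scanned
    join_cost = 50.0 * max(alt.join_arity - 1, 0)
    return scan_cost + join_cost

def select_alternative(candidates):
    # Select the candidate whose predicted execution cost is minimal.
    return min(candidates, key=predicted_cost)

candidates = [
    # Per-feature computation re-scans the same events, then pays for a 3-way join.
    ExecutionAlternative("n_way_join_per_feature", events_scanned=3000, join_arity=3),
    # Flattened aggregation plan: one pass over the input, no N-way join.
    ExecutionAlternative("single_pass_aggregation_plan", events_scanned=1000, join_arity=1),
]
best = select_alternative(candidates)
```

On these illustrative numbers the estimator prefers the single-pass aggregation plan, consistent with Bonaci's characterization of the N-way join as the more expensive alternative.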
Further, Chen discloses time-sensitive PIT join techniques using new and alternative feature definitions (Chen, ¶ 23, For purposes of this document, features are joins between different tables on the same key (e.g., address, timestamp) with perhaps simple transformations applied to some fields (discloses selecting an optimal time-sensitive PIT join alternative) that will make it more suitable for machine learning training and/or service, such as the nullification of values that are obviously incorrect), (Id., ¶ 24, These two concepts allow for the separation of physical mechanisms for managing tables from the logical design of features, while allowing each layer to be iterated on separately. For example, supporting a new table format merely requires the user to implement a common interface, which could immediately be used by a feature layer without any knowledge of the physical details), (Id., ¶ 39, As described above, features are joins between tables FIG. 3 is a diagram illustrating an example of features, in accordance with an example embodiment. Features are what someone building a model would care about. They define a configuration file that references all the fields needed from each of the tables defined.), (Id., ¶ 43, When iterating over a FeatureSet for model training, two commands may be defined: [0044] 1. build_feature_set_input_table—feature-set $FEATURE_SET—path $PATH which, given a FeatureSet definition, will perform all the joins necessary for the input table and can later be iterated on. [0045] 2. build_feature_set—feature-set $FEATURE_SET—input-table $PATH will build the feature set using the input table from 1. This command would fail if the input table does not contain the necessary columns. [0046] These two commands allow a user to iterate over feature set definitions without having to recalculate all joins every time).
[Image: media_image2.png (greyscale)]
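For illustration only, the two-command iteration Chen describes (¶¶ 43-46) can be sketched as follows. The function names mirror Chen's commands, but the data model, join logic, and error handling are hypothetical: the point is that the joined input table is materialized once and feature sets are then built against it, failing if a required column is absent, without recalculating the joins on each iteration.

```python
# Hypothetical sketch of Chen's build_feature_set_input_table /
# build_feature_set iteration (data model and logic are illustrative).

def build_feature_set_input_table(tables, key):
    # Join all tables on the shared key (e.g., address or timestamp),
    # merging each table's columns into one row per key value.
    joined = {}
    for table in tables:
        for row in table:
            joined.setdefault(row[key], {}).update(row)
    return list(joined.values())

def build_feature_set(feature_set, input_table):
    # Fail if the materialized input table lacks a required column,
    # matching Chen's described failure mode; otherwise project the fields.
    required = set(feature_set["fields"])
    available = set().union(*(row.keys() for row in input_table))
    if not required <= available:
        raise ValueError(f"input table missing columns: {required - available}")
    return [{f: row.get(f) for f in feature_set["fields"]} for row in input_table]

users = [{"key": 1, "age": 30}]
purchases = [{"key": 1, "total": 9.99}]

# Step 1: perform all joins once; Step 2: iterate feature-set definitions
# against the cached input table without re-joining.
input_table = build_feature_set_input_table([users, purchases], "key")
features = build_feature_set({"fields": ["age", "total"]}, input_table)
```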
One of ordinary skill in the art would have recognized that applying the known technique of Bonaci would have yielded predictable results and resulted in an improved system. It would have been recognized that applying the cost estimation technique of Bonaci to the feature-definition-based PIT join teachings of Chen would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such data aggregation features into similar systems. Further, applying cost-minimization-based alternative selection to Chen, with feature definitions considered accordingly, would have been recognized by those of ordinary skill in the art as resulting in an improved system that would allow more cost-effective data aggregation according to specific feature definitions.
Thus, through KSR Rationale D (See MPEP 2141(III)(D)), the combination of Bonaci and Chen discloses …selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator.
It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the feature definition generation elements of Bonaci to include the PIT join elements of Chen in the analogous art of time-sensitive data stores for the same reasons as stated for claim 1.
Regarding Claims 13-15, these claims recite limitations substantially similar to those in claims 6-8, respectively, and are rejected for the same reasons as stated above.
Regarding Claims 16-17, these claims recite limitations substantially similar to those in claims 2 and 9, respectively, and are rejected for the same reasons as stated above.
Regarding Claim 18, Bonaci discloses … A system comprising: memory; one or more processor units; a feature store data preparation optimization system stored in the memory and executable by the one or more processor units, the feature store data preparation optimization system encoding computer-executable instructions on the memory for executing on the one or more processor units a computer process, the computer process comprising: receiving a new feature definition specifying parameters of the feature (Bonaci, ¶ 32, In an embodiment, feature engineering system 100 may be configured to generate feature vectors and/or examples associated with a particular entity. As is discussed below in more detail, a user of system 100, such as a data scientist, may be responsible for instructing system 100 which entity or entities should be included in the feature vectors and/or examples. For example, if the user of system 100 wants to train a model to predict how much homes will sell for in Seattle, the user of system 100 may instruct system 100 to choose houses in Seattle as the entities that should be included in the feature vectors (discloses receiving a new feature definition specifying parameters of the feature) and/or examples. If the user instructed system 100 to choose, for example, houses in Los Angeles as the set of entities that should be included in the feature vectors and/or examples, the model may not be able to accurately predict selling prices for homes in Seattle), (Id., ¶ 120, Once the user has created and/or changed the feature definition and/or example selection, the feature engineering system can use this information to efficiently create the desired features and/or feature vectors and/or examples for the user. For example, the feature engineering system can use this information to create the desired features and/or feature vectors and/or examples for the user by re-using previous computations. 
After the desired features and/or feature vectors and/or examples have been generated, they may be exported to the user. At 704, the generated features and/or feature vectors and/or examples may be exported to the user. The user may use these exported features and/or feature vectors and/or examples to train and/or validate/evaluate the model. At 706, the user may train the model on any training examples generated by the feature engineering system. At 708, the user may validate and/or evaluate the model using any validation examples generated by the feature engineering system. If the user wants the feature engineering system to generate new or different features and/or feature vectors and/or examples, the user may easily change the dataset being used or experiment with a different dataset. For example, the user may want to try a new dataset to see if the model performs better after being trained with the new dataset. The method 700 may return to step 702, where the user may change the feature definition and/or update the example selection configuration. The user may continue to perform this iterative process until the model is generating results that satisfy the user), (Id., ¶ 217, The system memory 1928 in FIG. 19 may include computer system readable media in the form of volatile memory, such as random access memory (‘RAM’) 1930 and/or cache memory 1932. Computing node 1900 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 1934 may be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a “hard drive”). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk, e.g., a “floppy disk,” and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media may be provided. 
In such instances, each may be connected to bus 1918 by one or more data media interfaces. As will be further depicted and described below, memory 1928 may include at least one program product having a set, e.g., at least one, of program modules that are configured to carry out the functions of embodiments of the invention);
comparing the new feature definition with a plurality of computed feature definitions stored in a feature store (Id., ¶ 61, In embodiments, feature computation layer 106 is configured to determine the features using the raw data and/or events stored to related event store 105. The feature computation layer 106 may be configured to determine the features by applying a variety of numerical processes to the data, such as arithmetic operations, aggregations, and various other techniques. In an embodiment, a user of the system 100 may determine useful features for a model by evaluating the features generated by feature computation layer 106 using both numerical methods and attempts to train a model using the examples generated from these features. By attempting to train the model using the generated examples, the user may see if the model trained using the features of interest has less error, such as by testing the model using a validation set, as compared to the model trained with different features), (Id., ¶ 92, In an embodiment, feature engine 103 includes a feature store 107. (discloses feature store) Feature computation layer 106 may store the determined features and/or generated feature vectors to feature store 107. Feature store 107 makes deployed features available for users. According to an aspect, feature computation layer 106 keeps feature store 107 up-to-date, such as by computing and updating values of features when new events are received and/or when a request is received from a user. Based on the features stored to feature store 107, feature computation layer 106 may avoid recomputing features using the same events. 
For example, if feature computation layer 106 has determined features using events up to arrival time x, feature computation layer 106 determines features using events up to arrival time x+n by only considering events that arrived after arrival time x and before arrival time x+n), (Id., ¶ 93, According to an aspect, feature computation layer 106 updates the features and/or save the new features to feature store 107. As a result, feature store 107 is configured to make up-to-date query results 113 available on-demand and computed features are readily available for quick model application. A user who wants to use a model trained on a particular exported dataset may efficiently retrieve stored pre-computed values);
in response to determining that the new feature definition is at least partially included in a matched feature definition of the plurality of computed feature definitions, generating one or more alternative feature definitions based on the new feature definition and the matched feature definition (Id., ¶ 42, Feature engineering system 100 is configured to use the data from data sources 101,102 to efficiently provide and/or generate feature vectors, such as a predictor feature vector, for a user to use in the application stage. Applying the model may involve computing a feature vector using the same computations that were used in training of the model, but for an entity or time that may not have been part of the training or validation examples. Because feature engineering system 100 is also configured to generate feature vectors for the user to use in the training stage, the same feature (discloses generating new feature definitions) vector definitions that were used for training are automatically available during production. As discussed above, making the same feature vector definitions used for training automatically available during production allows for event-based models to be successfully used in production. For example, feature engineering system 100 may provide and/or generate predictor feature vectors for a user to use in the application stage, while the feature engineering system 100 may provide and/or generate predictor and label feature vectors for a user to use in the training and validation stage. Feature engineering system 100 may generate the feature vectors and/or validation examples in a similar manner as described above for training examples), (Id., ¶ 43, System 100 is configured to ingest event data from one or more sources 101, 102 of data. In some configurations, a data source includes historical data, e.g., from historical data source 101. In that case, the data includes data that was received and/or stored within a historic time period i.e. 
not real-time. The historical data is typically indicative of events that occurred within a previous time period. For example, the historic time period may be a prior year or a prior two years, e.g., relative to a current time, etc. Historical data source 101 may be stored in and/or retrieved from one or more files, one or more databases, an offline source, and the like or may be streamed from an external source. The historical data ingested by system 100 may be associated with a user of system 100, such as a data scientist, that wants to train and implement a model using features generated from the data. System 100 may ingest the data from one or more sources 101,102 and use it to compute features), (Id., ¶ 120, Once the user has created and/or changed the feature definition and/or example selection, the feature engineering system can use this information to efficiently create the desired features and/or feature vectors and/or examples for the user. For example, the feature engineering system can use this information to create the desired features and/or feature vectors and/or examples for the user by re-using previous computations. After the desired features and/or feature vectors and/or examples have been generated, they may be exported to the user. At 704, the generated features and/or feature vectors and/or examples may be exported to the user. The user may use these exported features and/or feature vectors and/or examples to train and/or validate/evaluate the model. At 706, the user may train the model on any training examples generated by the feature engineering system. At 708, the user may validate and/or evaluate the model using any validation examples generated by the feature engineering system. If the user wants the feature engineering system to generate new or different features and/or feature vectors and/or examples, the user may easily change the dataset being used or experiment with a different dataset. 
For example, the user may want to try a new dataset to see if the model performs better after being trained with the new dataset. The method 700 may return to step 702, where the user may change the feature definition and/or update the example selection configuration. The user may continue to perform this iterative process until the model is generating results that satisfy the user), (Id., ¶ 121, FIG. 8 shows an example network 800 for feature engineering. The network 800 includes a feature engineering system 802 and one or more clients 804. System 802 may be similar to and/or perform similar functions as those performed by system 100 and/or system 200 described above. System 802 includes an API Server 808, one or more compute nodes 814, metadata storage 810, event data storage 816, staged data storage 806, prepared data storage 812, and result data storage 818. The event data storage 816, the staged data storage 806, and/or the prepared data storage 812 may utilize an external storage system, such as Amazon S3 or any other external storage system. The compute nodes 814 may be, for example, a feature engine, such as one of the feature engines described above);
and executing the selected execution alternative using a compute engine to generate the feature (Id., ¶ 90, Feature engineering system 100 (discloses compute engine) may simplify collaboration in feature generation (discloses generating the feature) and/or selection. As discussed above, features are often defined by users, such as data scientists. A company may have multiple data scientists producing features for one or more models. The data scientists may need to use different tools to access different kinds of raw data and/or events, further complicating the process of producing features. Collaboration on features produced in ad-hoc and varied ways makes it difficult to share features between users and/or projects. In addition, the techniques for producing features may vary based on the data size and the need for producing the feature vectors “in a production environment.” This may lead to the need to implement features multiple times for different situations. However, feature engineering system 100 may address these shortcomings by ingesting and/or saving raw data and/or events from a variety of sources and making the features available to users in different locations and/or using different devices, such as via the feature studio described further herein), (Id., ¶ 91, In an embodiment, feature computation layer 106 is configured to compute feature vectors. A feature vector is a list of features of an entity. The feature computation layer 106 may be configured to compute and/or update feature vectors as events are ingested by the feature engine 103. The feature computation layer 106 may be configured to compute and/or update feature vectors in response to user queries).
While suggested in at least Fig. 2 and related text, Bonaci does not explicitly disclose …selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator;
However, through KSR Rationale D (See MPEP 2141(III)(D)), the combination of Bonaci and Chen discloses …selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator.
First, Bonaci discloses a data aggregation technique including a cost estimation and minimization technique that selects an alternative technique for minimal cost (Bonaci, ¶ 82, According to an aspect, feature computation layer 106 is configured to simultaneously compute more than one feature, such as a large number of features. When simultaneously computing many features, it is possible to compute each feature independently and then join the computed values based on the entity and time. However, this approach is inefficient for at least two major reasons. First, computing each feature may involve retrieving and processing the same input events multiple times. Second, once the features are computed, performing an N-way join is an expensive operation. FIG. 5A illustrates an example N-way join 500a, (discloses PIT join techniques) such as a 3-way join, being performed after multiple features are individually computed. Computing two or more of the three features shown in FIG. 5A may involve retrieving and processing the same input events multiple times. After these three features are individually computed, they may be joined and output by the system), (Id., ¶ 83, Rather than employing this inefficient and expensive technique for simultaneously computing multiple features, feature computation layer 106 (discloses execution cost estimator) may instead combine all of the aggregations into a single pass over events that computes (at each point in time and for each entity) the value of all aggregations. (discloses execution alternative selected for minimal cost) The description of this flattened operation is called the aggregation plan and the process for producing it is described in more detail below. This flattened aggregation plan allows for the simultaneous computation of the aggregations necessary for all requested features with a single pass over the input, and therefore eliminates the need for the N-way join. FIG.
5B illustrates an example simultaneous feature computation 500b without an N-way join. As depicted in FIG. 5B, all of the multiple features are simultaneously computed with a single pass over the input, eliminating the need to retrieve and process the same input events multiple times), (Id., ¶ 157, In embodiments, resume tokens are utilized to continually apply the results of a query to a separate (i.e., external) data store with minimal cost. This may be achieved by first running an initial query, writing the results to the separate data store, and receiving a resume token. A query may be periodically run to update the results in the external store. Each query uses the resume token returned by the previous response. The new results may reflect only those results which have changed).
[Image: media_image1.png (greyscale)]
Further, Chen discloses time-sensitive PIT join techniques using new and alternative feature definitions (Chen, ¶ 23, For purposes of this document, features are joins between different tables on the same key (e.g., address, timestamp) with perhaps simple transformations applied to some fields (discloses selecting an optimal time-sensitive PIT join alternative) that will make it more suitable for machine learning training and/or service, such as the nullification of values that are obviously incorrect), (Id., ¶ 24, These two concepts allow for the separation of physical mechanisms for managing tables from the logical design of features, while allowing each layer to be iterated on separately. For example, supporting a new table format merely requires the user to implement a common interface, which could immediately be used by a feature layer without any knowledge of the physical details), (Id., ¶ 39, As described above, features are joins between tables FIG. 3 is a diagram illustrating an example of features, in accordance with an example embodiment. Features are what someone building a model would care about. They define a configuration file that references all the fields needed from each of the tables defined.), (Id., ¶ 43, When iterating over a FeatureSet for model training, two commands may be defined: [0044] 1. build_feature_set_input_table—feature-set $FEATURE_SET—path $PATH which, given a FeatureSet definition, will perform all the joins necessary for the input table and can later be iterated on. [0045] 2. build_feature_set—feature-set $FEATURE_SET—input-table $PATH will build the feature set using the input table from 1. This command would fail if the input table does not contain the necessary columns. [0046] These two commands allow a user to iterate over feature set definitions without having to recalculate all joins every time).
[Image: media_image2.png (greyscale)]
One of ordinary skill in the art would have recognized that applying the known technique of Bonaci would have yielded predictable results and resulted in an improved system. It would have been recognized that applying the cost estimation technique of Bonaci to the feature-definition-based PIT join teachings of Chen would have yielded predictable results because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such data aggregation features into similar systems. Further, applying cost-minimization-based alternative selection to Chen, with feature definitions considered accordingly, would have been recognized by those of ordinary skill in the art as resulting in an improved system that would allow more cost-effective data aggregation according to specific feature definitions.
Thus, through KSR Rationale D (See MPEP 2141(III)(D)), the combination of Bonaci and Chen discloses …selecting an execution alternative from among candidate execution alternatives, using an execution cost estimator that predicts an execution cost for each candidate execution alternative, wherein the candidate execution alternatives include (i) an execution of a PIT join using the alternative feature definition and (ii) an execution of a second PIT join using the new feature definition, the selected execution alternative being selected based on the execution costs predicted by the execution cost estimator.
It would have been obvious to a person of ordinary skill in the art before the effective filing date to have modified the feature definition generation elements of Bonaci to include the PIT join elements of Chen in the analogous art of time-sensitive data stores for the same reasons as stated for claim 1.
Regarding Claims 19-20, these claims recite limitations substantially similar to those in claims 2 and 9, respectively, and are rejected for the same reasons as stated above.
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Bonaci in view of Chen and further in view of Gao et al., U.S. Publication No. 2022/0101438 [hereinafter Gao].
Regarding Claim 4, the combination of Bonaci and Chen discloses …The computer-implemented method of claim 3…
While suggested in at least Fig. 2 and related text of Bonaci, the combination of Bonaci and Chen does not explicitly disclose …wherein the selection of the minimum cost configuration is implemented using binary integer programming.
However, Gao discloses …wherein the selection of the minimum cost configuration is implemented using binary integer programming (Gao, ¶ 529, For Tail Risk Optimizer, mixed integer programming, binary integer programming and linear programming with rounding techniques are available. CVaR-Mean frontier can be shaped with the optimizer parallel computation ability and then illustrate the relationship between CVaR and expected return. Different CVaR-Mean frontier can be visualized according to diversification preferences (see FIG. 71) and market scenarios. The mixed integer programming can offer more accurate results according to asset price and asset tradable amount while the linear programming with rounding techniques guarantees faster performance on large scale computation (See FIG. 72)), (Id., ¶ 135, In some implementations, the database calculation engine may utilize various innovative data reduction, scaling and parallel computing techniques (e.g., techniques to use global temporary tables and sessions, data reduction techniques to drastically reduce the amount of data used for processing thus lowering processing time, and several other data parallelization techniques used for generating simulation data): [0136] Use of Multiple Batches to achieve higher degree of parallelism (DOP) [0137] Use of Global Temporary Tables (GTT) to be able to run batch in multiple sessions and limit temporary storage requirements [0138] Use of Data Reduction techniques to limit full table scans for joins between Factor Exposure and Factor Simulation table [0139] Use of Parallel Query to parallelize generation of Asset Simulation and Contribution to Value at Risk data [0140] Use of Parallel DML to parallelize inserting data related to Asset Simulation and Contribution to Value at Risk [0141] Use of DDL for faster execution of delete statements to speed up cleanup of global temporary tables), (Id., ¶ 127, In some embodiments, the MLPO may implement a database calculation engine for calculating simulation data. The database calculation engine may be a SQL-based solution that effectively utilizes different data reduction and parallel execution techniques to reduce the overall response time. Instead of using a dedicated high-performance platform (e.g., IBM Netezza Data Appliance) the database calculation engine may be used for simulation calculation providing a faster, streamlined, cost effective and scalable solution (e.g., using Oracle RDS on Cloud) that provides calculation results in substantially less amount of time. Further, the database calculation engine eliminates having to maintain a complex infrastructure and applications associated with using a dedicated high-performance platform, and having to pay for additional licensing and maintenance costs), (Id., ¶ 254, FIG. 7B illustrates embodiments of a computation engine architecture that supports and implements the business logic. In various implementations the computation engine may be characterized by the following features: … [0261] 7: Cost Optimization: Inexpensive commodity-grade virtual servers may be provisioned dynamically leveraging inexpensive Spot Instances or Reserved Instances).
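For illustration only, the mapped limitation (selecting a minimum-cost configuration via binary integer programming) can be sketched as follows. The costs, coverage sets, and constraint are hypothetical and are not taken from Gao; the sketch frames each candidate configuration as a 0/1 decision variable and solves a small instance by exhaustive search over the binary assignments, whereas a production system would use a dedicated BIP/MIP solver of the kind Gao ¶ 529 describes.

```python
# Hypothetical sketch of minimum-cost configuration selection posed as a
# binary integer program: minimize sum(cost_i * x_i) subject to every
# required feature being covered by some selected candidate, x_i in {0, 1}.
from itertools import product

costs = [4.0, 3.0, 5.0]                  # cost of enabling each candidate
covers = [{"f1"}, {"f2"}, {"f1", "f2"}]  # features each candidate provides
required = {"f1", "f2"}                  # coverage constraint

best_cost, best_x = float("inf"), None
for x in product([0, 1], repeat=len(costs)):  # all binary assignments
    covered = set().union(set(), *(covers[i] for i in range(len(x)) if x[i]))
    if required <= covered:                   # feasibility check
        cost = sum(c * xi for c, xi in zip(costs, x))
        if cost < best_cost:
            best_cost, best_x = cost, x
```

On this illustrative instance the single combined candidate (cost 5.0) beats enabling the two narrower candidates together (cost 7.0), so the optimizer selects x = (0, 0, 1).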
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have modified the feature definition generation elements of Bonaci and the PIT join elements of Chen to include the binary integer programming elements of Gao in the analogous art of machine learning portfolio simulating and optimizing apparatuses, methods and systems.
The motivation for doing so would have been to “offer more accurate results according to asset price and asset tradable amount while the linear programming with rounding techniques guarantees faster performance on large scale computation (See FIG. 72)” and because a “wide array format may facilitate improved performance when calculating portfolio level return metrics” (Gao, ¶¶ 529, 488). Such improvements would have benefited Chen’s method, which “produces wide tables containing features for machine learned models that allow for more efficient processing, improved sharing of information, and increased accuracy. These wide tables are made available for model training for multiple models and/or groups. These wide tables may be served on a serving database for fast access for API serving and lightweight access during interactive development. The solution decreases the time needed to add a new feature from several days to a couple of hours by enabling experimentation” (Chen, ¶ 21). Such improvements would have further benefited Bonaci’s method, which provides an “ability to maintain feature values in real time [which] may improve the accuracy of the model. For example, the model may be able to make more accurate predictions, or a larger percentage of the predictions that the model makes may be accurate. The accuracy of the model may be improved because predictions made with more recent feature values more accurately reflect the current interests/environments, etc. that the prediction is being made about” [Gao, ¶¶ 529, 488; Chen, ¶ 21; Bonaci, ¶ 29].
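For purposes of illustrating the binary integer programming technique attributed to Gao above, a minimum-cost configuration selection can be sketched as a brute-force 0/1 integer program. The cost and benefit figures below are hypothetical and are not drawn from the Gao reference; real-world implementations such as those Gao describes would use a dedicated solver rather than enumeration.

```python
from itertools import product

def min_cost_config(costs, benefits, min_benefit):
    """Brute-force 0/1 integer program: choose a binary selection vector
    minimizing total cost subject to a minimum aggregate benefit.
    Returns (total_cost, selection_vector) or None if infeasible."""
    best = None
    for x in product((0, 1), repeat=len(costs)):
        cost = sum(c * xi for c, xi in zip(costs, x))
        benefit = sum(b * xi for b, xi in zip(benefits, x))
        if benefit >= min_benefit and (best is None or cost < best[0]):
            best = (cost, x)
    return best

# Hypothetical data: three candidate configurations.
print(min_cost_config([4, 2, 5], [3, 2, 4], min_benefit=5))  # → (6, (1, 1, 0))
```

The “linear programming with rounding” alternative that Gao quotes would instead relax the binary variables to the interval [0, 1], solve the relaxed problem, and round, trading exactness for speed on large instances.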
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Johri et al., U.S. Patent No. 8,965,879 discloses a unique join data caching method.
Zimmerman et al., U.S. Publication No. 2011/0022552 discloses systems and methods for implementing a machine-learning agent to retrieve information in response to a message.
Danna, U.S. Publication No. 2021/0403036 discloses systems and methods for encoding and searching scenario information.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS D BOLEN whose telephone number is (408)918-7631. The examiner can normally be reached Monday - Friday 8:00 AM - 5:00 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Patty Munson can be reached at (571) 270-5396. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NICHOLAS D BOLEN/
Examiner, Art Unit 3624

/HAMZEH OBAID/
Primary Examiner, Art Unit 3624