DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Style
In this action unitalicized bold is used for claim language, while italicized bold is used for emphasis.
Applicant Reply
“The claims may be amended by canceling particular claims, by presenting new claims, or by rewriting particular claims as indicated in 37 CFR 1.121(c). The requirements of 37 CFR 1.111(b) must be complied with by pointing out the specific distinctions believed to render the claims patentable over the references in presenting arguments in support of new claims and amendments. . . . The prompt development of a clear issue requires that the replies of the applicant meet the objections to and rejections of the claims. Applicant should also specifically point out the support for any amendments made to the disclosure. See MPEP § 2163.06. . . . An amendment which does not comply with the provisions of 37 CFR 1.121(b), (c), (d), and (h) may be held not fully responsive. See MPEP § 714.” MPEP § 714.02. Generic statements or listing of numerous paragraphs do not “specifically point out the support for” claim amendments. “With respect to newly added or amended claims, applicant should show support in the original disclosure for the new or amended claims. See, e.g., Hyatt v. Dudas, 492 F.3d 1365, 1370, n.4, 83 USPQ2d 1373, 1376, n.4 (Fed. Cir. 2007) (citing MPEP § 2163.04 which provides that a ‘simple statement such as ‘applicant has not pointed out where the new (or amended) claim is supported, nor does there appear to be a written description of the claim limitation ‘___’ in the application as filed’ may be sufficient where the claim is a new or amended claim, the support for the limitation is not apparent, and applicant has not pointed out where the limitation is supported.’)” MPEP § 2163(II)(A).
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Generally: separately listed claim elements are construed as distinct components, all claim terms must be given weight, and there is presumed to be a difference in meaning and scope when different words or phrases are used in separate claims. Since different terms or phrases are presumed to differ in scope and each term or phrase in the claims must find clear support in the description, a description of a single element in the Specification may fail to support multiple claim terms. “[C]laims must ‘conform to the invention as set forth in the remainder of the specification and the terms and phrases used in the claims must find clear support or antecedent basis in the description so that the meaning of the terms in the claims may be ascertainable by reference to the description.’ 37 C.F.R. § 1.75(d)(1).” Phillips v. AWH Corp., 415 F.3d 1303, 1316 (Fed. Cir. 2005) (as cited in MPEP § 2111). Further, a lack of detail in the Specification describing how a claimed result is achieved can support a finding that the Applicant was not in possession of the claimed invention at the time of filing, notwithstanding verbatim support. “It is not enough that one skilled in the art could write a program to achieve the claimed function because the specification must explain how the inventor intends to achieve the claimed function to satisfy the written description requirement. See, e.g., Vasudevan Software, Inc. v. MicroStrategy, Inc., 782 F.3d 671, 681-683, 114 USPQ2d 1349, 1356, 1357 (Fed. Cir. 2015) (reversing and remanding the district court’s grant of summary judgment of invalidity for lack of adequate written description where there were genuine issues of material fact regarding "whether the specification show[ed] possession by the inventor of how accessing disparate databases is achieved").
If the specification does not provide a disclosure of the computer and algorithm in sufficient detail to demonstrate to one of ordinary skill in the art that the inventor possessed the invention a rejection under 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph, for lack of written description must be made.” MPEP § 2161.01(I). “An original claim may lack written description support when (1) the claim defines the invention in functional language specifying a desired result but the disclosure fails to sufficiently identify how the function is performed or the result is achieved[.] See Ariad Pharms., Inc. v. Eli Lilly & Co., 598 F.3d 1336, 1349-50 (Fed. Cir. 2010) (en banc). The written description requirement is not necessarily met when the claim language appears in ipsis verbis in the specification. ‘Even if a claim is supported by the specification, the language of the specification, to the extent possible, must describe the claimed invention so that one skilled in the art can recognize what is claimed. The appearance of mere indistinct words in a specification or a claim, even an original claim, does not necessarily satisfy that requirement.’” MPEP § 2163.03.
All independent claims recite “determining a frequency of occurrence of a first categorical data value represented by the first data point within a plurality of data points of the first factor[.]” This language is used as an example, but applies to all usage of the terms discussed below in the claims. There is more than one issue with this claim language, each being sufficient for a determination that Applicant was not in possession of the claimed invention as of the effective filing date. First, the claims recite two elements which are not mentioned in the original Specification, specifically, the “categorical data value” and “frequency of occurrence.” Without any further explanation or any clear implicit support, Examiner finds that the Specification fails to adequately describe the claimed combination. Second, the language in the Specification which is most similar to the language in the claims, and which is cited in the Applicant Remarks as ostensibly providing support for this claim amendment, describes a different concept from that which is claimed. See Rem. 10, citing paragraphs 79-80 of the Specification. The Specification describes a “categorical autobinner . . . may group data points based on the frequency of their content: the most frequent categories . . . are treated as individual groups while the remaining less frequent categories are put into one single group.” Spec. ¶79. As best understood, this describes reducing the number of categories of data based on the amount of data in each category. But the claims recite determining the “frequency of occurrence” of a specific “first categorical data value represented by the first data point” within a given factor.
That is, the specification, as best understood, describes using the amount of data related to a given “factor” (or parameter) to determine whether or not to merge that factor with other factors, while the claim recites determining the “frequency of occurrence,” or number of repeated “first categorical data values represented by the [specific] data point,” within “the first factor” or parameter. Since the claims, as best understood, read on determining specific repeated data points while the closest language in the Specification describes using the amount of data within a category to determine when a category should be merged with other categories, the claim language is unsupported.
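By way of illustration only (the data and names below are hypothetical and form no part of the record), the two competing readings discussed above may be sketched as follows: the claimed reading counts how often the specific categorical data value represented by the first data point occurs within the factor, while the Spec. ¶79 “categorical autobinner” treats the most frequent categories as individual groups and merges the less frequent categories into one single group:

```python
from collections import Counter

# Hypothetical column of categorical data points for a single "factor".
factor_values = ["NY", "CA", "NY", "TX", "NY", "CA", "WA", "OR"]

# Claimed reading: determine the frequency of occurrence of the specific
# categorical data value represented by the first data point.
first_value = factor_values[0]                        # "NY"
claimed_frequency = factor_values.count(first_value)  # 3 occurrences

# Spec. ¶79 reading ("categorical autobinner"): treat the most frequent
# categories as individual groups and put the remaining, less frequent
# categories into one single group.
counts = Counter(factor_values)
top_categories = {cat for cat, _ in counts.most_common(2)}
binned = [v if v in top_categories else "OTHER" for v in factor_values]
```

The first computation yields a per-value count; the second yields a re-labeled column, which is the distinction drawn above.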
All independent claims separately recite “a first categorical data value” and “a first data point[.]” Examiner is unable to see any clear difference in the meaning and scope of these terms. But the use of two different terms in a claim implies distinct claim elements. Examiner is further unable to find any support in the Specification for interpreting these applicant-invented terms to have a different meaning and scope from one another. Since the Specification fails to support both a “first categorical data value” and “a first data point” having a separate meaning and scope from one another, as would be consistent with using separate claim terms, the use of both terms is not supported. Note that neither term appears in the Specification. The closest language to either term is “data point”, which does not support the two separately claimed elements.
All dependent claims are rejected as including the material of the claims from which they depend.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Generally: separately listed claim elements are construed as distinct components, all claim terms must be given weight, there is presumed to be a difference in meaning and scope when different words or phrases are used in separate claims, and repeated and consistent descriptions in the specification indicate the proper scope of a claimed term. “[C]laims must ‘conform to the invention as set forth in the remainder of the specification and the terms and phrases used in the claims must find clear support or antecedent basis in the description so that the meaning of the terms in the claims may be ascertainable by reference to the description.’ 37 C.F.R. § 1.75(d)(1).” Phillips v. AWH Corp., 415 F.3d 1303, 1316 (Fed. Cir. 2005) (as cited in MPEP § 2111). Therefore, use of two different terms in the claims that both rely on the description of a single structure in the Specification may render at least one term indefinite because there is no way to determine which term should be construed in view of the description of the single structure.
All independent claims recite “determining a frequency of occurrence of a first categorical data value represented by the first data point within a plurality of data points of the first factor[.]” This language is used as an example, but applies to all usage of these terms in the claims. There is more than one issue with this claim language, each being sufficient for a determination that the claims are indefinite. First, the claims recite two elements, neither of which is mentioned in the original Specification, specifically, the “categorical data value” and “frequency of occurrence.” Without any further explanation or any clear implicit support, Examiner finds that there is no objective measure of the scope of these applicant-invented terms. Second, the language in the Specification which is most similar to the language in the claims, and which is cited in the Applicant Remarks as ostensibly providing support for this claim amendment, describes a different concept from that which is claimed. See Rem. 10, citing paragraphs 79-80 of the Specification. The Specification describes a “categorical autobinner . . . may group data points based on the frequency of their content: the most frequent categories . . . are treated as individual groups while the remaining less frequent categories are put into one single group.” Spec. ¶79. As best understood, this describes reducing the number of categories of data based on the amount of data in each category. But the claims recite determining the “frequency of occurrence” of a specific “first categorical data value represented by the first data point” within a given factor.
That is, the specification, as best understood, describes using the amount of data related to a given “factor” or parameter to determine whether or not to merge that factor with other factors, while the claim recites determining the “frequency of occurrence,” or number of repeated “first categorical data values represented by the [specific] data point,” within “the first factor” or parameter. Since the claims, as best understood, read on determining repeated data points while the closest language in the Specification describes using the amount of data within a category to determine when a category should be merged with other categories, there are at least two reasonable ways of interpreting the claim language, rendering the claims indefinite.
All independent claims separately recite “a first categorical data value” and “a first data point[.]” There is no way for one of ordinary skill in the art to reasonably ascertain any difference in the meaning and scope for these two different claim terms. But the use of different terms in a claim implies distinct claim elements. Further, Examiner is unable to find any support in the Specification for interpreting these applicant-invented terms to have a different meaning and scope from one another. Since it is not clear whether these terms both refer to the same claim element, or if they are used in reference to different claim elements, the claim is indefinite. If the terms refer to different claim elements, it is unclear how they should differ from one another. Note that neither term appears in the Specification. While the Specification uses the term “data point”, this is unhelpful in ascertaining separate meanings for the two separately recited terms.
All dependent claims are rejected as including the material of the claims from which they depend.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3, 5, and 8-20 are rejected under 35 U.S.C. 103 as being unpatentable over Fidanza (US 2020/0349641, different assignee), Brownlee (How to Choose a Feature Selection Method For Machine Learning, 2020), and Nazir (US 11,798,090; filed 2019, different assignee).
1. (Currently Amended) A method, comprising: by one or more processors, obtaining a first dataset comprising a plurality of factors, wherein each factor of the plurality of factors comprises a plurality of data points, and wherein each data point of each factor is characterized as an ordered datatype or a categorical datatype; (The Specification describes the “datatypes” consistent with intrinsic characteristics of the factors and data. See Spec. ¶¶59-60 describing countable data as an “ordered datatype” and uncountable data as a “categorical datatype.” (“An ordered datatype . . . express[es] information in the form of numerical or ordered values . . . called quantitative factors. Examples . . . include age, debt-to-income, number of dependents. . . A categorical datatype . . . group[s] information with similar characteristics . . . called qualitative factors. . . . Examples . . . include race, gender, income bracket, ZIP code and education level.”)
Fidanza teaches: “[T]he feature engineering process may include two main stages, e.g., feature selection and feature extraction. In the first stage of feature selection, the loan issuance system 30 may analyze the given data and select the most relevant features for classification.” Fidanza ¶ 50. “Features may be selected on the basis of their scores and various statistical tests for the correlation with the outcome variable.” Fidanza ¶ 67. “Linear classifiers may permit machine learning and statistical classification by using an object's characteristics to identify a class or group to which it belongs. The linear classifier may make the classification decision based on the value of a linear combination of characteristics. Data may include the feature values as described above, and may be presented to the loan issuance server 50 and its server processor 52 in a vector as a feature vector.” Fidanza ¶74.
Fidanza does not expressly teach determining factors based on the datatype.
Brownlee teaches: “Common data types include numerical (such as height) and categorical (such as a label) . . . .The more that is known about the data type of a variable, the easier it is to choose an appropriate statistical measure for a filter-based feature selection method. In this section, we will consider two broad categories of variable types: numerical and categorical; also, the two main groups of variables to consider: input and output. Input variables are those that are provided as input to a model. In feature selection, it is this group of variables that we wish to reduce in size. Output variables are those for which a model is intended to predict, often called the response variable. The type of response variable typically indicates the type of predictive modeling problem being performed. For example, a numerical output variable indicates a regression predictive modeling problem, and a categorical output variable indicates a classification predictive modeling problem.” Brownlee Page 6-7. Brownlee Page 7 shows a figure titled “How to Choose a Feature Selection Method.” This graph shows determining whether the input variable (feature) is numerical or categorical. Also, based on the flow of the chart in this figure, knowledge of the “datatype” of the input variable is used to determine the feature selection method based on this figure. The section below the figure starting on Page 8 shows which input datatypes are best used with specific modeling techniques.
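By way of illustration only, the datatype-driven dispatch shown in the Brownlee figure might be condensed as follows; the mapping below is a hypothetical paraphrase of the chart, not a quotation of it:

```python
# Hypothetical condensation of the Brownlee flow chart: the datatypes of
# the input and output variables select a filter-based statistical measure
# for feature selection.
MEASURES = {
    ("numerical", "numerical"): "pearson_correlation",
    ("numerical", "categorical"): "anova_f_test",
    ("categorical", "numerical"): "anova_f_test",
    ("categorical", "categorical"): "chi_squared",
}

def select_measure(input_type: str, output_type: str) -> str:
    """Return a feature-selection statistic for the given variable datatypes."""
    return MEASURES[(input_type, output_type)]
```

For example, a categorical output variable indicates a classification problem, so a categorical input paired with a categorical output would dispatch to a chi-squared measure under this sketch.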
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Brownlee because determining the datatype of the factors helps determine the proper algorithm for feature selection.) determining that a first data point of a first factor of the plurality of factors is characterized as the categorical datatype; determining the frequency of occurrence of a first categorical data value represented by the first data point within a plurality of data points of the first factor; (See Brownlee Pages 6-7 cited above teaching determining data is numerical (ordered) or categorical. Brownlee does not expressly teach determining the frequency of occurrence of a categorical data value. Note that this language, per the Applicant Remarks, is based on the support in paragraphs 79-80 of the Specification, which describes grouping of different data points based on their inclusion within a given category.
Nazir teaches: “Data clean engine 310 is configured to clean the data, e.g., by standardizing data types and values, removing duplicated variables, removing variables with a unique value, and removing obviously non-predictive variables (e.g., user id, etc.).” Nazir col. 7 ll. 43-47. “Variable remove engine 420 is configured to remove variables from the data set. For example, variables with incomplete or sparse data” Nazir col. 8 ll. 10-12. “Data aggregation engine 340 is configured to aggregate the data to the desired granularity. The appropriate granularity will depend on the type and structure of the input variables and the target.” Nazir ¶53. “How each variable is aggregated depends on the type of variable. Numerical variables are aggregated by calculating new statistical variables, e.g., median, mean, min, max, standard deviation, etc. Enumerated variables, i.e. variables that identify a status, indicator, code, etc., are aggregated by counting the number of each type. For example, policyholder marital status may be aggregated by counting the number of policyholders that are married, divorced, separated, never married, etc. In instances where the number of types is large, e.g., ages, the aggregate variables may divide the types into bins.” Nazir ¶126. Note that marital status is a categorical value. For a more detailed explanation, see Nazir Fig. 12 and accompanying description (col. 18 l. 20 – col. 19 l. 45).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Nazir because aggregation saves space and avoids redundant operations for the aggregated data.) determining a plurality of collections of categorical data values based at least in part on the categorical datatype, wherein each collection of categorical data values of the plurality of collections of categorical data values is associated with a respective representative data value and a respective frequency of occurrence of categorical data values associated with data points; determining, from among the plurality of collections of categorical data values, based at least in part on the frequency of occurrence of the first categorical data value represented by the first data point, a collection of categorical data values corresponding to the first categorical data value of the first data point; determining a representative data value associated with the collection of categorical data values corresponding to the first categorical data value; replacing the first categorical data value in the first data point with the representative data value; (The categorization into ordered and categorical datatypes and association between categorical data values and a respective frequency of occurrence of categorical data values are addressed in the rejection above. This claim language reads on binning data into ranges, read in view of the Specification. See Spec. ¶¶74, 90. The previously cited art does not expressly teach determining a range of data values associated with a data value of a data point and replacing the data value with a representative value of the range.
Nazir teaches: “The data processing system 130 is adapted to segment, process, clean, and/or aggregate data retrieved from the database system 120 to generate a data set for use in training the various prediction models described herein.” Nazir col. 3 ll. 64-67. “Data aggregation engine 340 is configured to aggregate the data to the desired granularity. The appropriate granularity will depend on the type and structure of the input variables and the target.” Nazir col. 7 ll. 59-62. “The data from the build engines is then sent to data aggregation and transformation engine 1056, which combines and converts the data into a form that can be used to train one or more models[.]” Nazir col. 16 ll. 49-52. “In step 1116, the data for each segment is aggregated where necessary. . . . How each variable is aggregated depends on the type of variable. Numerical variables are aggregated by calculating new statistical variables, e.g., median, mean, min, max, standard deviation, etc. Enumerated variables, i.e. variables that identify a status, indicator, code, etc., are aggregated by counting the number of each type. For example, policyholder marital status may be aggregated by counting the number of policyholders that are married, divorced, separated, never married, etc. In instances where the number of types is large, e.g., ages, the aggregate variables may divide the types into bins. For example, prospect ages may be aggregated into variables counting the number of prospects aged 16-17, the number of prospects aged 18-21, the number of prospects aged 22-25, etc. Boolean input variables are aggregated into counts of true and false.” Nazir col. 17 ll. 42-60. One of ordinary skill in the art would understand aggregating by counting the number of policyholders that are married/divorced/single etc. as teaching that categorical data (e.g., marital status) will be replaced with a count of policy holders having that status. Based on Nazir col. 2 ll. 64-67, one of ordinary skill in the art would understand Nazir as teaching data aggregated into a count being used to train a model.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Nazir because aggregating by the number of policy holders associated with a given category (i.e. marriage status) reduces the amount of data used to represent a concept for training, thereby saving space and processing power required to process associated machine learning algorithms.) determining, based at least in part on the first factor and the representative data value, an indicator of correlation between the first factor and a target machine-learned model prediction; assigning a score to the first factor based at least in part on the indicator of correlation; (“Feature selection may also be known as variable selection and used to simplify the machine learning model and enhance processing of the computer to be more efficient and facilitate interpretation of data by clients and the loan issuance system 30. This may allow shorter training times to avoid the problems associated with dimensionality and enhanced generalization, for example, by reducing overfitting. With feature selection, the data that contains some features that are either redundant or irrelevant may be removed without incurring much loss of information. . . . It is possible to test each possible subset of features, finding the one that minimizes the error rate.” Fidanza ¶ 65. “The filter methods may use a proxy measure instead of an error rate to score a feature subset. Wrapper methods may use a predictive model to score feature subsets, where a new subset may be used to train a machine learning model that is tested on a hold-outset. The number of mistakes made on that hold-outset may be counted as the error rate of the model to give a score for that subset. The wrapper algorithm may train a new model for each subset.” Fidanza ¶ 66. “Features may be selected on the basis of their scores and various statistical tests for the correlation with the outcome variable.” Fidanza ¶ 67. 
“Some examples of wrapper methods . . . aims to find the best performing feature subset. That algorithm may repeatedly create models and keep aside the best or worst performing feature of each iteration. It should be understood there is some difference between filter and wrapper techniques. For example, . . . while wrapper techniques measure the usefulness of a subset of features by actually training a model on it.” Fidanza ¶ 68. “The loan issuance system 30 may have an object to identify the statistically reliable relationships between input data features and a target variable using the machine-learning modeling.” Fidanza ¶ 96. Note also that training the model to find the best performing features implies acquiring some indicator of correlation between the factors and their predictions.) creating a ranked listing of the plurality of factors based at least in part on the score assigned to the first factor; selecting a subset of the plurality of factors included in the ranked listing; generating a second dataset based at least in part on the subset of the plurality of factors; and training a machine learning model using the second dataset. (“Some examples of wrapper methods may also include forward selection as an interpretive method or a backward elimination and a recursive feature elimination that aims to find the best performing feature subset. That algorithm may repeatedly create models and keep aside the best or worst performing feature of each iteration.” Fidanza ¶ 68. “It should be understood that the recursive feature elimination (RFE) may repeatedly construct a model, for example, a regression model or SVM and choose either the best or worst performing feature such as based on coefficients and setting the feature aside and repeating the process with the rest of the features. This can be applied until all features in the data set are exhausted and features may be ranked according to when they were eliminated. 
With a linear correlation, each feature may be evaluated independently.” Fidanza ¶ 70. Note that the chosen feature vector is a data subset. “Based on the features obtained in the financial feature extraction such as available income, indebtedness capacity, and credit products balance risk, a classification model may be trained. Different classification algorithms may be used and the algorithm that may be selected is one that may be considered the best to fit the business requirements that are taken into consideration by the loan issuance system 30.” Fidanza ¶ 73. “Linear classifiers may permit machine learning and statistical classification by using an object's characteristics to identify a class or group to which it belongs. The linear classifier may make the classification decision based on the value of a linear combination of characteristics. Data may include the feature values as described above, and may be presented to the loan issuance server 50 and its server processor 52 in a vector as a feature vector. This will allow document classification also.” Fidanza ¶ 74.)
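By way of illustration only, the recursive feature elimination described in Fidanza ¶¶68-70 (repeatedly construct a model, set aside the worst performing feature, and rank features according to when they were eliminated) might be sketched as follows; the scoring function and feature names are hypothetical and form no part of the record:

```python
def recursive_feature_elimination(features, score_subset):
    """Rank features by repeatedly setting aside the worst performer.

    score_subset stands in for training a model on a feature subset and
    measuring its performance (higher is better).
    """
    remaining = list(features)
    eliminated = []  # worst performers first
    while len(remaining) > 1:
        # The feature whose removal leaves the best-scoring subset
        # contributes least, so it is set aside.
        worst = max(
            remaining,
            key=lambda f: score_subset([g for g in remaining if g != f]),
        )
        remaining.remove(worst)
        eliminated.append(worst)
    eliminated.extend(remaining)
    # Features eliminated last are ranked highest.
    return list(reversed(eliminated))

# Hypothetical scoring: pretend each feature has a fixed usefulness weight.
weights = {"income": 3.0, "zip_code": 0.5, "age": 2.0}
ranking = recursive_feature_elimination(
    list(weights), lambda subset: sum(weights[f] for f in subset))
```

Selecting the top of the resulting ranking corresponds to choosing the feature subset used to generate the second dataset recited in the claim.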
2. The method of claim 1, further including: assigning a pattern value to each data point of a plurality of data points of the same datatype in the plurality of factors; and grouping plurality of the data points based at least in part on the pattern value assigned to each data point. (“Linear classifiers may permit machine learning and statistical classification by using an object's characteristics to identify a class or group to which it belongs. The linear classifier may make the classification decision based on the value of a linear combination of characteristics.” Fidanza ¶ 74.)
3. The method of claim 1, wherein: the collection of categorical data values corresponding to the first categorical data value of the first data point consists of the first categorical data value, and determining the representative data value associated with the collection of categorical data values corresponding to the first categorical data value comprising (See rejection of claim 1.) determining the first categorical data value as the representative data value based at least in part on the collection of categorical data values consisting of the first categorical data value. (See rejection of claim 1. “How each variable is aggregated depends on the type of variable. Numerical variables are aggregated by calculating new statistical variables, e.g., median, mean, min, max, standard deviation, etc. Enumerated variables, i.e. variables that identify a status, indicator, code, etc., are aggregated by counting the number of each type. For example, policyholder marital status may be aggregated by counting the number of policyholders that are married, divorced, separated, never married, etc.” Nazir col. 17 ll. 48-60.)
5. The method of claim 1, wherein: the collection of categorical data values corresponding to the first categorical data value of the first data point comprises a plurality of categorical data values including the first categorical data value, and the representative data value is distinct from the first categorical data value (“How each variable is aggregated depends on the type of variable. Numerical variables are aggregated by calculating new statistical variables, e.g., median, mean, min, max, standard deviation, etc. Enumerated variables, i.e. variables that identify a status, indicator, code, etc., are aggregated by counting the number of each type. For example, policyholder marital status may be aggregated by counting the number of policyholders that are married, divorced, separated, never married, etc. In instances where the number of types is large, e.g., ages, the aggregate variables may divide the types into bins. For example, prospect ages may be aggregated into variables counting the number of prospects aged 16-17, the number of prospects aged 18-21, the number of prospects aged 22-25, etc. Boolean input variables are aggregated into counts of true and false.” Nazir col. 17, ll. 47-60. Note that Boolean values and counts are “distinct” from the values in the original data (e.g., a count of married persons is “distinct” from data of the individuals being counted).)
For the rejection of claim 8, see rejection of claim 1.
9. The system of claim 8, wherein the operations further comprise: assigning a pattern value to each data point of a plurality of data points of a same datatype in the plurality of factors based at least in part on a set of predefined rules; and grouping data points of the plurality of data points of the same datatype, based on the assigned pattern values, into bins of data points having a common pattern value. (“Linear classifiers may permit machine learning and statistical classification by using an object's characteristics to identify a class or group to which it belongs. The linear classifier may make the classification decision based on the value of a linear combination of characteristics.” Fidanza ¶ 74.
Fidanza does not expressly teach bins of data having a common pattern.
Nazir teaches: “How each variable is aggregated depends on the type of variable. Numerical variables are aggregated by calculating new statistical variables, e.g., median, mean, min, max, standard deviation, etc. Enumerated variables, i.e. variables that identify a status, indicator, code, etc., are aggregated by counting the number of each type. For example, policyholder marital status may be aggregated by counting the number of policyholders that are married, divorced, separated, never married, etc. In instances where the number of types is large, e.g., ages, the aggregate variables may divide the types into bins. For example, prospect ages may be aggregated into variables counting the number of prospects aged 16-17, the number of prospects aged 18-21, the number of prospects aged 22-25, etc. Boolean input variables are aggregated into counts of true and false.” Nazir Col. 17 ll. 55-60.
With respect to this limitation, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Nazir because binning lowers the number of variables, thereby saving space and processing power required to process associated machine learning algorithms.)
10. The system of claim 8, wherein the indicator of correlation is further determined by utilizing a chi-squared test. (“Features may be selected on the basis of their scores and various statistical tests for the correlation with the outcome variable. . . . [A] Chi-squared algorithm may be a statistical test applied to groups of categorical features to evaluate the likelihood of correlation or association between them using their frequency distribution.” Fidanza ¶ 67. See also Brownlee Pages 6-8 teaching use of a Chi-squared algorithm under specific conditions.)
11. The system of claim 8, wherein the collection of categorical data values corresponding to the first categorical data value of the first data point consists of the first categorical data value, and the representative data value is distinct from the first categorical data value. (See rejection of claim 5.)
12. The system of claim 8, wherein the ordered datatype identifies a pattern of debt and income ratios. (With respect to claim interpretation, note that an “ordered” or “categorical” type of data is described in the Specification consistent with an intrinsic property of the data. See e.g. ¶¶ 59-60. Fidanza teaches: “The income and transactional data of the bank account associated with the client may comprise available income as a sum of client incomes weighted by a confidence of income occurrence, and an indebtedness capacity about the capacity of the client to repay the loan.” Fidanza ¶ 7.
Fidanza does not expressly teach that the ratio of debt to income is used.
Nazir teaches: “The methods and systems discussed herein may be used with any type of data. For example, the methods and systems discussed herein may be used with data associated with oil production, or data associated with any other technology or field (e.g., engineering fields, construction fields, legal field, medical field, educational field, insurance field etc.).” Nazir Col. 9 ll. 20-25. “The variables with the most influence on the target are selected and added to the feature set used for training a prediction model.” Nazir Abstract. “Third party data collections and/or databases contain tremendous amounts of data for each individual and/or address in the country. This data can be leveraged, using specially trained or constructed models, to predict which of the over 200 million prospects is most likely to convert. The data include data regarding each prospect’s social media habits, recent life events, income, debts, assets, buying patterns, individual and household characteristics, demographics, and interests.” Nazir ¶ 100. One of ordinary skill in the art would understand the sequential listing “income, debts” as referring to DTI.
With respect to this limitation, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Nazir because this type of data may predict behavior.)
13. The system of claim 8, wherein the categorical datatype identifies a pattern of zone improvement plan codes. (With respect to claim interpretation, note that an “ordered” or “categorical” type of data is described in the Specification consistent with an intrinsic property of the data. See e.g. ¶¶ 59-60. Fidanza does not expressly teach using a ZIP code as a datatype.
Nazir teaches: “The methods and systems discussed herein may be used with any type of data. For example, the methods and systems discussed herein may be used with data associated with oil production, or data associated with any other technology or field (e.g., engineering fields, construction fields, legal field, medical field, educational field, insurance field etc.).” Nazir Col. 9 ll. 20-25. Nazir teaches: “Each city represents an entirely different risk profile. A more representative methodology aggregates data at a much finer grain – at minimum, at the Metropolitan Statistical Area (MSA) level, and ideally at the Zip Code Tabulation Area (ZCTA) level.” Nazir Col. 11 ll. 62-66.
With respect to this limitation, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Nazir because this type of data may correlate with income, behavior, and risk.)
14. The system of claim 8, further comprising: obtaining a scoring adjustment associated with an adjusted factor; determining that the first factor corresponds to the adjusted factor; and based on determining that the first factor corresponds to the adjusted factor, adjusting the score of the first factor based on the scoring adjustment. (As best understood, this language claims the subject matter of paragraphs 43 to 46 of the Specification. The Specification explains that “[t]he scorer 126 assigns a ‘score’ to each factor based on the correlation indicator from the trained ML model 130. The correlation indicator indicates how strongly the factor correlates and the target prediction of the trained ML model 130. In some instances, the scorer 126 may adjust the score for reasons not incorporated or considered by the trained ML model 130. This is a heuristics adjustment to the score calculated based on the trained ML model 130. [0045] To this end, the scorer 126 may obtain scoring-adjustments associated with adjusted factors. That is, the scorer 126 obtains a table that lists adjustments to scores for specific factors, which are the adjusted factors. [0046] The scorer 126 determines that a factor ascertained in the dataset matches one of the adjusted factors and, accordingly, adjusts the score of the factor of the matched adjusted factors with the scoring-adjustments associated with the matched adjusted factors.” Spec. ¶¶ 43-46. This describes the claimed “adjusted factor[s]” as “adjustments to scores for specific factors.” Fidanza teaches creation of, and running, multiple models that indicate the best performing features. See Fidanza ¶ 68. (“Some examples of wrapper methods may also include forward selection as an interpretive method or a backward elimination and a recursive feature elimination that aims to find the best performing feature subset. 
That algorithm may repeatedly create models and keep aside the best or worst performing feature of each iteration. . . . [W]rapper techniques measure the usefulness of a subset of features by actually training a model on it. . . . [W]rapper techniques usually provide the best subset of features.”))
For rejection of claim 15, see rejection of claim 8.
16. The one or more computer-readable media of claim 15, wherein the one or more factors are selected further based at least in part on the respective scores of the one or more factors being greater than a particular numerical score. (See rejection of claim 1. “Features may be selected on the basis of their scores and various statistical tests for the correlation with the outcome variable.” Fidanza ¶ 67. Note that selection on the basis of ranked scores implies a selection based on a greater score.)
17. The one or more computer-readable media of claim 15, wherein the ordered datatype identifies a pattern of human ages. (With respect to claim interpretation, note that an “ordered” or “categorical” type of data is described in the Specification consistent with an intrinsic property of the data. See e.g. ¶¶ 59-60. Fidanza does not expressly teach using a human age as a datatype.
Nazir teaches: “The methods and systems discussed herein may be used with any type of data. For example, the methods and systems discussed herein may be used with data associated with oil production, or data associated with any other technology or field (e.g., engineering fields, construction fields, legal field, medical field, educational field, insurance field etc.).” Nazir Col. 9 ll. 20-25. “In instances where the number of types is large, e.g., ages, the aggregate variables may divide the types into bins. For example, prospect ages may be aggregated into variables counting the number of prospects aged 16-17, the number of prospects aged 18-21, the number of prospects aged 22-25, etc. Boolean input variables are aggregated into counts of true and false.” Col. 17 ll. 55-60.
With respect to this limitation, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Nazir because this type of data may correlate with behavior and risk.)
18. The one or more computer-readable media of claim 15, wherein the categorical datatype identifies a pattern of educational levels of individuals. (With respect to claim interpretation, note that an “ordered” or “categorical” type of data is described in the Specification consistent with an intrinsic property of the data. See e.g. ¶¶ 59-60. Fidanza does not expressly teach using an education level as a datatype. Nazir teaches: “The methods and systems discussed herein may be used with any type of data. For example, the methods and systems discussed herein may be used with data associated with oil production, or data associated with any other technology or field (e.g., engineering fields, construction fields, legal field, medical field, educational field, insurance field etc.).” Nazir Col. 9 ll. 20-25. “In more detail, the available information for each prospect may include personal information (e.g., gender, education level, occupation, consumer prominence, etc.)” Nazir Col. 14 ll. 43-45.
With respect to this limitation, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Nazir because personal data may correlate with behavior and risk.)
19. The one or more computer-readable media of claim 15, the operations further comprising: obtaining a scoring adjustment associated with an adjusted factor; determining that the first factor corresponds to the adjusted factor; and based on determining that the first factor corresponds to the adjusted factor, adjusting the score of the first factor based on the scoring adjustment. (As best understood, this language claims the subject matter of paragraphs 43 to 46 of the Specification. The Specification explains that “[t]he scorer 126 assigns a ‘score’ to each factor based on the correlation indicator from the trained ML model 130. The correlation indicator indicates how strongly the factor correlates and the target prediction of the trained ML model 130. In some instances, the scorer 126 may adjust the score for reasons not incorporated or considered by the trained ML model 130. This is a heuristics adjustment to the score calculated based on the trained ML model 130. [0045] To this end, the scorer 126 may obtain scoring-adjustments associated with adjusted factors. That is, the scorer 126 obtains a table that lists adjustments to scores for specific factors, which are the adjusted factors. [0046] The scorer 126 determines that a factor ascertained in the dataset matches one of the adjusted factors and, accordingly, adjusts the score of the factor of the matched adjusted factors with the scoring-adjustments associated with the matched adjusted factors.” Spec. ¶¶ 43-46. This describes the claimed “adjusted factor[s]” as “adjustments to scores for specific factors.” Fidanza teaches creation of, and running multiple models (including a “second machine learning model”) that rank (score) the best performing features. See Fidanza ¶ 68. (“Some examples of wrapper methods may also include forward selection as an interpretive method or a backward elimination and a recursive feature elimination that aims to find the best performing feature subset. 
That algorithm may repeatedly create models and keep aside the best or worst performing feature of each iteration. . . . [W]rapper techniques measure the usefulness of a subset of features by actually training a model on it. . . . [W]rapper techniques usually provide the best subset of features.”))
20. The one or more computer-readable media of claim 15, wherein the categorical datatype identifies a pattern of income brackets. (With respect to claim interpretation, note that an “ordered” or “categorical” type of data is described in the Specification consistent with an intrinsic property of the data. See e.g. ¶¶ 59-60.
Fidanza does not expressly teach using income brackets as a datatype.
Nazir teaches: “The methods and systems discussed herein may be used with any type of data. For example, the methods and systems discussed herein may be used with data associated with oil production, or data associated with any other technology or field (e.g., engineering fields, construction fields, legal field, medical field, educational field, insurance field etc.).” Nazir Col. 9 ll. 20-25. “in the 40-50 age group, have an income>$60000, with 3 or more vehicles, have a conversion ratio of 55%, compared to the base conversion ratio of 24% of all prospect within the state of Texas.” Col. 20 ll. 53-56.
With respect to this limitation, it would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Nazir because personal data may be correlated with behavior and risk.)
Claims 4 and 6 are rejected under 35 U.S.C. 103 as being unpatentable over Fidanza (US 2020/0349641, different assignee), Brownlee (How to Choose a Feature Selection Method For Machine Learning, 2020), Nazir (US 11,798,090; filed 2019, different assignee), and Zoldi (US 2009/0222243).
4. The method of claim 3, wherein determining the frequency of occurrence of the first categorical data value represented by the first data point within the plurality of data points of the first factor comprises determining a high frequency of occurrence of the first categorical data value within the plurality of data points, and determining that the collection of categorical data values corresponding to the first categorical data value consists of the first categorical data value based at least on part on the high frequency of occurrence of the first categorical data value within the plurality of data points. (“How each variable is aggregated depends on the type of variable. Numerical variables are aggregated by calculating new statistical variables, e.g., median, mean, min, max, standard deviation, etc. Enumerated variables, i.e. variables that identify a status, indicator, code, etc., are aggregated by counting the number of each type. For example, policyholder marital status may be aggregated by counting the number of policyholders that are married, divorced, separated, never married, etc.” Nazir col. 17 ll. 48-60. “In step 1250, the top n variables, where n can range from about 100 to about 200 depending on the type of model, are selected.” Nazir col. 19 ll. 7-9. “As explained in more detail below, n or the cumulative importance threshold may be considered a hyperparameter of the model that can be tuned. . . . Optionally, in step 1260, an additional number (m) of variables may be selected from the remaining input variables, i.e., the input variables outside of the most important variables selected in step 1250. The number of variables selected may be a tunable hyperparameter.” Nazir col. 19 ll. 12-22. “In step 1270, one or more statistical methods for variable reduction may be applied to the non-selected variables, i.e., the variables other than the top n variables selected in step 1250. 
For example, techniques such as random forest, dimensionality reduction, principle component analysis, etc., may be applied to reduce the number of variables. In one embodiment, an autoencoder may be used to extract the features. In one embodiment, between about 10 and about 20 features may be extracted from the non-selected variables for use in the model. The number of features may be a tunable hyperparameter. . . . The selected top n variables, the optionally selected m variables, and the new variables created by dimensionality reduction on the non-selected variables, are then combined into a feature set that is used to train the models.” Nazir col. 19 ll. 28-44.
The previously cited art does not teach “determining that the collection of categorical data values corresponding to the first categorical data value consists of the first categorical data value based at least on part on the high frequency of occurrence of the first categorical data value.”
Zoldi teaches “During the startup routine 400, the system does not compute a score using the adaptive model, but collects the feedback records at 402, which are queued in first-in, first-out (FIFO) tables, one for the non-fraud records (the non-fraud table), and one for the fraud records (the fraud table) at 404. Record collection may be limited to only those records satisfying a rule, such as exceeding a base model score threshold. At 406, the system determines whether bin edges, or "boundaries," are determined. If yes, at 412 binning is applied to the records. If no, at 408 the system determines whether there are sufficient number of records for binning. If a sufficient number of records has not been collected, the system will continue to queue the feedback records received at 402. After a sufficient number of records have been collected, the collection can be used feature-by-feature to determine the quantiles, for example, and to compute bin boundaries (equal-population binning) at 410, which are applied at 412.” Zoldi ¶44.
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Zoldi because doing so postpones the overhead of setting up bins (including determining the bin boundaries) until sufficient data has been collected.)
6. The method of claim 5, wherein: determining the frequency of occurrence of the first categorical data value represented by the first data point within the plurality of data points of the first factor comprises determining a low frequency of occurrence of the first categorical data value within the plurality of data points, and determining that the collection of categorical data values corresponding to the first categorical data value comprises the plurality of categorical data values based at least on part on the low frequency of occurrence of the first categorical data value within the plurality of data points. (Zoldi teaches “During the startup routine 400, the system does not compute a score using the adaptive model, but collects the feedback records at 402, which are queued in first-in, first-out (FIFO) tables, one for the non-fraud records (the non-fraud table), and one for the fraud records (the fraud table) at 404. Record collection may be limited to only those records satisfying a rule, such as exceeding a base model score threshold. At 406, the system determines whether bin edges, or "boundaries," are determined. If yes, at 412 binning is applied to the records. If no, at 408 the system determines whether there are sufficient number of records for binning. If a sufficient number of records has not been collected, the system will continue to queue the feedback records received at 402. After a sufficient number of records have been collected, the collection can be used feature-by-feature to determine the quantiles, for example, and to compute bin boundaries (equal-population binning) at 410, which are applied at 412.” Zoldi ¶ 44.)
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Fidanza, Brownlee, Nazir, and Wang (US 2018/0300333).
7. The method of claim 1, further comprising: obtaining a manually adjusted score associated with an adjusted factor; determining that the first factor corresponds to the adjusted factor; and adjusting, based at least in part on determining that the first factor corresponds to the adjusted factor, the score of the first factor. (As best understood, this language claims the subject matter of paragraphs 43 to 46 of the Specification. The Specification explains that “[t]he scorer 126 assigns a ‘score’ to each factor based on the correlation indicator from the trained ML model 130. The correlation indicator indicates how strongly the factor correlates and the target prediction of the trained ML model 130. In some instances, the scorer 126 may adjust the score for reasons not incorporated or considered by the trained ML model 130. This is a heuristics adjustment to the score calculated based on the trained ML model 130. [0045] To this end, the scorer 126 may obtain scoring-adjustments associated with adjusted factors. That is, the scorer 126 obtains a table that lists adjustments to scores for specific factors, which are the adjusted factors. [0046] The scorer 126 determines that a factor ascertained in the dataset matches one of the adjusted factors and, accordingly, adjusts the score of the factor of the matched adjusted factors with the scoring-adjustments associated with the matched adjusted factors.” Spec. ¶¶ 43-46. This describes the claimed “adjusted factor[s]” as “adjustments to scores for specific factors.” Fidanza teaches creation of multiple models that indicate the best performing features. See Fidanza ¶ 68. (“Some examples of wrapper methods may also include forward selection as an interpretive method or a backward elimination and a recursive feature elimination that aims to find the best performing feature subset. That algorithm may repeatedly create models and keep aside the best or worst performing feature of each iteration. . . . 
[W]rapper techniques measure the usefulness of a subset of features by actually training a model on it. . . . [W]rapper techniques usually provide the best subset of features.”))
The previously cited art does not expressly teach manual adjustment of the score.
Wang teaches: “In feature engineering, feature selection is not fully driven by analytic algorithms using quantitative criteria as is in machine learning research, but instead mixed with human judgement using various qualitative criteria that are hard to derive from data, such as whether the selected features have interpretable physical meanings to the problem in question. For the most part, human experts are the only means capable, and responsible to make the final decision about feature selection, whereas feature selection algorithms only serve as a decision support tool.”
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the teaching of Wang because using a human to adjust the factor score allows human judgment to be applied in interpreting the underlying meaning of the data.)
Response to Arguments
Applicant's arguments filed 09/30/2025 have been fully considered but they are not persuasive.
Rejections under § 103:
The Remarks state that Nazir “does not describe using a frequency of occurrence of a categorical data value to determine a representative data value for a collection of such data values.” The Specification does not mention or implicitly describe a “frequency of occurrence” or a “categorical data value.” This makes a response difficult, because it is not clear what is being claimed. It is suggested that terminology from within the Specification be used in the claims. Alternatively, if the Specification fails to describe terms that are necessary for claiming the invention, it is suggested that Applicant include claim language limiting the invented terms to a particular meaning. Merely including applicant-invented terms is unlikely to produce productive amendments. The closest explanation in the Specification explains: “If the datatype is categorical, the natural grouping may be associated with some assumed value associated with the categorical data points. For example, the categorical autobinner 218 may group data points based on the frequency of their content: the most frequent categories (typically 20 to 30 categories) are treated as individual groups while the remaining less frequent categories are put into one single group.” Spec. ¶ 79. See also Rem. 10 citing Spec. ¶¶ 79-80 as support for the claim amendments. The claim language itself reads on the art cited above, but note that Nazir also teaches: “In step 1120, feature engineering is performed on the input variables to reduce the number of input variables from thousands to several hundred, while maintaining the overall signal of the data. In one embodiment, one or more test models are created using the input data, and the input variables with the most effect on the target are selected, e.g., in one embodiment between about 100 and about 150 variables are selected. Depending on the data, the top 100 - 150 variables can account for between about 60% and about 95% of the effect on the target. 
After the top variables are selected, an autoencoder may be used to consolidate the remaining variables into a smaller number of variables, e.g., about 10 to about 20 variables. This is useful to keep the signal of the remaining variables. Feature engineering is described in more detail with respect to FIG. 12.” Nazir col. 18 ll. 1-15.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL M KNIGHT whose telephone number is (571) 272-8646. The examiner can normally be reached Monday - Friday 9-5 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold can be reached on (571. The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
PAUL M. KNIGHT
/PAUL M KNIGHT/Examiner, Art Unit 2148