Office Action Analysis: 18343955 — GENERATION OF SUPPLEMENTED DATA FOR USE IN A DATA PIPELINE

Office Action

§101 §103 §112
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to the application and claims filed 06/29/2023. Claims 1-20 are pending and have been examined. Claims 1-20 are rejected.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 06/29/2023, 08/02/2024, 09/09/2024, 09/12/2024, 10/02/2024, 10/11/2024, 12/06/2024, 01/13/2025, 02/04/2025, 03/22/2025, 04/08/2025, 04/21/2025, 05/21/2025, 06/20/2025, 08/12/2025, 09/18/2025, and 10/13/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 9, and 15 and their dependents are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term “sufficient” in claims 1, 9, and 15 is a relative term which renders the claims indefinite. The term “sufficient” is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. The use of the term “sufficient” renders the required “degree of reliability” indefinite. The claims attempt to qualify this term of degree by stating that it is “ensured by limiting types of unpopulated fields.” However, this qualifying phrase describes a mechanism for achieving reliability; it does not provide an objective, measurable boundary for what actually constitutes a “sufficient” threshold. The reliability threshold (element 230) is described in the specification as being determined “in response to the needs of a downstream consumer” (see paragraphs 20 and 76), thereby rendering the scope of “sufficient” dependent on the variable and subjective requirements of a particular entity rather than a fixed or ascertainable standard.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Claim 1
Step 1: The claim recites a method; therefore, it is directed to the statutory category of process.
Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components, then it falls within the "Mathematical Concepts" grouping of abstract ideas. The claim recites the following abstract ideas:
“making a determination regarding whether a first inference model can predict at least the unpopulated field with a sufficient degree of reliability, the sufficient degree of reliability being ensured by limiting types of unpopulated fields for which inferences are generated…” (a person mentally or with a pen and paper makes a determination whether an inference model’s prediction reliability meets a required threshold.)
“in an instance of the determination in which the first inference model can predict at least the unpopulated field: generating an inference…” (a person mentally or with a pen and paper determines the first inference model is able to predict the unpopulated field and therefore makes an inference.)
“populating the unpopulated field using the inference to obtain supplemented data; and” (a person mentally or with a pen and paper fills in the unpopulated field with the inference they made in the earlier step.)
Step 2A prong 2: The claim does not recite additional elements that integrate the judicial exception into a practical application.
“obtaining data comprising a populated field and an unpopulated field, the unpopulated field lacking information due to user selected limitations on information collection;” (Data gathering: mere data gathering recited at a high level of generality, and thus insignificant extra-solution activity (MPEP 2106.05(g)).)
“…by the first inference model;” and “…using the first inference model;” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). -- Examiner’s Note: the claim recites a generic, off-the-shelf inference model as a tool to perform the recited abstract ideas.)
“providing the supplemented data to a downstream consumer.” (Adding insignificant extra-solution activity to the judicial exception (MPEP 2106.05(g)).)
Step 2B: The claim does not recite additional elements that amount to significantly more than the judicial exception.
“obtaining data comprising a populated field and an unpopulated field, the unpopulated field lacking information due to user selected limitations on information collection;” (MPEP 2106.05(d)(II) indicates that merely gathering data is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed limitation is well-understood, routine, conventional activity is supported under Berkheimer.)
“…by the first inference model;” and “…using the first inference model;” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). -- Examiner’s Note: the claim recites a generic, off-the-shelf inference model as a tool to perform the recited abstract ideas.)
“providing the supplemented data to a downstream consumer.” (MPEP 2106.05(d)(II) indicates that receiving or transmitting data over a network, e.g., using the Internet to gather data, is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed limitation is well-understood, routine, conventional activity is supported under Berkheimer.)
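For orientation only, the operations recited in claim 1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's implementation and not a construction of the claim: the names FirstInferenceModel, run_pipeline, and toy_predict, the field names, and the use of a field's name as its type are all hypothetical, and the toy rule stands in for a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

# A record maps field names to values; None marks an unpopulated field.
Record = Dict[str, Optional[str]]


@dataclass
class FirstInferenceModel:
    """Hypothetical stand-in for the recited first inference model."""
    # Field types for which inferences are generated (assumption: a field's
    # name doubles as its type).
    supported_types: Set[str]
    predict: Callable[[Record, str], str]

    def can_predict(self, field_type: str) -> bool:
        # The determination step: only the limited set of types qualifies.
        return field_type in self.supported_types


def run_pipeline(record: Record, model: FirstInferenceModel,
                 consumer: Callable[[Record], None]) -> Record:
    supplemented = dict(record)
    for field_type, value in record.items():
        if value is None and model.can_predict(field_type):
            # Generate an inference and populate the unpopulated field.
            supplemented[field_type] = model.predict(record, field_type)
    # Provide the supplemented data to the downstream consumer.
    consumer(supplemented)
    return supplemented


def toy_predict(record: Record, field_type: str) -> str:
    # Toy rule for illustration only; not a trained model.
    postal_code = record.get("postal_code") or ""
    return "west" if postal_code.startswith("9") else "other"


delivered: List[Record] = []
model = FirstInferenceModel(supported_types={"region"}, predict=toy_predict)
record = {"postal_code": "94105", "region": None, "age": None}
result = run_pipeline(record, model, delivered.append)
```

In this sketch the "region" field is populated because its type is in the supported set, while "age" stays unpopulated because the model is not permitted to infer it.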
Claim 2
Step 1: A method, as above.
Step 2A prong 1: See the rejection of claim 1 above, which claim 2 depends on. The claim further recites the following abstract ideas:
“prior to obtaining the data: (…) qualify which fields of the data are predictable using other fields of the data; and (…) the fields of the data that are predictable…” (a person, mentally or with a pen and paper, qualifies, before obtaining the data, which fields they deem predictable using other fields.)
Step 2A prong 2 & Step 2B: The claim does not recite additional elements that integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception.
“using a second inference model to…” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). -- Examiner’s Note: the claim recites a generic, off-the-shelf inference model as a tool to perform the recited abstract ideas.)
“performing a training process (…) to obtain the first inference model.” (The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself (MPEP 2106.05(h)).)
Claim 3
Step 1: A method, as above.
Step 2A prong 1: See the rejection of claim 2 above, which claim 3 depends on. The claim further recites the following abstract ideas:
“(…) the second populated field comprising information due to second user selected limitations on the information collection, and content of the second populated field being barred by the user selected limitations.” (a person mentally or with a pen and paper organizes, filters, and selectively records information based on pre-set, user-defined, and barred criteria.)
Step 2A prong 2: The claim does not recite additional elements that integrate the judicial exception into a practical application.
“obtaining second data comprising the populated field and a second populated field,” (Data gathering: mere data gathering recited at a high level of generality, and thus insignificant extra-solution activity (MPEP 2106.05(g)).)
Step 2B: The claim does not recite additional elements that amount to significantly more than the judicial exception.
“obtaining second data comprising the populated field and a second populated field,” (MPEP 2106.05(d)(II) indicates that receiving or transmitting data over a network, e.g., using the Internet to gather data, is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed limitation is well-understood, routine, conventional activity is supported under Berkheimer.)

Claim 4
Step 1: A method, as above.
Step 2A prong 1: See the rejection of claim 3 above, which claim 4 depends on. The claim further recites the following abstract ideas:
“wherein the data being in regard to a first user, and the second data being in regard to a second user.” (a person mentally or with a pen and paper can organize the data so that the data belongs to a first user and the second data belongs to a second user.)
Step 2A prong 2 & Step 2B: The claim does not recite any additional elements that integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception. 

Claim 5
Step 1: A method, as above.
Step 2A prong 1: See the rejection of claim 1 above, which claim 5 depends on. The claim further recites the following abstract ideas:
“wherein making the determination comprises: identifying a type of the unpopulated field;” (a person mentally or with a pen and paper identifies the type of the unpopulated field during the determination step.)
“identifying that the type of the unpopulated field is one of the types of the unpopulated fields for which the inferences are generated” (a person mentally or with a pen and paper identifies that the type of the unpopulated field is one of the types for which inferences are generated.)
Step 2A prong 2 & Step 2B: The claim does not recite additional elements that integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception.
“…by the first inference model.” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). -- Examiner’s Note: the claim recites a generic, off-the-shelf inference model as a tool to perform the recited abstract ideas.)

Claim 6
Step 1: A method, as above.
Step 2A prong 1: See the rejection of claim 5 above, which claim 6 depends on. The claim further recites the following abstract ideas:
“(…) qualified training data, the qualified training data comprising a subset of all available training data, the subset of the all available training data being selected (…)” (a person mentally or with a pen and paper designates qualified training data consisting of a subset of all available training data.)
Step 2A prong 2 & Step 2B: The claim does not recite additional elements that integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception.
“wherein the first inference model is based on…” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). -- Examiner’s Note: the claim recites a generic, off-the-shelf inference model as a tool to perform the recited abstract ideas.)
“…based on a second inference model” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). -- Examiner’s Note: the claim recites a generic, off-the-shelf inference model as a tool to perform the recited abstract ideas.)

Claim 7
Step 1: A method, as above.
Step 2A prong 1: See the rejection of claim 6 above, which claim 7 depends on.
Step 2A prong 2 & Step 2B: The claim does not recite additional elements that integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception. 
“wherein the second inference model is a self-supervised learning inference model, and the first inference model being a supervised learning inference model.” (The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself (MPEP 2106.05(h)).)

Claim 8
Step 1: A method, as above.
Step 2A prong 1: See the rejection of claim 1 above, which claim 8 depends on.
Step 2A prong 2 & Step 2B: The claim does not recite additional elements that integrate the judicial exception into a practical application nor amount to significantly more than the judicial exception. 
“providing a computer-implemented service using the supplemented data provided to the downstream consumer.” (The limitation amounts to merely indicating a field of use or technological environment in which to apply a judicial exception. This does not amount to significantly more than the exception itself (MPEP 2106.05(h)).)

Claim 9
Step 1: The claim recites a non-transitory machine-readable medium; therefore, it is directed to the statutory category of manufacture.
Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components, then it falls within the "Mathematical Concepts" grouping of abstract ideas. The claim recites the following abstract ideas:
“making a determination regarding whether a first inference model can predict at least the unpopulated field with a sufficient degree of reliability, the sufficient degree of reliability being ensured by limiting types of unpopulated fields for which inferences are generated…” (a person mentally or with a pen and paper makes a determination whether an inference model’s prediction reliability meets a required threshold.)
“in an instance of the determination in which the first inference model can predict at least the unpopulated field: generating an inference…” (a person mentally or with a pen and paper determines the first inference model is able to predict the unpopulated field and therefore makes an inference.)
“populating the unpopulated field using the inference to obtain supplemented data; and” (a person mentally or with a pen and paper fills in the unpopulated field with the inference they made in the earlier step.)
Step 2A prong 2: The claim does not recite additional elements that integrate the judicial exception into a practical application.
“A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for managing operation of a data pipeline, the operations comprising:” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).)
“obtaining data comprising a populated field and an unpopulated field, the unpopulated field lacking information due to user selected limitations on information collection;” (Data gathering: mere data gathering recited at a high level of generality, and thus insignificant extra-solution activity (MPEP 2106.05(g)).)
“…by the first inference model;” and “…using the first inference model;” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). -- Examiner’s Note: the claim recites a generic, off-the-shelf inference model as a tool to perform the recited abstract ideas.)
“providing the supplemented data to a downstream consumer.” (Adding insignificant extra-solution activity to the judicial exception (MPEP 2106.05(g)).)
Step 2B: The claim does not recite additional elements that amount to significantly more than the judicial exception.
“A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for managing operation of a data pipeline, the operations comprising:” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).)
“obtaining data comprising a populated field and an unpopulated field, the unpopulated field lacking information due to user selected limitations on information collection;” (MPEP 2106.05(d)(II) indicates that merely gathering data is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed limitation is well-understood, routine, conventional activity is supported under Berkheimer.)
“…by the first inference model;” and “…using the first inference model;” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). -- Examiner’s Note: the claim recites a generic, off-the-shelf inference model as a tool to perform the recited abstract ideas.)
“providing the supplemented data to a downstream consumer.” (MPEP 2106.05(d)(II) indicates that receiving or transmitting data over a network, e.g., using the Internet to gather data, is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed limitation is well-understood, routine, conventional activity is supported under Berkheimer.)

Claim 10 is a non-transitory machine-readable medium claim that recites substantially the same limitations as claim 2. Therefore, claim 10 is rejected under the same rationale as claim 2. 
Claim 11 is a non-transitory machine-readable medium claim that recites substantially the same limitations as claim 3. Therefore, claim 11 is rejected under the same rationale as claim 3. 
Claim 12 is a non-transitory machine-readable medium claim that recites substantially the same limitations as claim 4. Therefore, claim 12 is rejected under the same rationale as claim 4. 
Claim 13 is a non-transitory machine-readable medium claim that recites substantially the same limitations as claim 5. Therefore, claim 13 is rejected under the same rationale as claim 5. 
Claim 14 is a non-transitory machine-readable medium claim that recites substantially the same limitations as claim 6. Therefore, claim 14 is rejected under the same rationale as claim 6. 

Claim 15
Step 1: The claim recites a system; therefore, it is directed to the statutory category of machine.
Step 2A Prong 1: If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the "Mental Processes" grouping of abstract ideas. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation by mathematical calculation but for the recitation of generic computer components, then it falls within the "Mathematical Concepts" grouping of abstract ideas. The claim recites the following abstract ideas:
“making a determination regarding whether a first inference model can predict at least the unpopulated field with a sufficient degree of reliability, the sufficient degree of reliability being ensured by limiting types of unpopulated fields for which inferences are generated…” (a person mentally or with a pen and paper makes a determination whether an inference model’s prediction reliability meets a required threshold.)
“in an instance of the determination in which the first inference model can predict at least the unpopulated field: generating an inference…” (a person mentally or with a pen and paper determines the first inference model is able to predict the unpopulated field and therefore makes an inference.)
“populating the unpopulated field using the inference to obtain supplemented data; and” (a person mentally or with a pen and paper fills in the unpopulated field with the inference they made in the earlier step.)
Step 2A prong 2: The claim does not recite additional elements that integrate the judicial exception into a practical application.
“a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations for managing operation of a data pipeline, the operations comprising:” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).)
“obtaining data comprising a populated field and an unpopulated field, the unpopulated field lacking information due to user selected limitations on information collection;” (Data gathering: mere data gathering recited at a high level of generality, and thus insignificant extra-solution activity (MPEP 2106.05(g)).)
“…by the first inference model;” and “…using the first inference model;” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). -- Examiner’s Note: the claim recites a generic, off-the-shelf inference model as a tool to perform the recited abstract ideas.)
“providing the supplemented data to a downstream consumer.” (Adding insignificant extra-solution activity to the judicial exception (MPEP 2106.05(g)).)
Step 2B: The claim does not recite additional elements that amount to significantly more than the judicial exception.
“a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations for managing operation of a data pipeline, the operations comprising:” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)).)
“obtaining data comprising a populated field and an unpopulated field, the unpopulated field lacking information due to user selected limitations on information collection;” (MPEP 2106.05(d)(II) indicates that merely gathering data is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed limitation is well-understood, routine, conventional activity is supported under Berkheimer.)
“…by the first inference model;” and “…using the first inference model;” (Adding the words "apply it" (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (MPEP 2106.05(f)). -- Examiner’s Note: the claim recites a generic, off-the-shelf inference model as a tool to perform the recited abstract ideas.)
“providing the supplemented data to a downstream consumer.” (MPEP 2106.05(d)(II) indicates that receiving or transmitting data over a network, e.g., using the Internet to gather data, is a well-understood, routine, conventional function when it is claimed in a merely generic manner (as it is in the present claim). Thereby, a conclusion that the claimed limitation is well-understood, routine, conventional activity is supported under Berkheimer.)

Claim 16 is a system claim that recites substantially the same limitations as claim 2. Therefore, claim 16 is rejected under the same rationale as claim 2.
Claim 17 is a system claim that recites substantially the same limitations as claim 3. Therefore, claim 17 is rejected under the same rationale as claim 3.
Claim 18 is a system claim that recites substantially the same limitations as claim 4. Therefore, claim 18 is rejected under the same rationale as claim 4.
Claim 19 is a system claim that recites substantially the same limitations as claim 5. Therefore, claim 19 is rejected under the same rationale as claim 5.
Claim 20 is a system claim that recites substantially the same limitations as claim 6. Therefore, claim 20 is rejected under the same rationale as claim 6.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-7 and 9-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 2, 4-7, 10-12, 14-17, 19, and 20 of copending Application No. 18/343,950 (reference application) in view of US patent application US 20130226838 A1 to Chu et al. and non-patent literature Leemann et al. (“I PREFER NOT TO SAY: PROTECTING USER CONSENT IN MODELS WITH OPTIONAL PERSONAL DATA”, hereinafter “Leemann”). Although the claims at issue are not identical, they are not patentably distinct from each other because reference application No. 18/343,950 discloses substantially the same subject matter as the claims of the present application. Claim 1 of the reference application discloses “obtaining data comprising a populated field and an unpopulated field, the unpopulated field lacking information due to limitations on information collection; making a determination regarding whether a first inference model can predict at least the unpopulated field within a reliability range, the reliability range ensuring that inferences generated by the first inference model comply with the limitations; in an instance of the determination in which the first inference model can predict the at least the unpopulated field: generating an inference using the first inference model; populating the unpopulated field using the inference to obtain supplemented data; and providing the supplemented data to a downstream consumer.” The only difference is that claim 1 of the reference application does not disclose “user selected” and “sufficient degree of reliability, the sufficient degree of reliability being ensured by limiting types of unpopulated fields”. This is not considered to be a significant distinction, as all the other parts of the independent claim are substantially identical. The dependent claims disclose substantially similar subject matter (see below). A comparison chart of the claims is presented below, followed by an analysis.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Comparison chart: each claim of the Instant Application is listed first, followed by the corresponding claim of Reference Application 18/343,950.
1. A method of managing operation of a data pipeline, the method comprising: obtaining data comprising a populated field and an unpopulated field, the unpopulated field lacking information due to user selected limitations on information collection; making a determination regarding whether a first inference model can predict at least the unpopulated field with a sufficient degree of reliability, the sufficient degree of reliability being ensured by limiting types of unpopulated fields for which inferences are generated by the first inference model; in an instance of the determination in which the first inference model can predict at least the unpopulated field: generating an inference using the first inference model; populating the unpopulated field using the inference to obtain supplemented data; and providing the supplemented data to a downstream consumer.
1. A method of managing operation of a data pipeline, the method comprising: obtaining data comprising a populated field and an unpopulated field, the unpopulated field lacking information due to limitations on information collection; making a determination regarding whether a first inference model can predict at least the unpopulated field within a reliability range, the reliability range ensuring that inferences generated by the first inference model comply with the limitations; in an instance of the determination in which the first inference model can predict the at least the unpopulated field: generating an inference using the first inference model; populating the unpopulated field using the inference to obtain supplemented data; and providing the supplemented data to a downstream consumer.
2. The method of claim 1, further comprising: prior to obtaining the data: using a second inference model to qualify which fields of the data are predictable using other fields of the data; and performing a training process based on the fields of the data that are predictable to obtain the first inference model.
2. The method of claim 1, further comprising: prior to obtaining the data: using a second inference model to qualify which fields of the data are predictable using other fields of the data; and performing a training process based on the fields of the data that are predictable to obtain the first inference model.
3. The method of claim 2, further comprising: obtaining second data comprising the populated field and a second populated field, the second populated field comprising information due to second user selected limitations on the information collection, and content of the second populated field being barred by the user selected limitations.
4. The method of claim 2, further comprising: obtaining second data comprising the populated field and a second populated field, the second populated field comprising information due to second limitations on the information collection, and content of the second populated field being barred by the limitations.
4. The method of claim 3, wherein the data being in regard to a first user, and the second data being in regard to a second user.
5. The method of claim 4, wherein the data being in regard to a first user, and the second data being in regard to a second user.
5. The method of claim 1, wherein making the determination comprises: identifying a type of the unpopulated field; and identifying that the type of the unpopulated field is one of the types of the unpopulated fields for which the inferences are generated by the first inference model.
6. The method of claim 1, wherein making the determination comprises: identifying a type of the unpopulated field; and identifying that the type of the unpopulated field is one of the types of the unpopulated fields for which the inferences are generated by the first inference model.
6. The method of claim 5, wherein the first inference model is based on qualified training data, the qualified training data comprising a subset of all available training data, the subset of the all available training data being selected based on a second inference model.
7. The method of claim 6, wherein the first inference model is based on qualified training data, the qualified training data comprising a subset of all available training data, the subset of the all available training data being selected based on a second inference model.
7. The method of claim 6, wherein the second inference model is a self-supervised learning inference model, and the first inference model being a supervised learning inference model.
10. The method of claim 7, wherein the second inference model is a self-supervised learning inference model, and the first inference model being a supervised learning inference model.
9. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for managing operation of a data pipeline, the operations comprising: obtaining data comprising a populated field and an unpopulated field, the unpopulated field lacking information due to user selected limitations on information collection; making a determination regarding whether a first inference model can predict at least the unpopulated field with a sufficient degree of reliability, the sufficient degree of reliability being ensured by limiting types of unpopulated fields for which inferences are generated by the first inference model; in an instance of the determination in which the first inference model can predict at least the unpopulated field: generating an inference using the first inference model; populating the unpopulated field using the inference to obtain supplemented data; and providing the supplemented data to a downstream consumer.
11. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for managing operation of a data pipeline, the operations comprising: obtaining data comprising a populated field and an unpopulated field, the unpopulated field lacking information due to limitations on information collection; making a determination regarding whether a first inference model can predict at least the unpopulated field within a reliability range, the reliability range ensuring that inferences generated by the first inference model comply with the limitations; in an instance of the determination in which the first inference model can predict the at least the unpopulated field: generating an inference using the first inference model; populating the unpopulated field using the inference to obtain supplemented data; and providing the supplemented data to a downstream consumer.
10. The non-transitory machine-readable medium of claim 9, further comprising: prior to obtaining the data: using a second inference model to qualify which fields of the data are predictable using other fields of the data; and performing a training process based on the fields of the data that are predictable to obtain the first inference model.
12. The non-transitory machine-readable medium of claim 11, further comprising: prior to obtaining the data: using a second inference model to qualify which fields of the data are predictable using other fields of the data; and performing a training process based on the fields of the data that are predictable to obtain the first inference model.
11. The non-transitory machine-readable medium of claim 10, further comprising: obtaining second data comprising the populated field and a second populated field, the second populated field comprising information due to second user selected limitations on the information collection, and content of the second populated field being barred by the user selected limitations.
14. The non-transitory machine-readable medium of claim 12, further comprising: obtaining second data comprising the populated field and a second populated field, the second populated field comprising information due to second limitations on the information collection, and content of the second populated field being barred by the limitations.
12. The non-transitory machine-readable medium of claim 11, wherein the data being in regard to a first user, and the second data being in regard to a second user.
15. The non-transitory machine-readable medium of claim 14, wherein the data being in regard to a first user, and the second data being in regard to a second user.
13. The non-transitory machine-readable medium of claim 9, wherein making the determination comprises: identifying a type of the unpopulated field; and identifying that the type of the unpopulated field is one of the types of the unpopulated fields for which the inferences are generated by the first inference model.
6. The method of claim 1, wherein making the determination comprises: identifying a type of the unpopulated field; and identifying that the type of the unpopulated field is one of the types of the unpopulated fields for which the inferences are generated by the first inference model.
14. The non-transitory machine-readable medium of claim 13, wherein the first inference model is based on qualified training data, the qualified training data comprising a subset of all available training data, the subset of the all available training data being selected based on a second inference model.
7. The method of claim 6, wherein the first inference model is based on qualified training data, the qualified training data comprising a subset of all available training data, the subset of the all available training data being selected based on a second inference model.
15. A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations for managing operation of a data pipeline, the operations comprising: obtaining data comprising a populated field and an unpopulated field, the unpopulated field lacking information due to user selected limitations on information collection; making a determination regarding whether a first inference model can predict at least the unpopulated field with a sufficient degree of reliability, the sufficient degree of reliability being ensured by limiting types of unpopulated fields for which inferences are generated by the first inference model; in an instance of the determination in which the first inference model can predict at least the unpopulated field: generating an inference using the first inference model; populating the unpopulated field using the inference to obtain supplemented data; and providing the supplemented data to a downstream consumer.
16. A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations for managing operation of a data pipeline, the operations comprising: obtaining data comprising a populated field and an unpopulated field, the unpopulated field lacking information due to limitations on information collection; making a determination regarding whether a first inference model can predict at least the unpopulated field within a reliability range, the reliability range ensuring that inferences generated by the first inference model comply with the limitations; in an instance of the determination in which the first inference model can predict the at least the unpopulated field: generating an inference using the first inference model; populating the unpopulated field using the inference to obtain supplemented data; and providing the supplemented data to a downstream consumer.
16. The data processing system of claim 15, further comprising: prior to obtaining the data: using a second inference model to qualify which fields of the data are predictable using other fields of the data; and performing a training process based on the fields of the data that are predictable to obtain the first inference model.
17. The data processing system of claim 16, further comprising: prior to obtaining the data: using a second inference model to qualify which fields of the data are predictable using other fields of the data; and performing a training process based on the fields of the data that are predictable to obtain the first inference model.
17. The data processing system of claim 16, further comprising: obtaining second data comprising the populated field and a second populated field, the second populated field comprising information due to second user selected limitations on the information collection, and content of the second populated field being barred by the user selected limitations.
19. The data processing system of claim 17, further comprising: obtaining second data comprising the populated field and a second populated field, the second populated field comprising information due to second limitations on the information collection, and content of the second populated field being barred by the limitations.
18. The data processing system of claim 17, wherein the data being in regard to a first user, and the second data being in regard to a second user.
20. The data processing system of claim 19, wherein the data being in regard to a first user, and the second data being in regard to a second user.
19. The data processing system of claim 15, wherein making the determination comprises: identifying a type of the unpopulated field; and identifying that the type of the unpopulated field is one of the types of the unpopulated fields for which the inferences are generated by the first inference model.
6. The method of claim 1, wherein making the determination comprises: identifying a type of the unpopulated field; and identifying that the type of the unpopulated field is one of the types of the unpopulated fields for which the inferences are generated by the first inference model.
20. The data processing system of claim 19, wherein the first inference model is based on qualified training data, the qualified training data comprising a subset of all available training data, the subset of the all available training data being selected based on a second inference model.
7. The method of claim 6, wherein the first inference model is based on qualified training data, the qualified training data comprising a subset of all available training data, the subset of the all available training data being selected based on a second inference model.


Claims 2, 4-7, 10, 12, 16, and 18 are identical to the reference application, as shown in the comparison chart above. 
Claims 13-14 and 19-20 are essentially identical to the reference application, with the only difference being “non-transitory machine-readable medium” for claims 13-14 and “data processing system” for claims 19-20.

Regarding claims 13-14:
Chu teaches “non-transitory machine-readable medium” (Para 121, “In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to have modified the elements in claims 6-7 of the reference application to include a non-transitory machine-readable medium to run their system. The motivation for doing so would be to be able to implement the invention in various other ways besides a generic method. See Paragraphs 123-124 of Chu, “[0123] Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. [0124] Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).”

Regarding claims 19-20:
Chu teaches “data processing system” (Para 27, “The missing value imputation system 110 provides an efficient system to impute missing values of inputs/predictor variables for the subsequent model building processes on large and distributed data sources (e.g., using a Map-Reduce approach).”)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to have modified the elements in claims 6-7 of the reference application to include a “data processing system”. The motivation for doing so would be to be able to implement the invention in various other ways besides a generic method. See Paragraph 9 of Chu, “Provided are a computer implemented method, computer program product, and system for imputing a missing value for each of one or more predictor variables. Data is received from one or more data sources. For each of the one or more predictor variables, an imputation model is built based on information of a target variable; a type of imputation model to construct is determined based on the one or more data sources, a measurement level of the predictor variable, and a measurement level of the target variable; and the determined type of imputation model is constructed using basic statistics of the predictor variable and the target variable. The missing value is imputed for each of the one or more predictor variables using the data from the one or more data sources and one or more built imputation models to generate a completed data set.”

Claims 1, 9, and 15 are substantially identical to the reference application, with the only difference being “user selected” and “sufficient degree of reliability”. 

Regarding claims 1, 9 and 15:
Leemann teaches “user selected” (Page 1, “Some users consent to their data being used whereas others object and keep their data undisclosed.” Page 3, “It is the users’ choice to decide if they want to disclose z to the system, which results in an availability variable a ∈ {0,1}. Accordingly, only imputed samples z∗ = {z if a=1, else N/A} are observed, where a value of N/A indicates that a user did not reveal the optional information, e.g., did not use the companion app.”) 
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to have modified the elements in claims 1, 11, and 16 of the reference application to include “user selected”. The motivation for doing so would be to ensure the predictive models remain functional when users withhold some personal information. See page 2 of Leemann, “the non-sharers do not want the additional information to be considered in the decision making process; in return, they are willing to sacrifice some accuracy, but they do not want to face other systematic disadvantages.”

Chu teaches “sufficient degree of reliability” (Paragraph 34, “In block 304, a global validation sample is scored by each Mapper to evaluate the accuracy of an imputation model. The Reducer selects the top K imputation models out of N possible imputation models based on some accuracy measures as the final ensemble model for each of the one or more predictor variables with missing values.” – EN: this denotes evaluating each imputation model against a validation sample using accuracy measures and selecting only the top-performing models. This accuracy-based selection process ensures that only models meeting a “sufficient degree of reliability” are used to generate imputations for the unpopulated fields.) “the sufficient degree of reliability being ensured by limiting types of unpopulated fields for which inferences are generated by the first inference model;” (Paragraph 80, “determining which imputation model type to construct (which is based on the data source type and the measurement levels of the predictor variable and target variable (i.e., categorical or continuous)).” Paragraph 34, “The Reducer selects the top K imputation models out of N possible imputation models based on some accuracy measures as the final ensemble model for each of the one or more predictor variables with missing values.” Also see FIG. 6 and Paragraph 78, which show that each model type is restricted to a specific combination of predictor variable type and target variable type (for example, piecewise linear regression is limited to “continuous/continuous”, and conditional mode is limited to “categorical/categorical”). – EN: The instant application, in paragraph 20, explains that “limiting types of unpopulated fields” means the system identifies which field types can be reliably predicted and restricts inference generation to those qualifying types only. The purpose may be to prevent the model from generating unreliable predictions for field types it is not suited to handle.
Under the broadest reasonable interpretation, “limiting types of unpopulated fields for which inferences are generated by the first inference model” can reasonably be read as constraining a given inference model to generate inferences only for those types of unpopulated fields that match the model’s design parameters, thereby ensuring reliability. Chu’s system ensures reliability through a two-part limiting mechanism. First, the system restricts which types of unpopulated fields each imputation model can generate inferences for by matching model architecture to the measurement level of the field. For example, a piecewise linear regression model is limited to generating inferences only for continuous-type unpopulated fields, a conditional mode model is limited only to categorical-type fields, and so on (See FIG. 6). A given model type will not generate inferences for field types outside its designated scope. Second, even within those permitted type categories, models that fail to meet accuracy measures during validation are excluded from the final ensemble. Together, these constraints limit which types of unpopulated fields receive inferences and ensure a sufficient degree of reliability.)

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to have modified the elements in claims 1, 11, and 16 of the reference application to include “sufficient degree of reliability, the sufficient degree of reliability being ensured by limiting types of unpopulated fields for which inferences are generated by the first inference model;”. The motivation for doing so would be to improve accuracy and reliability during the model building process. See Paragraph 2 of Chu, “When these missing values are not handled appropriately during model building, then the predictive model is not reliable, and any decision based on the predictive model may result in losses for a company.”

Claims 3, 11, and 17 are substantially identical to the reference application, with the only difference being “user selected”. 

Regarding claims 3, 11, and 17:
Leemann teaches “user selected” (Page 1, “Some users consent to their data being used whereas others object and keep their data undisclosed.” Page 3, “It is the users’ choice to decide if they want to disclose z to the system, which results in an availability variable a ∈ {0,1}. Accordingly, only imputed samples z∗ = {z if a=1, else N/A} are observed, where a value of N/A indicates that a user did not reveal the optional information, e.g., did not use the companion app.”) 
	Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to have modified the elements in claims 4, 14, and 19 of the reference application to include “user selected”. The motivation for doing so would be to ensure the predictive models remain functional when users withhold some personal information. See page 2 of Leemann, “the non-sharers do not want the additional information to be considered in the decision making process; in return, they are willing to sacrifice some accuracy, but they do not want to face other systematic disadvantages.”

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Examiner’s Note: Some rejections will include an Examiner’s Note (labeled ‘EN’) to provide additional context or rationale explaining the basis for the rejection.

Claims 1, 5, 8, 9, 13, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over US patent application US 20130226838 A1 Chu et al. in view of non-patent literature Leemann et al. (“I PREFER NOT TO SAY: PROTECTING USER CONSENT IN MODELS WITH OPTIONAL PERSONAL DATA”, hereinafter “Leemann”).

Claim 1
Chu teaches:
A method of managing operation of a data pipeline, the method comprising: (Paragraph 9, “Provided are a computer implemented method, computer program product, and system for imputing a missing value for each of one or more predictor variables.”)

obtaining data comprising a populated field and an unpopulated field, (…) (Paragraph 2, “Predictive models are widely used and are often built on demographic, survey, and other data that contain many missing values.” Paragraph 41, “Unlike regression imputation and multiple imputation methods, the missing value imputation system 110 uses only the target variable to impute missing values in predictor variables. Thus only univariate and bivariate statistics between the target variable and a predictor variable with missing values are used to build imputation models, regardless of their measurement levels, and those statistics can be computed for all predictor variables within each Mapper or data source independently.”) 

making a determination regarding whether a first inference model can predict at least the unpopulated field (Paragraph 80, “The missing value imputation system 110 builds imputation models based only on the target variable information and based on determining which imputation model type to construct (which is based on the data source type and the measurement levels of the predictor variable and target variable (i.e., categorical or continuous)).” – EN: this denotes a step of determining which imputation model type to construct based on the measurement levels of the variables. This determination assesses whether a given model is suited to predict the unpopulated variable, as different model types are applicable to different variable types (see FIG. 6).) with a sufficient degree of reliability, (Paragraph 34, “In block 304, a global validation sample is scored by each Mapper to evaluate the accuracy of an imputation model. The Reducer selects the top K imputation models out of N possible imputation models based on some accuracy measures as the final ensemble model for each of the one or more predictor variables with missing values.” – EN: this denotes evaluating each imputation model against a validation sample using accuracy measures and selecting only the top-performing models. This accuracy-based selection process ensures that only models meeting a “sufficient degree of reliability” are used to generate imputations for the unpopulated fields.) 
the sufficient degree of reliability being ensured by limiting types of unpopulated fields for which inferences are generated by the first inference model; (Paragraph 80, “determining which imputation model type to construct (which is based on the data source type and the measurement levels of the predictor variable and target variable (i.e., categorical or continuous)).” Paragraph 34, “The Reducer selects the top K imputation models out of N possible imputation models based on some accuracy measures as the final ensemble model for each of the one or more predictor variables with missing values.” Also see FIG. 6 and Paragraph 78, which show that each model type is restricted to a specific combination of predictor variable type and target variable type (for example, piecewise linear regression is limited to “continuous/continuous”, and conditional mode is limited to “categorical/categorical”). – EN: The instant application, in paragraph 20, explains that “limiting types of unpopulated fields” means the system identifies which field types can be reliably predicted and restricts inference generation to those qualifying types only. The purpose may be to prevent the model from generating unreliable predictions for field types it is not suited to handle. Under the broadest reasonable interpretation, “limiting types of unpopulated fields for which inferences are generated by the first inference model” can reasonably be read as constraining a given inference model to generate inferences only for those types of unpopulated fields that match the model’s design parameters, thereby ensuring reliability. Chu’s system ensures reliability through a two-part limiting mechanism. First, the system restricts which types of unpopulated fields each imputation model can generate inferences for by matching model architecture to the measurement level of the field. For example, a piecewise linear regression model is limited to generating inferences only for continuous-type unpopulated fields, a conditional mode model is limited only to categorical-type fields, and so on (See FIG. 6). A given model type will not generate inferences for field types outside its designated scope. Second, even within those permitted type categories, models that fail to meet accuracy measures during validation are excluded from the final ensemble. Together, these constraints limit which types of unpopulated fields receive inferences and ensure a sufficient degree of reliability.)
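
Illustrative sketch (not part of Chu or of the claims): the two-part limiting mechanism described in the note above can be pictured roughly as follows. The table ALLOWED_TYPES, the class Candidate, the function select_models, and the accuracy values are all hypothetical names and numbers introduced here for illustration; only the two model/type pairings named in the note are included, and Chu's actual procedure may differ.

```python
from dataclasses import dataclass

# Hypothetical table: only the two pairings named in the note above
# (predictor measurement level, target measurement level).
ALLOWED_TYPES = {
    "piecewise_linear_regression": ("continuous", "continuous"),
    "conditional_mode": ("categorical", "categorical"),
}


@dataclass
class Candidate:
    model_type: str
    accuracy: float  # assumed validation accuracy in [0, 1]


def select_models(candidates, predictor_level, target_level, k):
    """Part 1: keep only model types designated for this field type.
    Part 2: of those, keep the top-k by validation accuracy."""
    eligible = [
        c for c in candidates
        if ALLOWED_TYPES.get(c.model_type) == (predictor_level, target_level)
    ]
    return sorted(eligible, key=lambda c: c.accuracy, reverse=True)[:k]
```

Under these assumptions, a field type with no designated model type yields an empty selection, so no inference would be generated for it.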

 in an instance of the determination in which the first inference model can predict at least the unpopulated field: (Paragraph 74, “If the k.sup.th record is a missing value in predictor variable X with a known target variable value, y.sub.k, then, the missing value will be imputed with a predictor variable category by judging which distribution the target variable value y.sub.k is more likely to belong to, that is the missing value will be imputed as follows…”) generating an inference using the first inference model; (Paragraph 35, “In block 306, the missing value imputation system 110 imputes the missing value for each of the one or more predictor variables using the data from the multiple data sources, one or more formed ensemble models, and a selected imputation strategy”) 

populating the unpopulated field using the inference to obtain supplemented data; and (Paragraph 83, “The missing value imputation system 110 uses one or more imputation models to generate one set of values for inputs with missing values.”) 

providing the supplemented data to a downstream consumer. (Paragraph 28, “Finally, the complete data sets for all possible predictor variables are used to build any models for prediction, discovery, and interpretation of relationships between the target variable and a set of the predictor variables.”)

Chu does not explicitly disclose: the unpopulated field lacking information due to user selected limitations on information collection;

However, Leemann teaches:
the unpopulated field lacking information due to user selected limitations on information collection; (Page 1, “Some users consent to their data being used whereas others object and keep their data undisclosed.” Page 3, “It is the users’ choice to decide if they want to disclose z to the system, which results in an availability variable a ∈ {0,1}. Accordingly, only imputed samples z∗ = {z if a=1, else N/A} are observed, where a value of N/A indicates that a user did not reveal the optional information, e.g., did not use the companion app.”)

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the work of Chu and Leemann to have the unpopulated fields that are used in the data pipeline system be due to user set limitations on the data. The motivation for doing so would be to ensure the predictive models remain functional when users withhold some personal information. See page 2, “the non-sharers do not want the additional information to be considered in the decision making process; in return, they are willing to sacrifice some accuracy, but they do not want to face other systematic disadvantages.”

Claim 5
Chu further teaches:
wherein making the determination comprises: identifying a type of the unpopulated field; and (Paragraph 46, “determining whether the measured level of the predictor variable X is continuous.” Paragraph 9, “a type of imputation model to construct is determined based on the one or more data sources, a measurement level of the predictor variable, and a measurement level of the target variable;”) identifying that the type of the unpopulated field is one of the types of the unpopulated fields for which the inferences are generated by the first inference model. (Paragraph 45, “According to the measurement levels of the predictor variable X and the target variable Y, four types of imputation models for the predictor variable X may be built.” Paragraph 80, “determining which imputation model type to construct (which is based on the data source type and the measurement levels of the predictor variable and target variable (i.e., categorical or continuous)).” – EN: this denotes that the system identifies the measurement level (type) of the predictor variable and matches it to one of the defined imputation model categories.)

Claim 8
Chu further teaches:
providing a computer-implemented service using the supplemented data provided to the downstream consumer. (Paragraph 28, “Finally, the complete data sets for all possible predictor variables are used to build any models for prediction, discovery, and interpretation of relationships between the target variable and a set of the predictor variables.” Paragraph 118, “Examples of workloads and functions which may be provided from this layer include: mapping and navigation; software development and lifecycle management; virtual classroom education delivery; data analytics processing; transaction processing; and missing value imputation for predictive models.”)

Claim 9
Chu further teaches:
 A non-transitory machine-readable medium having instructions stored therein, (Paragraph 121, “In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.”) which when executed by a processor, cause the processor to perform operations for managing operation of a data pipeline, the operations comprising: (Paragraph 27, “The missing value imputation system 110 provides an efficient system to impute missing values of inputs/predictor variables for the subsequent model building processes on large and distributed data sources (e.g., using a Map-Reduce approach).” Paragraph 125, “These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.”)
The remaining limitations of claim 9 are substantially the same as claim 1; therefore, claim 9 is rejected under the same rationale as claim 1.

Claim 13 is a non-transitory machine-readable medium claim that recites substantially the same limitations as claim 5. Therefore, claim 13 is rejected under the same rationale as claim 5. 

Claim 15
Chu further teaches:
A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations for managing operation of a data pipeline, the operations comprising: (Paragraph 27, “The missing value imputation system 110 provides an efficient system to impute missing values of inputs/predictor variables for the subsequent model building processes on large and distributed data sources (e.g., using a Map-Reduce approach).” Paragraph 107, “As shown in FIG. 11, computer system/server 1112 in cloud computing node 1110 is shown in the form of a general-purpose computing device. The components of computer system/server 1112 may include, but are not limited to, one or more processors or processing units 1116, a system memory 1128, and a bus 1118 that couples various system components including system memory 1128 to processor 1116.”) 
The remaining limitations of claim 15 are substantially the same as claim 1; therefore, claim 15 is rejected under the same rationale as claim 1.

Claim 19 is a system claim that recites substantially the same limitations as claim 5. Therefore claim 19 is rejected under the same rationale as claim 5.

Claims 2-4, 6-7, 10-12, 14, 16-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over US patent application US 20130226838 A1 Chu et al. in view of non-patent literature Leemann et al. (“I PREFER NOT TO SAY: PROTECTING USER CONSENT IN MODELS WITH OPTIONAL PERSONAL DATA”, hereinafter “Leemann”) further in view of US Patent application US 20180046926 A1 Achin et al.

Claim 2
Achin teaches:
prior to obtaining the data: using a second inference model to qualify which fields of the data are predictable using other fields of the data; and (Paragraph 382, “for a given feature, the engine 110 takes all its values across all observations, shuffles them, and reassigns them (e.g., randomly reassigns them) to the observations. This random shuffling may reduce (e.g., destroy) any predictive value for that feature. The engine may then rescore the model on the dataset with the shuffled feature values, producing a new value for the accuracy metric.” Paragraph 398, “In step 1050, for each of the predictive modeling procedures (or fitted models) the system 100 calculates the predictive value of the feature F. In some embodiments, the predictive value of the feature F for a modeling procedure or model is calculated based on the change in accuracy (e.g., based on the difference between the first and second accuracy scores for model).” – EN: this denotes using the fitted predictive models to test whether one field of the data can be used to predict another field of the data. The system does this by scrambling the values of a given feature and measuring whether the model’s prediction accuracy decreases. If accuracy drops significantly, the system knows that field was important for predicting the target field, and vice versa (Paragraph 382). The system then assigns a predictive value score to each field based on these results, which serves as the qualification of which fields of the data are predictable using other fields (Paragraph 398). This process is performed by a model separate from the “first inference model” and occurs during model development prior to obtaining new data for prediction (see Paragraph 55 regarding the separate models and Paragraph 242 regarding the occurrence prior to obtaining new data).) performing a training process based on the fields of the data that are predictable to obtain the first inference model.
(Paragraph 402, “the system 100 performs feature generation and/or feature engineering based on the model-specific predictive values. For example, the system 100 may prune “less important” features from the dataset.” Paragraph 55, “performing the particular predictive modeling procedure further includes fitting the particular predictive model to the second initial dataset” – EN: this denotes that after the fitted predictive models qualify which fields are predictable by calculating their predictive scores, the system uses those scores to refine the dataset, keeping fields that were determined to be predictive and pruning fields that were not (Paragraph 402). The system then fits a new predictive model to this refined dataset that reflects only the fields determined to be predictable (Paragraph 55).)
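
For illustration only, the shuffle-and-rescore procedure quoted from Achin (Paragraphs 382 and 398) and the subsequent pruning (Paragraph 402) can be sketched in Python. This is a minimal sketch under stated assumptions, not Achin's implementation: the function names, the choice of negative mean absolute error as the accuracy metric, and the `min_drop` pruning threshold are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Model = Callable[[Row], float]


def accuracy(model: Model, rows: Sequence[Row], targets: Sequence[float]) -> float:
    """Negative mean absolute error, so that a higher score is better."""
    errors = [abs(model(r) - t) for r, t in zip(rows, targets)]
    return -sum(errors) / len(errors)


def permutation_importance(
    model: Model,
    rows: Sequence[Row],
    targets: Sequence[float],
    feature: str,
    rng: random.Random,
) -> float:
    """Drop in accuracy after shuffling one feature's values across observations."""
    baseline = accuracy(model, rows, targets)
    shuffled_values = [r[feature] for r in rows]
    rng.shuffle(shuffled_values)
    # Reassign the shuffled values to the observations, leaving other features intact.
    shuffled_rows = [{**r, feature: v} for r, v in zip(rows, shuffled_values)]
    return baseline - accuracy(model, shuffled_rows, targets)


def prune_features(
    model: Model,
    rows: Sequence[Row],
    targets: Sequence[float],
    features: List[str],
    min_drop: float,
    seed: int = 0,
) -> List[str]:
    """Keep only features whose shuffling reduces accuracy by more than min_drop."""
    rng = random.Random(seed)
    return [
        f
        for f in features
        if permutation_importance(model, rows, targets, f, rng) > min_drop
    ]
```

In this sketch, a feature the model ignores has an importance of exactly zero, so it is pruned for any non-negative `min_drop`.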

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the missing value imputation system of Chu with the feature importance evaluation and training process of Achin. The motivation for doing so would be to improve the reliability and accuracy of the model by ensuring that it is only trained on features with demonstrated predictive value. (See Paragraph 257 of Achin, “Moreover, as a consequence of the predictive modeling system 100 exploring more different modeling methods and including more possible predictors, the resulting models may be more accurate than those obtained by traditional methods.”)

Claim 3
Leemann further teaches:
obtaining second data comprising the populated field and a second populated field, (Page 1, “In the current pricing model all potential customers are asked to fill out an application form where they enter base features, for instance information such as their state of residence and age. To improve the pricing model, the insurance offers an additional service, a “companion fitness app” through which additional health data about the customer’s physical condition are collected.” – EN: this denotes the base features which corresponds to the populated field and features from the fitness app which corresponds to second populated field.) the second populated field comprising information due to second user selected limitations on the information collection, (Page 1, “The customers decide whether to use the app or not” Page 2, “individuals who voluntarily share data (sharers) explicitly want the additional information to be considered and want to obtain more accurate predictions.”) and content of the second populated field being barred by the user selected limitations. (Page 1, “alternatively, customers can sign up for a policy without consenting to use the app.” Page 1 – abstract, “the decision not to share data can be considered as information in itself that should be protected to respect users’ privacy.” – EN: this denotes non-sharers who do not consent to the app, and therefore their decision “bars” the collection and use of that optional data.)

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the missing value imputation data pipeline of Chu and the feature importance evaluation and training process of Achin with the user consent-based data collection of Leemann. The motivation for doing so would be to improve the accuracy of the imputation models in the data pipeline by using data from users who voluntarily provide different fields of information, thereby enabling more reliable predictions when certain fields are unavailable for other users. See Page 2 of Leemann, which states, “Contribution. We address the problem of how to fairly and privately predict outcomes for users who share optional data and those who do not. Previous work has overlooked this important issue, and we fill this gap by making the following contributions: • Definition. We introduce models with Protected User Consent (PUC), which are optimal under our protection requirement AIR. PUC models outperform or match the performance of a model trained only on the base features, showing that there is no trade-off between the decision maker’s interest in improved predictions and the non-sharer’s privacy preferences.”

Claim 4
Leemann further teaches:
wherein the data being in regard to a first user, and the second data being in regard to a second user. (Page 1, “The group of non-sharing individuals who do not want to provide additional information, for instance due to privacy concerns. We refer to them as non-sharers.” Page 2, “On the other hand, individuals who voluntarily share data (sharers) explicitly want the additional information to be considered”)
Refer to the motivation presented in claim 3. 

Claim 6
Achin further teaches:
wherein the first inference model is based on qualified training data, (Paragraph 423, “Each second-order observation includes observed values of one or more second input variables and values of the output variables predicted by the first-order model based on values of the first input variables corresponding to the values of the second input variables. Generating the second-order input data may include, for each second-order observation: obtaining the observed values of the second input variables and corresponding observed values of the first input variables, and applying the first-order predictive model to the corresponding observed values of the first input variables to generate the predicted values of the output variables.” – EN: this denotes second-order model that is trained on second-order training data that has been specifically generated through a qualification process involving the first-order model, rather than training directly on raw observed data.) the qualified training data comprising a subset of all available training data, (Paragraph 425, “In step 1124, second-order training data and second-order testing data are generated from the second-order input data.” Paragraph 67, “generating, from the second-order input data, second-order training data and second-order testing data” Paragraph 414, “For each feature in the first-order model, there is a corresponding set of feature values from the original dataset, or derived from the original dataset. The second-order modeling technique may use the same features as the first-order model, and may therefore use the original values of such features, or a subset thereof, for the second-order modeling technique's training and test data.” – EN: this denotes the second-order training data is a subset of the second-order input data, which itself is a curated/derived portion of the broader available dataset.) the subset of the all available training data being selected based on a second inference model. 
(Paragraph 414, “instead of using the actual values of the target from the dataset, the second-order modeling technique uses the predicted values of the target from the first-order model.” Paragraph 423, “applying the first-order predictive model to the corresponding observed values of the first input variables to generate the predicted values of the output variables.” Paragraph 415, “Such alternatives may include other real world data from either the same or different data sources, real world data combined with machine-generated data (e.g., for the purpose of covering a broader range of possibilities than present in the real world sample) (e.g., via interpolation and extrapolation), or data completely generated by a machine-based stochastic model. In some embodiments, the value of the target variable used for training the second-order model is the predicted value from the first-order model.” – EN: this denotes that the second-order training data is generated and selected based on the first-order model: the first-order model determines the content of the training data by generating the predicted target values that become the labels, and by determining which features and observations are suitable for the second-order training process.)

Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the missing value imputation data pipeline of Chu and the user consent-based data collection of Leemann with the second-order model training approach of Achin, in which a second model qualifies and selects the training data used to train a first model. The motivation for doing so would be to produce a more accurate and efficient inference model, as Achin teaches that models trained on data curated through another model are “just as accurate or even more accurate than the corresponding first-order models, and the software that implements the second-order models is substantially more efficient than the software that implements the corresponding first-order models” (Achin Paragraph 66). 
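
For illustration only, the second-order training arrangement quoted from Achin (Paragraphs 414 and 423), in which the training labels are the first-order model's predictions rather than observed target values, can be sketched in Python. This is a minimal sketch under stated assumptions, not Achin's implementation: the function names are hypothetical, and a one-variable least-squares line stands in for whatever second-order modeling technique is actually used.

```python
from typing import Callable, List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_second_order(
    first_order: Callable[[float], float],
    xs: Sequence[float],
) -> Tuple[float, float]:
    """Fit a simple second-order model to the first-order model's predictions."""
    # The labels are generated by the first-order model, not taken from observed targets.
    predicted: List[float] = [first_order(x) for x in xs]
    return fit_line(xs, predicted)
```

A usage example under the same assumptions: calling `train_second_order` with a first-order model that is itself linear recovers that model's slope and intercept, up to floating-point error.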

Claim 7 
Achin further teaches: 
wherein the second inference model is a self-supervised learning inference model, and the first inference model being a supervised learning inference model. (Figure 11A [image: media_image1.png] – EN: this denotes 1110, a first-order model which is fitted to observed data and then used to generate predicted target values from the data itself (for example, generating its own labels from the data without external annotation). This corresponds to the second, self-supervised model. 1120 denotes that the second-order model is trained on labeled input-output pairs where the labels are the first-order model’s predictions. This corresponds to the supervised first model.)
Before the effective filing date of the invention, it would have been obvious to one of ordinary skill in the art to combine the missing value imputation data pipeline of Chu and the user consent-based data collection of Leemann with the two-stage modeling approach of Achin, in which a self-supervised first order model generates labeled data used to train a supervised second-order model. The motivation for doing so would be to leverage the strengths of both learning models. See Paragraphs 63-66 in Achin, “[0063] Second-Order Predictive Modeling [0064] Certain modeling techniques tend to produce opaque and/or complex models that are difficult to understand and difficult to implement efficiently in software. Software implementing such models may use substantial computing resources to produce predictions that could be produced much more efficiently using software that implements other, equally accurate models. [0065] There is a need for techniques for reducing the opaqueness and/or complexity of a first-order predictive model M1 that predicts the values of one or more output variables (“targets”) T based on the values of one or more input variables (“features”) F1, without significantly decreasing the model's accuracy. The inventors have recognized and appreciated that these needs can be met by building a second-order model M2 of the first-order model M1. The second-order model may predict the first-order model's predicted values for the targets T based on the same features F1 (or a subset thereof) and/or one or more features not used by the first-order model.”

Claim 10 is a non-transitory machine-readable medium claim that recites substantially the same limitations as claim 2. Therefore, claim 10 is rejected under the same rationale as claim 2. 
Claim 11 is a non-transitory machine-readable medium claim that recites substantially the same limitations as claim 3. Therefore, claim 11 is rejected under the same rationale as claim 3. 
Claim 12 is a non-transitory machine-readable medium claim that recites substantially the same limitations as claim 4. Therefore, claim 12 is rejected under the same rationale as claim 4. 
Claim 14 is a non-transitory machine-readable medium claim that recites substantially the same limitations as claim 6. Therefore, claim 14 is rejected under the same rationale as claim 6. 
Claim 16 is a system claim that recites substantially the same limitations as claim 2. Therefore claim 16 is rejected under the same rationale as claim 2.
Claim 17 is a system claim that recites substantially the same limitations as claim 3. Therefore claim 17 is rejected under the same rationale as claim 3.
Claim 18 is a system claim that recites substantially the same limitations as claim 4. Therefore claim 18 is rejected under the same rationale as claim 4.
Claim 20 is a system claim that recites substantially the same limitations as claim 6. Therefore claim 20 is rejected under the same rationale as claim 6.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NAYMUR RAHMAN ALI whose telephone number is (571)272-0007. The examiner can normally be reached Mon-Fri. 9:30-6:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached at (571)270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NAYMUR RAHMAN ALI/Examiner, Art Unit 2123                                                                                                                                                                                                        
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123