DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR
1.17(e), was filed in this application after final rejection. Since this application is eligible for continued
examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the
finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's
submission filed on 5 January 2026 has been entered.
Response to Amendment
The amendment filed on 5 January 2026 has been entered.
Claims 1-5, 7-12, 14-18, and 20 are pending.
Claims 1, 8, and 15 are currently amended.
Response to Arguments
Applicant's arguments filed on 5 January 2026, including the remarks regarding the rejections of the claims under 35 U.S.C. 103, have been fully considered, but they are not persuasive.
Applicant respectfully submits that the cited references do not disclose every element of the claims. Applicant notes amended independent Claims 1, 8, and 15 recite: "determining the respective adaptability of the respective existing code snippet based on the presence of general code elements therein, the presence of domain specific code elements therein, and the mappability of identified domain specific code elements;"
Applicant submits the cited references do not appear to disclose determining the respective adaptability of a given existing code snippet based on looking at all three of the factors enumerated in the cited portion above. Applicant notes Claim 1 previously recited that the adaptability was based on "one or more" of the above factors, and the Office Action pointed to a portion of Chan that discusses determining the mappability of data in a dataset to a prebuilt object. (Office Action, p. 10, citing Chan, para. [0028]). However, such mappability does not appear to be the same as the mappability of the claims. Further, there appears to be no mention of using the other factors as part of the overall adaptability determination either.
Examiner respectfully disagrees. In response to Applicant's argument that the references fail to show certain features of the invention, it is noted that the features upon which Applicant relies (i.e., “determining the respective adaptability of the respective existing code snippet based on the presence of general code elements therein, the presence of domain specific code elements therein, and the mappability of identified domain specific code elements”), under broadest reasonable interpretation (BRI), are given their plain meaning, unless such meaning is inconsistent with the Specification, see MPEP § 2111.01(I).
As clarified in the present Office Action below, Examiner submits Chan teaches “determining the respective adaptability of the respective existing code snippet based on the presence of general code elements therein, the presence of domain specific code elements therein, and the mappability of identified domain specific code elements” (cf. Chan, [0089] In step 506, the machine learning framework system performs a presence of domain specific code elements therein direct table and/or column match based on the data mappings. The machine learning framework system the mappability of identified domain specific code elements determines if the match is successful (step 508). If not successful, the machine learning framework system, in step 510, uses data mapping rules of the prebuilt object (e.g., data mapping rules 258). For example, the rules may use a look-up table of corresponding terms (e.g., synonyms) for the table and/or column names. If there is a match here (step 512), or by a successful direct match, then the process continues to step 516. Otherwise, all possible table and/or column matches are based on the presence of general code elements therein determined based on data type (step 514).; [0090] In step 516, the determining the respective adaptability of the respective existing code snippet machine learning framework system scores the matches.).
As outlined above, Chan teaches (mappability) matching prebuilt objects, which may "include source code for a function" (Chan, [0023]), to data sources, which include various data formats and types (Chan, [0031]). In order to match the prebuilt objects to the data sources for building the machine learning model framework, Chan discusses both matching the prebuilt objects via a direct table and/or column match, similar to the (domain specific code elements) of the claimed invention, as well as matching the prebuilt objects simply to the data type of the data source, similar to the (general code elements) of the claimed invention. This allows the prebuilt objects to be matched either to the generic data type, which is applicable across multiple datasets, or directly to the elements in the data, applicable to the specific data source, in the same manner as disclosed by the Specification of the claimed invention ([00139] General code elements may include those code elements that may be generally applicable across multiple datasets. By contrast, domain specific code elements may include those code elements that are specific to the dataset to which the corresponding code snippet is applied.). This matching is then (determining the respective adaptability of the respective existing code snippet) scored by the machine learning framework system to evaluate prebuilt object and model performance (Chan, [0090]).
The rejection of Claim 1 under 35 U.S.C. 103 is maintained. Similarly, the rejections of Claims 8 and 15 under 35 U.S.C. 103 are maintained.
The rejections of Claims 2-5, 7, 9-12, 14, 16-18, and 20, which depend directly or indirectly from Claims 1, 8, and 15, under 35 U.S.C. 103 are likewise maintained.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-5, 7-12, 14-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shaikh et al. (U.S. Pre-Grant Publication No. 2020/0097845, hereinafter 'Shaikh'), in view of Polleri et al. (U.S. Pre-Grant Publication No. 2021/0081837, hereinafter 'Polleri'), and further in view of Chan (U.S. Pre-Grant Publication No. 2019/0228261, previously made of record).
Regarding claim 1 and analogous claims 8, 15, Shaikh teaches A method comprising ([0017] The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.):
obtaining a machine learning (ML) pipeline skeleton that indicates a set of first functional blocks to use to process a new dataset of a new ML project ([0002] Data engineering platforms build machine learning (ML) pipeline skeleton pipelines that transform the data into formats that users can utilize. In addition, data engineering platforms may utilize a machine learning model or algorithm to run over the stored data to analyze the data and identify patterns in the data.; [0023] The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each set of first functional blocks block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).; [0079] If the computer determines that the computer received a user input to upload a new dataset to the computer, yes output of step 908, then the computer process a new dataset of a new ML project computes a set of semantics corresponding to the new dataset (step 910).),
each first functional block of the set of first functional blocks having a respective functionality ([0023] The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each first functional blocks having a respective functionality block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).);
for each respective first functional block of the set of first functional blocks, the method includes: obtaining a plurality of existing code snippets from one or more existing ML pipelines of one or more existing ML projects, each of the existing code snippets instantiating a second functional block of the existing ML pipelines and being a potential instantiation of the respective first functional block ([0058] As soon as a user accesses any stored dataset on the data engineering platform, illustrative embodiments use the dataset's pre-computed semantics to find from one or more existing ML pipelines of one or more existing ML projects matching machine learning models or plurality of existing code snippets source codes that can classify the input data or find a source code to train a machine learning model using the input data. Thus, illustrative embodiments may be useful for identifying potential uses of the dataset, existing code snippets instantiating a second functional block of the existing ML pipelines and being a potential instantiation of the respective first functional block identifying machine learning models created by other users that utilized the input dataset for a particular task, and finding best matching machine learning models and source codes for performing various tasks on the input dataset.);
selecting a particular existing code snippet from the plurality of existing code snippets for implementation of the respective first functional block based on the respective determined adaptabilities of the respective existing code snippets ([0043] After recommendation manager 218 generates of the respective first functional block based on the respective determined adaptabilities of the respective existing code snippets relatedness score 240 for each machine learning model in machine learning models 232 and each set of source codes in source codes 234, recommendation manager 218 compares relatedness score 240 for each machine learning model in machine learning models 232 and each set of source codes in source codes 234 with relatedness score threshold 242. Relatedness score threshold 242 represents a predefined minimum relatedness score value. In other words, recommendation manager 218 selects those candidate data analysis assets (i.e., machine learning models and source codes) having a corresponding relatedness score greater than relatedness score threshold 242 and no longer considers those data analysis assets having a corresponding relatedness score less than relatedness score threshold 242 as viable candidates.; [0081] Optionally, the computer automatically selecting a particular existing code snippet from the plurality of existing code snippets for implementation selects a highest-ranking candidate data analysis asset in the ranked list of candidate analysis assets (step 926). In addition, the computer applies the highest-ranking candidate data analysis asset to the dataset to classify the dataset for performing a particular task using the dataset (step 928).);
Shaikh fails to teach determining a respective adaptability, with respect to the new dataset, of each of the existing code snippets for use with respect to the new dataset, the determining of the respective adaptability of a respective existing code snippet including: determining whether the respective existing code snippet includes general code elements that are generally applicable across multiple datasets; determining whether the respective existing code snippet includes domain specific code elements that are specific to a particular name or value of a corresponding existing dataset to which the respective existing code snippet is specifically applied; in response to the respective code snippet including one or more domain specific code elements, determining whether the one or more domain specific code elements are mappable to names or values included in the new dataset; and determining the respective adaptability of the respective existing code snippet based on the presence of general code elements therein, the presence of domain specific code elements therein, and the mappability of identified domain specific code elements; and generating a set of candidate concrete pipelines based on the selected particular existing code snippets; selecting, from the set of candidate concrete pipelines, a particular concrete pipeline for the new ML project based on a performance of the particular concrete pipeline as applied to the new dataset; and training an ML model on the new dataset based on operations as dictated by the particular concrete pipeline such that the ML model is trained to make predictions related to a new ML task corresponding to the new ML project.
Polleri teaches generating a set of candidate concrete pipelines based on the selected particular existing code snippets ([0049] The machine learning platform 100 can generate highly customizable applications. The library components 168 contain a set of predefined, off-the-shelf workflows or pipelines 136, which the application developer can incorporate into a new machine learning application 112. A workflow specifies various micro services routines 140, software modules 144 and/or infrastructure modules 148 configured in a particular way for a type or class of problem. In addition to this, it is also possible to generating a set of candidate concrete pipelines define new workflows or pipelines 136 by re-using the library components or changing an existing workflow or pipeline 136. The infrastructure modules 148 can also include services such as data gathering, process monitoring, and logging.; [0264] As noted above, certain techniques described herein may be implemented to predict outcomes of software code integration requests. In some embodiments, a model execution engine (e.g., within a code integration request prediction server or plug-in within a software development environment) may receive input data corresponding to a request to based on the selected particular existing code snippets integrate an external code base into a source code project or component. Such input data may identify one or more external code bases (e.g., open source software functions, libraries, etc.) associated with the source code project and component, including the external code base to be integrated and/or additional external code bases that have been previous integrated within the same project or component.);
selecting, from the set of candidate concrete pipelines, a particular concrete pipeline for the new ML project based on a performance of the particular concrete pipeline as applied to the new dataset ([0043] The techniques can utilize existing selecting, from the set of candidate concrete pipelines, a particular concrete pipeline for the new ML project based on a performance of the particular concrete pipeline as applied to the new dataset data ontologies for generating machine learning solutions for a high-precision search of relevant services to compose pipelines with minimal human intervention. For data sets without existing ontologies, one or more ontologies be generated.); and
Shaikh and Polleri are considered to be analogous art to the claimed invention because they are in the same field of machine learning. In view of the teachings of Shaikh, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to apply the teachings of Polleri to Shaikh in order to obtain suitable candidate machine learning models or pipeline instances to compute an estimate of the confidence for the candidate machine learning models for the point prediction (cf. Polleri, [0331] The point prediction technique can be accomplished by searching an instance as query against a set of training data to obtain suitable candidate machine learning models or pipelines 136 instances to compute an estimate of the confidence for the candidate machine learning models for the point prediction. The technique can produce an estimation of the uncertainty of a particular machine learning system for a point prediction.).
Chan teaches determining a respective adaptability, with respect to the new dataset, of each of the existing code snippets for use with respect to the new dataset, the determining of the respective adaptability of a respective existing code snippet including ([0064] In step 304, the machine learning framework system obtains one or more datasets (e.g., original data 264). In some embodiments, a data onboarding engine (e.g., data onboarding engine 212) and/or a communication engine (e.g., communication engine 230) may obtain the one or more datasets from one or more data source systems (e.g., data source systems 104) over a communication network (e.g., communications network 108).; [0066] In step 308, the machine learning framework system determining a respective adaptability, with respect to the new dataset, of each of the existing code snippets for use with respect to the new dataset selects a particular prebuilt machine learning framework object from the plurality of machine learning framework objects based on the one or more datasets and the user-specified context for creating the particular machine learning application. In some embodiments, a prebuilt object selection engine (e.g., prebuilt object selection engine 208) selects the particular prebuilt machine learning framework object.; [0023] For example, a determining of the respective adaptability of a respective existing code snippet prebuilt object may include source code for a function (or a function definition from which source code may be generated), and/or it may include an API for that function.):
determining whether the respective existing code snippet includes general code elements that are generally applicable across multiple datasets ([0024] In some embodiments, prebuilt objects may be coarsely defined (or, “underdefined”) to facilitate reusability of the prebuilt objects for generating a variety of different machine learning applications. For example, the determining whether the respective existing code snippet includes general code elements that are generally applicable across multiple datasets prebuilt objects may define required data input types, but not define the actual data input identifiers (e.g., table names and/or column names) themselves. In a more specific example, a prebuilt object may define a table with fields for data mapping, but the fields themselves are specifically not defined. The machine learning framework system 102 may determine the actual fields during the machine learning application generation process in order to generate a particular machine learning application.; [0025] In some embodiments, a prebuilt object may define prebuilt machine learning components (or, simply, “prebuilt components”) for implementing a machine learning process and generating a corresponding machine learning application.);
determining whether the respective existing code snippet includes domain specific code elements that are specific to a particular name or value of a corresponding existing dataset to which the respective existing code snippet is specifically applied ([0023] For example, data mapping requirements may define particular data types, table and/or column information (e.g., table names or other identifiers and/or column names or other identifiers), and/or the like, for onboarding data and/or preparing data for a machine learning process.; [0024] In some embodiments, prebuilt objects may be coarsely defined (or, “underdefined”) to facilitate reusability of the prebuilt objects for generating a variety of different machine learning applications. For example, the determining whether the respective existing code snippet includes domain specific code elements prebuilt objects may define that are specific to a particular name or value of a corresponding existing dataset to which the respective existing code snippet is specifically applied required data input types, but not define the actual data input identifiers (e.g., table names and/or column names) themselves.);
in response to the respective code snippet including one or more domain specific code elements, determining whether the one or more domain specific code elements are mappable to names or values included in the new dataset ([0028] In some embodiments, the machine learning framework system 102 may function to configure a prebuilt object and/or prebuilt components to generate a machine learning application. For example, the machine learning framework system 102 may modify the prebuilt components from their initial coarsely defined state, to a more granular state capable of being compiled and/or executed. For example, based on determining whether the one or more domain specific code elements are mappable to names or values included in the new dataset mapping table and/or column information of original data to table and/or column in response to the respective code snippet including one or more domain specific code elements requirements of the prebuilt object, the system may more granularly define prebuilt components associated with an onboarding and/or data preparation service, and/or other machine learning services.); and
determining the respective adaptability of the respective existing code snippet based on the presence of general code elements therein, the presence of domain specific code elements therein, and the mappability of identified domain specific code elements ([0089] In step 504, the machine learning framework system preconfigures required tables and/or columns based on table and/or column requirements defined by a prebuilt object (e.g., prebuilt object 250). In step 506, the machine learning framework system performs a presence of domain specific code elements therein direct table and/or column match based on the data mappings. The machine learning framework system the mappability of identified domain specific code elements determines if the match is successful (step 508). If not successful, the machine learning framework system, in step 510, uses data mapping rules of the prebuilt object (e.g., data mapping rules 258). For example, the rules may use a look-up table of corresponding terms (e.g., synonyms) for the table and/or column names. If there is a match here (step 512), or by a successful direct match, then the process continues to step 516. Otherwise, all possible table and/or column matches are based on the presence of general code elements therein determined based on data type (step 514).; [0090] In step 516, the determining the respective adaptability of the respective existing code snippet machine learning framework system scores the matches. In step 518, the machine learning framework system iterates the matches (steps 520-526) until exit criteria is matched. This may include generating all possible features based on prebuilt object direct matches, simple rules and data profile to form feature matrix (step 520), selecting available models as defined by prebuilt object (step 522), measuring (e.g., scoring) model performance based on prebuilt object predefined metrics (step 524), and logging all outputs and test results (step 526).); and
training an ML model on the new dataset based on operations as dictated by the particular concrete pipeline such that the ML model is trained to make predictions related to a new ML task corresponding to the new ML project ([0055] The deployed system may, for example, allow client systems to ML model is trained to make predictions related to a new ML task corresponding to the new ML project create new machine learning applications (e.g., within one or more constraints defined by the machine learning framework system 102), update existing machine learning applications, and/or the like, without having to communicate with the machine learning framework system 102.; [0056] A source code deployment machine learning application may be or include any number of applications configured to training an ML model on the new dataset based on operations as dictated by the particular concrete pipeline create, train, and deploy machine learning applications. In other words, in some embodiments, systems and methods discussed herein may: (1) create and deploy machine learning models; and/or (2) create and deploy systems and processes for creating new machine learning models. In the latter case, it will be appreciated that a third party may receive and utilize systems for creating new machine learning models based on changing data and changing problems while leveraging their industry expertise.).
Shaikh, Polleri, and Chan are considered to be analogous art to the claimed invention because they are in the same field of machine learning. In view of the teachings of Shaikh and Polleri, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to apply the teachings of Chan to Shaikh in order to reduce computing resource requirements (e.g., processor and/or memory requirements) and allow the computing system and/or machine learning application to be more scalable and/or otherwise efficiently modifiable relative to traditional systems (cf. Chan, [0004] A claimed solution rooted in computer technology overcomes many of the problems, specifically arising in the realm of computer technology, encountered when developing machine learning applications. In various embodiments, a computing system is configured to generate and/or provide reusable prebuilt objects and prebuilt components to rapidly and efficiently generate and deploy machine learning applications (e.g., recommendation systems). The prebuilt objects may each define a particular machine learning process and requirements for building a corresponding machine learning application based on a particular set of data and/or other requirements (e.g., machine learning model performance requirements). The level of detail may be initially low (e.g., specifying data input type requirements, but not specifying the particular data input identifiers), so that they may be reused to generate a variety of different machine learning applications. The prebuilt objects may also include and/or reference various prebuilt components associated with particular machine learning services (e.g., data onboarding, data preparation, feature generation, machine learning modeling, and/or model deployment) for implementing the machine learning process with the particular set of data and/or requirements. This may, for example, reduce computing resource requirements (e.g., processor and/or memory requirements) and allow the computing system and/or machine learning application to be more scalable and/or otherwise efficiently modifiable relative to traditional systems.).
Regarding claim 2, Shaikh, modified by Polleri and Chan, teaches The method of claim 1.
Shaikh teaches wherein obtaining a particular existing code snippet of the plurality of existing code snippets for a respective first functional block includes: identifying the second functional block corresponding to the particular existing code snippet based on the second functional block having a functionality that corresponds to the functionality of the respective first functional block ([0038] Semantics 228 represent the features, attributes, and characteristics that correspond to input dataset 226. Semantics 228 may include, for example, schema and content of input dataset 226; types of users who have previously used input dataset 226; types of problems that users were trying to solve using input dataset 226; types of data pattern analysis algorithms, data transformations, and/or identifying the second functional block corresponding to the particular existing code snippet source codes previously applied to input dataset 226; and machine learning models previously generated using input dataset 226.; [0004] The different illustrative embodiments also calculate a based on the second functional block having a functionality that corresponds to the functionality of the respective first functional block relatedness score between the particular input dataset and a plurality of candidate data analysis assets stored on the computer based on semantics corresponding to the particular input dataset and semantics corresponding to each candidate data analysis asset of the plurality of candidate data analysis assets. The plurality of candidate data analysis assets includes machine learning models and source codes. 
The semantics corresponding to the particular input dataset include schema and content of the particular input dataset, types of users who have used the particular input dataset previously, types of problems that users were trying to solve using the particular input dataset, types of data pattern analysis algorithms, data transformations, and source codes previously applied to the particular input dataset, and machine learning models previously trained using the particular input dataset.);
identifying an existing dataset of the existing ML projects based on the second functional block being applied to the existing dataset via implementation of the second functional block through the particular existing code snippet ([0004] The different illustrative embodiments also calculate a relatedness score between the particular input dataset and a plurality of candidate data analysis assets stored on the computer based on semantics corresponding to the particular input dataset and semantics corresponding to each candidate data analysis asset of the plurality of candidate data analysis assets. The plurality of candidate data analysis assets includes machine learning models and source codes. The semantics corresponding to the particular input dataset include schema and content of the particular input dataset, types of users who have used the identifying an existing dataset of the existing ML projects based on the second functional block being applied to the existing dataset particular input dataset previously, types of problems that users were trying to solve using the particular input dataset, types of data pattern analysis algorithms, data transformations, and implementation of the second functional block through the particular existing code snippet source codes previously applied to the particular input dataset, and machine learning models previously trained using the particular input dataset.);
determining a similarity between the new dataset and the identified existing dataset ([0083] The process begins when the computer performs an analysis of a dataset (step 1002). The computer understands semantics of the dataset based on the analysis (step 1004). The computer utilizes [determining a similarity between the new dataset and the identified existing dataset] historical usage patterns of the dataset and similar datasets by users to provide recommendations for existing trained machine learning models and source codes based on the semantics of the dataset (step 1006).); and
selecting the particular existing code snippet as a potential instantiation of the respective first functional block based on the similarity and in response to the particular existing code snippet corresponding to the identified existing dataset ([0083] The process begins when the computer performs an analysis of a dataset (step 1002). The computer understands semantics of the dataset based on the analysis (step 1004). The computer utilizes [based on the similarity and in response to the particular existing code snippet corresponding to the identified existing dataset] historical usage patterns of the dataset and similar datasets by users to provide recommendations for existing trained machine learning models and source codes based on the semantics of the dataset (step 1006).; [0084] The computer displays the recommendations for the existing trained machine learning models and source codes that are based on the semantics of the dataset (step 1008). Afterward, the computer receives a user selection of a machine learning model and a source code in the displayed recommendations (step 1010). The computer [selecting the particular existing code snippet as a potential instantiation of the respective first functional block] trains the selected machine learning model with the dataset using the selected source code (step 1012). Thereafter, the process terminates.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 3, Shaikh, modified by Polleri and Chan, teaches The method of claim 2.
Polleri teaches wherein determining the similarity is based on similarities between one or more features of the new dataset with respect to the one or more features of the identified dataset ([0306] At 1406, the functionality includes [features of the identified dataset] extracting one or more features from the data storage. The data storage can include one or more labels that characterize the data.; [0307] At 1408, the functionality includes developing a weighted list of common representations for each feature. The technique can determine a [determining the similarity is based on similarities between one or more features] ranking of each the one or more features extracted from the data storage based at least in part on an influence of the one or more features to the solution using the machine learning application.; [0311] At 1414, the functionality includes [features of the new dataset with respect to the one or more features of the identified dataset] updating weighted list from new data. When new data is added to the data storage, a matching service can automatically detect which features should be fed into the machine learning solution based at least in part on the weighted list previously computed.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 4, Shaikh, modified by Polleri and Chan, teaches The method of claim 2.
Shaikh teaches wherein selecting the particular existing code snippet based on the determined similarity is based on a similarity ranking of a particular identified dataset that corresponds to the particular existing code snippet ([0042] Recommendation manager 218 generates [determined similarity is based on a similarity ranking of a particular identified dataset that corresponds to the particular existing code snippet] relatedness score 240 for each machine learning model in machine learning models 232 and each set of source code in source codes 234 as they correlate to input dataset 226. In other words, relatedness score 240 represents a degree of similarity or closeness (i.e., measure of strength of relationship) that a particular machine learning model or a particular set of source code has with input dataset 226.; [0043] In other words, recommendation manager 218 [selecting the particular existing code snippet] selects those candidate data analysis assets (i.e., machine learning models and source codes) having a corresponding relatedness score greater than relatedness score threshold 242 and no longer considers those data analysis assets having a corresponding relatedness score less than relatedness score threshold 242 as viable candidates.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 5, Shaikh, modified by Polleri and Chan, teaches The method of claim 1.
Shaikh teaches further comprising ranking the particular existing code snippet based on the determined similarity between the new dataset and the identified existing dataset ([0058] As soon as a user accesses any stored dataset on the data engineering platform, illustrative embodiments use the dataset's pre-computed semantics to [ranking the particular existing code snippet] find matching machine learning models or source codes that can classify the input data or find a source code to train a machine learning model using the input data. Thus, illustrative embodiments may be useful for identifying potential uses of the dataset, identifying machine learning models created by other users that utilized the input dataset for a particular task, and [determined similarity between the new dataset and the identified existing dataset] finding best matching machine learning models and source codes for performing various tasks on the input dataset. In addition, illustrative embodiments identify related work in a specific area using the given input dataset or similar datasets.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 7, Shaikh, modified by Polleri and Chan, teaches The method of claim 1.
Shaikh teaches compatibility of the respective existing code snippet with respect to cardinality of the new dataset ([0080] Further, the computer calculates a relatedness score between the dataset (i.e., the existing dataset or the new dataset) and a plurality of candidate data analysis assets (i.e., machine learning models and source codes) stored on the computer based on semantics corresponding to the dataset and semantics corresponding to each candidate data analysis asset of the plurality of candidate data analysis assets (step 914). Afterward, the computer selects those candidate data analysis assets having a corresponding relatedness score greater than a defined relatedness score threshold value (step 916).).
a data type of input data and output data of the respective existing code snippet ([0038] Semantics 228 represent the features, attributes, and characteristics that correspond to input dataset 226. Semantics 228 may include, for example, schema and content of input dataset 226; types of users who have previously used input dataset 226; types of problems that users were trying to solve using input dataset 226; types of data pattern analysis algorithms, data transformations, and/or source codes previously applied to input dataset 226; and machine learning models previously generated using input dataset 226. Recommendation manager 218 also utilizes semantics 228 of input dataset 226 to generate the data analysis asset recommendations.); or
Polleri teaches wherein determining the respective adaptability of a respective existing code snippet is further based on one or more of: compatibility of the respective existing code snippet with respect to cardinality of the new dataset ([0262] Due to the potential risks, issues, and implications of integrating external libraries and code bases into software projects, an organizations may include software architecture authorization system to analyze code integration requests, and to [determining the respective adaptability of a respective existing code snippet] approve or deny such code integration requests based on one or more potential code integration issues, including license compliance or compatibility, security vulnerabilities, costs, further software dependencies, the recency and priority of the software project, the availability of security patches, and the existence of safer alternative libraries.; [0264] As noted above, certain techniques described herein may be implemented to predict outcomes of software code integration requests. In some embodiments, a model execution engine (e.g., within a code integration request prediction server or plug-in within a software development environment) may receive input data corresponding to a request to integrate an external code base into a source code project or component. Such input data may identify one or more external code bases (e.g., open source software functions, libraries, etc.) associated with the source code project and component, including the external code base to be integrated and/or additional external code bases that have been previous integrated within the same project or component. Additionally, the input data for code integration request may include one or more characteristics of the source code project or component, such as the [compatibility of the respective existing code snippet with respect to cardinality of the new dataset] associated product or project of the source code component, the associated developer or organization, the purpose for integrating the external code base or functionality to be leveraged within the external code base, etc.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 9, Shaikh, modified by Polleri and Chan, teaches The one or more non-transitory computer-readable storage media of claim 8.
Shaikh teaches wherein obtaining a particular existing code snippet of the plurality of existing code snippets for a respective first functional block includes: identifying the second functional block corresponding to the particular existing code snippet based on the second functional block having a functionality that corresponds to the functionality of the respective first functional block ([0038] Semantics 228 represent the features, attributes, and characteristics that correspond to input dataset 226. Semantics 228 may include, for example, schema and content of input dataset 226; types of users who have previously used input dataset 226; types of problems that users were trying to solve using input dataset 226; types of data pattern analysis algorithms, data transformations, and/or [identifying the second functional block corresponding to the particular existing code snippet] source codes previously applied to input dataset 226; and machine learning models previously generated using input dataset 226.; [0004] The different illustrative embodiments also calculate a [based on the second functional block having a functionality that corresponds to the functionality of the respective first functional block] relatedness score between the particular input dataset and a plurality of candidate data analysis assets stored on the computer based on semantics corresponding to the particular input dataset and semantics corresponding to each candidate data analysis asset of the plurality of candidate data analysis assets. The plurality of candidate data analysis assets includes machine learning models and source codes. The semantics corresponding to the particular input dataset include schema and content of the particular input dataset, types of users who have used the particular input dataset previously, types of problems that users were trying to solve using the particular input dataset, types of data pattern analysis algorithms, data transformations, and source codes previously applied to the particular input dataset, and machine learning models previously trained using the particular input dataset.);
identifying an existing dataset of the existing ML projects based on the second functional block being applied to the existing dataset via implementation of the second functional block through the particular existing code snippet ([0004] The different illustrative embodiments also calculate a relatedness score between the particular input dataset and a plurality of candidate data analysis assets stored on the computer based on semantics corresponding to the particular input dataset and semantics corresponding to each candidate data analysis asset of the plurality of candidate data analysis assets. The plurality of candidate data analysis assets includes machine learning models and source codes. The semantics corresponding to the particular input dataset include schema and content of the particular input dataset, types of users who have used the [identifying an existing dataset of the existing ML projects based on the second functional block being applied to the existing dataset] particular input dataset previously, types of problems that users were trying to solve using the particular input dataset, types of data pattern analysis algorithms, data transformations, and [implementation of the second functional block through the particular existing code snippet] source codes previously applied to the particular input dataset, and machine learning models previously trained using the particular input dataset.);
determining a similarity between the new dataset and the identified existing dataset ([0083] The process begins when the computer performs an analysis of a dataset (step 1002). The computer understands semantics of the dataset based on the analysis (step 1004). The computer utilizes [determining a similarity between the new dataset and the identified existing dataset] historical usage patterns of the dataset and similar datasets by users to provide recommendations for existing trained machine learning models and source codes based on the semantics of the dataset (step 1006).); and
selecting the particular existing code snippet as a potential instantiation of the respective first functional block based on the similarity and in response to the particular existing code snippet corresponding to the identified existing dataset ([0083] The process begins when the computer performs an analysis of a dataset (step 1002). The computer understands semantics of the dataset based on the analysis (step 1004). The computer utilizes [based on the similarity and in response to the particular existing code snippet corresponding to the identified existing dataset] historical usage patterns of the dataset and similar datasets by users to provide recommendations for existing trained machine learning models and source codes based on the semantics of the dataset (step 1006).; [0084] The computer displays the recommendations for the existing trained machine learning models and source codes that are based on the semantics of the dataset (step 1008). Afterward, the computer receives a user selection of a machine learning model and a source code in the displayed recommendations (step 1010). The computer [selecting the particular existing code snippet as a potential instantiation of the respective first functional block] trains the selected machine learning model with the dataset using the selected source code (step 1012). Thereafter, the process terminates.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 10, Shaikh, modified by Polleri and Chan, teaches The one or more non-transitory computer-readable storage media of claim 9.
Polleri teaches wherein determining the similarity is based on similarities between one or more features of the new dataset with respect to the one or more features of the identified dataset ([0306] At 1406, the functionality includes [features of the identified dataset] extracting one or more features from the data storage. The data storage can include one or more labels that characterize the data.; [0307] At 1408, the functionality includes developing a weighted list of common representations for each feature. The technique can determine a [determining the similarity is based on similarities between one or more features] ranking of each the one or more features extracted from the data storage based at least in part on an influence of the one or more features to the solution using the machine learning application.; [0311] At 1414, the functionality includes [features of the new dataset with respect to the one or more features of the identified dataset] updating weighted list from new data. When new data is added to the data storage, a matching service can automatically detect which features should be fed into the machine learning solution based at least in part on the weighted list previously computed.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 11, Shaikh, modified by Polleri and Chan, teaches The one or more non-transitory computer-readable storage media of claim 9.
Shaikh teaches wherein selecting the particular existing code snippet based on the determined similarity is based on a similarity ranking of a particular identified dataset that corresponds to the particular existing code snippet ([0042] Recommendation manager 218 generates [determined similarity is based on a similarity ranking of a particular identified dataset that corresponds to the particular existing code snippet] relatedness score 240 for each machine learning model in machine learning models 232 and each set of source code in source codes 234 as they correlate to input dataset 226. In other words, relatedness score 240 represents a degree of similarity or closeness (i.e., measure of strength of relationship) that a particular machine learning model or a particular set of source code has with input dataset 226.; [0043] In other words, recommendation manager 218 [selecting the particular existing code snippet] selects those candidate data analysis assets (i.e., machine learning models and source codes) having a corresponding relatedness score greater than relatedness score threshold 242 and no longer considers those data analysis assets having a corresponding relatedness score less than relatedness score threshold 242 as viable candidates.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 12, Shaikh, modified by Polleri and Chan, teaches The one or more non-transitory computer-readable storage media of claim 8.
Shaikh teaches the operations further comprising ranking the particular existing code snippet based on the determined similarity between the new dataset and the identified existing dataset ([0058] As soon as a user accesses any stored dataset on the data engineering platform, illustrative embodiments use the dataset's pre-computed semantics to [ranking the particular existing code snippet] find matching machine learning models or source codes that can classify the input data or find a source code to train a machine learning model using the input data. Thus, illustrative embodiments may be useful for identifying potential uses of the dataset, identifying machine learning models created by other users that utilized the input dataset for a particular task, and [determined similarity between the new dataset and the identified existing dataset] finding best matching machine learning models and source codes for performing various tasks on the input dataset. In addition, illustrative embodiments identify related work in a specific area using the given input dataset or similar datasets.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 14, Shaikh, modified by Polleri and Chan, teaches The one or more non-transitory computer-readable storage media of claim 8.
Shaikh teaches compatibility of the respective existing code snippet with respect to cardinality of the new dataset ([0080] Further, the computer calculates a relatedness score between the dataset (i.e., the existing dataset or the new dataset) and a plurality of candidate data analysis assets (i.e., machine learning models and source codes) stored on the computer based on semantics corresponding to the dataset and semantics corresponding to each candidate data analysis asset of the plurality of candidate data analysis assets (step 914). Afterward, the computer selects those candidate data analysis assets having a corresponding relatedness score greater than a defined relatedness score threshold value (step 916).).
a data type of input data and output data of the respective existing code snippet ([0038] Semantics 228 represent the features, attributes, and characteristics that correspond to input dataset 226. Semantics 228 may include, for example, schema and content of input dataset 226; types of users who have previously used input dataset 226; types of problems that users were trying to solve using input dataset 226; types of data pattern analysis algorithms, data transformations, and/or source codes previously applied to input dataset 226; and machine learning models previously generated using input dataset 226. Recommendation manager 218 also utilizes semantics 228 of input dataset 226 to generate the data analysis asset recommendations.); or
Polleri teaches wherein determining the respective adaptability of a respective existing code snippet is further based on one or more of: compatibility of the respective existing code snippet with respect to cardinality of the new dataset ([0262] Due to the potential risks, issues, and implications of integrating external libraries and code bases into software projects, an organizations may include software architecture authorization system to analyze code integration requests, and to [determining the respective adaptability of a respective existing code snippet] approve or deny such code integration requests based on one or more potential code integration issues, including license compliance or compatibility, security vulnerabilities, costs, further software dependencies, the recency and priority of the software project, the availability of security patches, and the existence of safer alternative libraries.; [0264] As noted above, certain techniques described herein may be implemented to predict outcomes of software code integration requests. In some embodiments, a model execution engine (e.g., within a code integration request prediction server or plug-in within a software development environment) may receive input data corresponding to a request to integrate an external code base into a source code project or component. Such input data may identify one or more external code bases (e.g., open source software functions, libraries, etc.) associated with the source code project and component, including the external code base to be integrated and/or additional external code bases that have been previous integrated within the same project or component. Additionally, the input data for code integration request may include one or more characteristics of the source code project or component, such as the [compatibility of the respective existing code snippet with respect to cardinality of the new dataset] associated product or project of the source code component, the associated developer or organization, the purpose for integrating the external code base or functionality to be leveraged within the external code base, etc.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 16, Shaikh, modified by Polleri and Chan, teaches The system of claim 15.
Shaikh teaches wherein obtaining a particular existing code snippet of the plurality of existing code snippets for a respective first functional block includes: identifying the second functional block corresponding to the particular existing code snippet based on the second functional block having a functionality that corresponds to the functionality of the respective first functional block ([0038] Semantics 228 represent the features, attributes, and characteristics that correspond to input dataset 226. Semantics 228 may include, for example, schema and content of input dataset 226; types of users who have previously used input dataset 226; types of problems that users were trying to solve using input dataset 226; types of data pattern analysis algorithms, data transformations, and/or [identifying the second functional block corresponding to the particular existing code snippet] source codes previously applied to input dataset 226; and machine learning models previously generated using input dataset 226.; [0004] The different illustrative embodiments also calculate a [based on the second functional block having a functionality that corresponds to the functionality of the respective first functional block] relatedness score between the particular input dataset and a plurality of candidate data analysis assets stored on the computer based on semantics corresponding to the particular input dataset and semantics corresponding to each candidate data analysis asset of the plurality of candidate data analysis assets. The plurality of candidate data analysis assets includes machine learning models and source codes. The semantics corresponding to the particular input dataset include schema and content of the particular input dataset, types of users who have used the particular input dataset previously, types of problems that users were trying to solve using the particular input dataset, types of data pattern analysis algorithms, data transformations, and source codes previously applied to the particular input dataset, and machine learning models previously trained using the particular input dataset.);
identifying an existing dataset of the existing ML projects based on the second functional block being applied to the existing dataset via implementation of the second functional block through the particular existing code snippet ([0004] The different illustrative embodiments also calculate a relatedness score between the particular input dataset and a plurality of candidate data analysis assets stored on the computer based on semantics corresponding to the particular input dataset and semantics corresponding to each candidate data analysis asset of the plurality of candidate data analysis assets. The plurality of candidate data analysis assets includes machine learning models and source codes. The semantics corresponding to the particular input dataset include schema and content of the particular input dataset, types of users who have used the [identifying an existing dataset of the existing ML projects based on the second functional block being applied to the existing dataset] particular input dataset previously, types of problems that users were trying to solve using the particular input dataset, types of data pattern analysis algorithms, data transformations, and [implementation of the second functional block through the particular existing code snippet] source codes previously applied to the particular input dataset, and machine learning models previously trained using the particular input dataset.);
determining a similarity between the new dataset and the identified existing dataset ([0083] The process begins when the computer performs an analysis of a dataset (step 1002). The computer understands semantics of the dataset based on the analysis (step 1004). The computer utilizes [determining a similarity between the new dataset and the identified existing dataset] historical usage patterns of the dataset and similar datasets by users to provide recommendations for existing trained machine learning models and source codes based on the semantics of the dataset (step 1006).); and
selecting the particular existing code snippet as a potential instantiation of the respective first functional block based on the similarity and in response to the particular existing code snippet corresponding to the identified existing dataset ([0083] The process begins when the computer performs an analysis of a dataset (step 1002). The computer understands semantics of the dataset based on the analysis (step 1004). The computer utilizes [based on the similarity and in response to the particular existing code snippet corresponding to the identified existing dataset] historical usage patterns of the dataset and similar datasets by users to provide recommendations for existing trained machine learning models and source codes based on the semantics of the dataset (step 1006).; [0084] The computer displays the recommendations for the existing trained machine learning models and source codes that are based on the semantics of the dataset (step 1008). Afterward, the computer receives a user selection of a machine learning model and a source code in the displayed recommendations (step 1010). The computer [selecting the particular existing code snippet as a potential instantiation of the respective first functional block] trains the selected machine learning model with the dataset using the selected source code (step 1012). Thereafter, the process terminates.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 17, Shaikh, modified by Polleri and Chan, teaches The system of claim 16.
Polleri teaches wherein determining the similarity is based on similarities between one or more features of the new dataset with respect to the one or more features of the identified dataset ([0306] At 1406, the functionality includes [features of the identified dataset] extracting one or more features from the data storage. The data storage can include one or more labels that characterize the data.; [0307] At 1408, the functionality includes developing a weighted list of common representations for each feature. The technique can determine a [determining the similarity is based on similarities between one or more features] ranking of each the one or more features extracted from the data storage based at least in part on an influence of the one or more features to the solution using the machine learning application.; [0311] At 1414, the functionality includes [features of the new dataset with respect to the one or more features of the identified dataset] updating weighted list from new data. When new data is added to the data storage, a matching service can automatically detect which features should be fed into the machine learning solution based at least in part on the weighted list previously computed.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 18, Shaikh, modified by Polleri and Chan, teaches The system of claim 15.
Shaikh teaches the operations further comprising ranking the particular existing code snippet based on the determined similarity between the new dataset and the identified existing dataset ([0058] As soon as a user accesses any stored dataset on the data engineering platform, illustrative embodiments use the dataset's pre-computed semantics to [ranking the particular existing code snippet] find matching machine learning models or source codes that can classify the input data or find a source code to train a machine learning model using the input data. Thus, illustrative embodiments may be useful for identifying potential uses of the dataset, identifying machine learning models created by other users that utilized the input dataset for a particular task, and [determined similarity between the new dataset and the identified existing dataset] finding best matching machine learning models and source codes for performing various tasks on the input dataset. In addition, illustrative embodiments identify related work in a specific area using the given input dataset or similar datasets.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
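For illustration, the ranking step attributed to Shaikh ([0058]) amounts to scoring stored assets by how closely each asset's source-dataset semantics match the new dataset's semantics. The sketch below is hypothetical (Shaikh does not disclose this particular similarity measure); Jaccard similarity over semantic tags is used purely as a stand-in, and all names are invented.

```python
# Hypothetical sketch of ranking stored code snippets / models by the
# similarity between a new dataset's semantics and each asset's source
# dataset semantics (in the spirit of Shaikh [0058]).

def jaccard(a, b):
    """Stand-in similarity measure over sets of semantic tags."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_snippets(new_semantics, assets):
    """assets: list of (snippet_name, dataset_semantics) pairs.
    Returns snippets ranked by descending similarity to the new dataset."""
    scored = [(name, jaccard(new_semantics, sem)) for name, sem in assets]
    return sorted(scored, key=lambda ns: ns[1], reverse=True)

ranked = rank_snippets(
    {"tabular", "labels", "numeric"},
    [("train_classifier.py", {"tabular", "labels"}),
     ("plot_timeseries.py", {"timeseries", "numeric"})],
)
# train_classifier.py ranks first: it shares more semantics with the new dataset.
```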
Regarding claim 20, Shaikh, modified by Polleri and Chan, teaches The system of claim 15.
Shaikh teaches compatibility of the respective existing code snippet with respect to cardinality of the new dataset ([0080] Further, the computer calculates a relatedness score between the dataset (i.e., the existing dataset or the new dataset) and a plurality of candidate data analysis assets (i.e., machine learning models and source codes) stored on the computer based on semantics corresponding to the dataset and semantics corresponding to each candidate data analysis asset of the plurality of candidate data analysis assets (step 914). Afterward, the computer selects those candidate data analysis assets having a corresponding relatedness score greater than a defined relatedness score threshold value (step 916).).
a data type of input data and output data of the respective existing code snippet ([0038] Semantics 228 represent the features, attributes, and characteristics that correspond to input dataset 226. Semantics 228 may include, for example, schema and content of input dataset 226; types of users who have previously used input dataset 226; types of problems that users were trying to solve using input dataset 226; types of data pattern analysis algorithms, data transformations, and/or source codes previously applied to input dataset 226; and machine learning models previously generated using input dataset 226. Recommendation manager 218 also utilizes semantics 228 of input dataset 226 to generate the data analysis asset recommendations.); or
Polleri teaches wherein determining the respective adaptability of a respective existing code snippet is further based on one or more of: compatibility of the respective existing code snippet with respect to cardinality of the new dataset ([0262] Due to the potential risks, issues, and implications of integrating external libraries and code bases into software projects, an organization may include a software architecture authorization system to analyze code integration requests, and to [determining the respective adaptability of a respective existing code snippet] approve or deny such code integration requests based on one or more potential code integration issues, including license compliance or compatibility, security vulnerabilities, costs, further software dependencies, the recency and priority of the software project, the availability of security patches, and the existence of safer alternative libraries.; [0264] As noted above, certain techniques described herein may be implemented to predict outcomes of software code integration requests. In some embodiments, a model execution engine (e.g., within a code integration request prediction server or plug-in within a software development environment) may receive input data corresponding to a request to integrate an external code base into a source code project or component. Such input data may identify one or more external code bases (e.g., open source software functions, libraries, etc.) associated with the source code project and component, including the external code base to be integrated and/or additional external code bases that have been previously integrated within the same project or component.
Additionally, the input data for a code integration request may include one or more characteristics of the source code project or component, such as the [compatibility of the respective existing code snippet with respect to cardinality of the new dataset] associated product or project of the source code component, the associated developer or organization, the purpose for integrating the external code base or functionality to be leveraged within the external code base, etc.).
Shaikh, Polleri, and Chan are combinable for the same rationale as set forth above with respect to claim 1.
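For illustration, the select-by-threshold step attributed to Shaikh ([0080], steps 914-916) can be sketched as follows. The scoring values, asset names, and 0.5 threshold are hypothetical; Shaikh's actual relatedness computation over semantics is not reproduced here.

```python
# Rough sketch of Shaikh's [0080] selection step: keep only candidate
# data analysis assets whose relatedness score exceeds a defined
# relatedness score threshold value. Names and values are invented.

def select_candidates(relatedness_scores, threshold):
    """relatedness_scores: dict mapping asset name -> relatedness score
    (assumed precomputed from dataset/asset semantics per step 914).
    Returns the assets passing the threshold (step 916)."""
    return [asset for asset, score in relatedness_scores.items()
            if score > threshold]

chosen = select_candidates(
    {"model_a": 0.82, "code_b": 0.40, "model_c": 0.91},
    threshold=0.5,
)
# model_a and model_c exceed the 0.5 threshold; code_b is filtered out.
```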
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Sugimura et al. (NPL: “Building a Reproducible Machine Learning Pipeline”) teaches a framework comprising four main components (data, feature, scoring, and evaluation layers), each composed of well-defined transformations, enabling replication of the model and reuse of the transformations across different models.
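For illustration, the four-layer structure described by Sugimura et al. can be sketched as a chain of well-defined transformations. The concrete transformations below are placeholders invented for this example, not the paper's actual layers.

```python
# Minimal sketch of a four-layer reproducible pipeline (data, feature,
# scoring, evaluation) in the spirit of Sugimura et al. Each layer is a
# well-defined transformation, reusable across models. All transformations
# here are hypothetical placeholders.

def data_layer(raw):
    """Clean raw input, e.g., drop missing records."""
    return [r for r in raw if r is not None]

def feature_layer(rows):
    """Derive features from cleaned rows, e.g., (x, x^2)."""
    return [(x, x * x) for x in rows]

def scoring_layer(features, weights=(0.5, 0.1)):
    """Score each feature vector with fixed (placeholder) weights."""
    return [weights[0] * a + weights[1] * b for (a, b) in features]

def evaluation_layer(scores):
    """Summarize scores, e.g., as a mean."""
    return sum(scores) / len(scores) if scores else 0.0

def pipeline(raw):
    return evaluation_layer(scoring_layer(feature_layer(data_layer(raw))))

result = pipeline([1, None, 2, 3])
```

Because each layer is an independent, deterministic transformation, any single layer (e.g., `feature_layer`) can be swapped or reused across models without touching the rest, which is the reproducibility property the reference describes.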
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAGGIE MAIDO whose telephone number is (703) 756-1953. The examiner can normally be reached M-Th: 6am - 4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MM/Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129