DETAILED ACTION
Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Regarding U.S. Provisional Patent Application No. 63/341,936 (filed 5/13/2022), Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) is acknowledged.
Information Disclosure Statement
The information disclosure statement submitted on 8/10/2023 has been considered.
Claim Objections
Claims 1, 8, and 15 are objected to because of the following informalities:
In claim 1, line 3, “when installed on client node configured to” should read “when installed on a client node from the set of client nodes is configured to” or a similar amendment to correct the grammar of such limitation.
In claim 8, line 5, “when installed on client node configured to” should read “when installed on a client node from the set of client nodes is configured to” or a similar amendment to correct the grammar of such limitation.
In claim 15, line 6, “when installed on client node configured to” should read “when installed on a client node from the set of client nodes is configured to” or a similar amendment to correct the grammar of such limitation.
Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 7-11, and 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over US 20210174257 A1, hereinafter referenced as POTHULA, in view of US 20190156927 A1, hereinafter referenced as VIRKAR.
Regarding Claim 1
POTHULA discloses:
A method, comprising: (POTHULA, para. 0110: “FIG. 5 is a diagram that illustrates an exemplary computing system 1000 by which embodiments of the present technique may be implemented. Various portions of systems and methods described herein, may include or be executed on one or more computer systems similar to computing system 1000.”)
providing a data conversion module for deployment in a set of client nodes, the data conversion module when installed on client node configured to generate a set of components, the set of components including at least a pre-assessment tool and a data conversion model for performing one or more conversion operations on client data associated with the client node; (POTHULA, para. 0018: “To mitigate some of these challenges, and in some embodiments all of them and others, some embodiments determine, and dynamically re-assess during run time (or a single time during model design), the relevance of data (e.g., some or all) upon which inputs to the federated machine learning model are based (e.g., the inputs themselves, or parameters of engineered features that are inputs to the model) to the performance (e.g., accuracy, precision, or F2 score) federated machine learning model or to suitability of that model or sub-models (e.g., at different federated processes) to transfer learning. Relevance may be determined with a variety of measures, including various statistical tests like T-student, Chi square, Mahalanobis distance, Shapley values, local interpretable model-agnostic explanations (LIME), cross-entropy, and the like, to assess the relevance of inputs to the federated machine learning model to performance or transfer learning.”;
POTHULA, para. 0024: “A variety of different computing architectures are contemplated. Controller 12 is shown in a software-as-a-service (SaaS) configuration, but in some cases, some or all of the different nodes 17 may have a local instance operating in a distributed fashion.”
POTHULA, para. 0037: “In some embodiments, the datasets 14 may go through an ingestion process in the nodes 17, which in some cases, may be different processes in different nodes (e.g., data may be binned differently, like on a weekly or monthly basis, and data may be updated differently), where some of the following functions may take place: data and schema drift may be controlled, file headers may be checked, version numbers may be added to incoming files, data may be routed into clean/error queues, data files may be archived in their raw format, error records may be cleaned, column types may be changed from string to specific data types, incremental data may be processed, data normalization may be done through primary surrogate keys added, de-duplication, referential integrity may be checked, data quality may be checked (DQM) through value thresholds and value format, client specific column names may be formed, data may be encoded in dimensional star schema, column names may be changed from user specific to domain specific, extension tables as key value stores may be added for user specific attributes, data may be changed from dimensional star schema to denormalized flat table, and granularity of data may be adjusted for events, customer-product pairs, and customers.”
Examiner’s Note: the local instance of software on each node 17 (corresponding to a recited “client node”) includes software to perform data conversions such as “column types may be changed from string to specific data types” or “data may be changed from dimensional star schema to denormalized flat table” (corresponding to recited “data conversion model”), where each instance is installed at the local client node, and where the logic for assessing data relevance corresponds to the recited “pre-assessment tool” and the logic for performing data conversions corresponds to the recited “data conversion model for performing one or more conversion operations on client data associated with the client node”)
providing, to each client node, a base model to the client node for training by the client node using the client data; (POTHULA, para. 0025: “In some embodiments, the computing environment 10 may include multiple datasets (e.g. event records) 14 of multiple nodes 17 (e.g., computing systems with constraints on data sharing) of a federated machine learning model 19. The nodes 17 may have sub-models 15 that take as inputs or training data input from datasets 14 and form a federated machine learning model pipeline (or other topology, like a tree structure in an ensemble model that merges branches at node 17′) that operates in service of node 17′, which may use the outputs of the federated machine learning model 19 to determine to effect various actions via the action-channel servers 16. In some embodiments, the controller 12 may implement the processes described below to mitigate some of the challenges with federated learning in real time complex event processing, e.g., by coordinating among the sub-models 15 to configure and dynamically adjust the federated machine learning model 19.”;
POTHULA, para. 0040: “the sub-models 15 are trained on the respective datasets 14, e.g., in distinct optimizations, like stochastic gradient descent, simulated annealing, genetic optimizations, or the like, in distinct greedy optimizations relative to the other sub-models, e.g., at different times, on different data sets.”
Examiner’s Note: the sub-models 15 of each node 17 correspond to the recited “base model”, where each sub-model is trained on the local “respective dataset 14” (corresponding to recited “client data”), and where the controller 12 configures and adjusts the different components of the federated learning model, including providing the sub-models 15 to each node)
receiving, from the pre-assessment tool on each client node, statistics and abstract information describing the client data for the client node; (POTHULA, para. 0018: “To mitigate some of these challenges, and in some embodiments all of them and others, some embodiments determine, and dynamically re-assess during run time (or a single time during model design), the relevance of data (e.g., some or all) upon which inputs to the federated machine learning model are based (e.g., the inputs themselves, or parameters of engineered features that are inputs to the model) to the performance (e.g., accuracy, precision, or F2 score) federated machine learning model or to suitability of that model or sub-models (e.g., at different federated processes) to transfer learning. Relevance may be determined with a variety of measures, including various statistical tests like T-student, Chi square, Mahalanobis distance, Shapley values, local interpretable model-agnostic explanations (LIME), cross-entropy, and the like, to assess the relevance of inputs to the federated machine learning model to performance or transfer learning.”;
POTHULA, para. 0025: “In some embodiments, the controller 12 may implement the processes described below to mitigate some of the challenges with federated learning in real time complex event processing, e.g., by coordinating among the sub-models 15 to configure and dynamically adjust the federated machine learning model 19.”;
Examiner’s Note: the controller 12 receives the assessment of inputs to the federated machine learning model (corresponding to received “statistics and abstract information describing the client data for the client node”))
generating, for each client node, conversion logic ..., wherein the conversion logic includes instructions for converting values of the client data for one or more data fields to match a target schema for training the base model; (POTHULA, para. 0037: “In some embodiments, the datasets 14 may go through an ingestion process in the nodes 17, which in some cases, may be different processes in different nodes (e.g., data may be binned differently, like on a weekly or monthly basis, and data may be updated differently), where some of the following functions may take place: data and schema drift may be controlled, file headers may be checked, version numbers may be added to incoming files, data may be routed into clean/error queues, data files may be archived in their raw format, error records may be cleaned, column types may be changed from string to specific data types, incremental data may be processed, data normalization may be done through primary surrogate keys added, de-duplication, referential integrity may be checked, data quality may be checked (DQM) through value thresholds and value format, client specific column names may be formed, data may be encoded in dimensional star schema, column names may be changed from user specific to domain specific, extension tables as key value stores may be added for user specific attributes, data may be changed from dimensional star schema to denormalized flat table, and granularity of data may be adjusted for events, customer-product pairs, and customers.”
POTHULA, para. 0038: “In some embodiments, the sub-models 15 are configured to ingest data from a dataset 14, which in some cases may include tokens, telemetry data, risk data, third-party data, social data, and customer data. The datasets 14 may be input as a batch or a stream into an extract, transform, and load (ETL) pipeline, which may load the data in a standardized, normalized, validated, and cleansed format in data store.”;
Examiner’s Note: POTHULA discloses an ETL pipeline that standardizes, normalizes, cleans, and validates data (see instant specification, para. 0006, which similarly describes an ETL model that converts client data into a common data model), and where for example “column names may be changed from user specific to domain specific” to convert column names specific to a particular client entity to the domain utilized by the federated machine-learning model 19)
providing, to each client node, the conversion logic to the data conversion model of the client node; and (POTHULA, para. 0025: “In some embodiments, the computing environment 10 may include multiple datasets (e.g. event records) 14 of multiple nodes 17 (e.g., computing systems with constraints on data sharing) of a federated machine learning model 19. The nodes 17 may have sub-models 15 that take as inputs or training data input from datasets 14 and form a federated machine learning model pipeline (or other topology, like a tree structure in an ensemble model that merges branches at node 17′) that operates in service of node 17′, which may use the outputs of the federated machine learning model 19 to determine to effect various actions via the action-channel servers 16. In some embodiments, the controller 12 may implement the processes described below to mitigate some of the challenges with federated learning in real time complex event processing, e.g., by coordinating among the sub-models 15 to configure and dynamically adjust the federated machine learning model 19.”;
POTHULA, para. 0040: “the sub-models 15 are trained on the respective datasets 14, e.g., in distinct optimizations, like stochastic gradient descent, simulated annealing, genetic optimizations, or the like, in distinct greedy optimizations relative to the other sub-models, e.g., at different times, on different data sets.”
Examiner’s Note: the controller 12 implements the conversion logic (of paras. 0037-0038 of POTHULA), and provides the software to the individual nodes 17 in order for such nodes to convert respective datasets 14 to a format that can be ingested by the local sub-model 15)
receiving, from each client node, trained parameters of the base model from the client node, (POTHULA, para. 0025: “The controller 12 may include a data atlas 18 and an artificial intelligence (AI) atlas 20. The data atlas 18 may be configured to ingest third-party machine learning models (or other forms of AI models), like sub-models that constitute a federated machine learning model. In some cases, each of those sub-models may specify a model architecture (like a network of perceptrons in a neural network, such as a directed graph indicating which perceptron's outputs feed into which perceptron's inputs, or other types of models discussed below), model parameters in a trained or untrained state (e.g., weights and biases of the perceptrons), and hyperparameters of the model. In some cases, sub-models may be obtained from a plurality or all (e.g., more than 2, 3, 20, or 50) different entities.”;
Examiner’s Note: Controller 12 ingests (corresponding to recited “receives”) the sub-models 15 from each node 17, where each sub-model includes the trained weights and biases (corresponding to recited “trained parameters of the base model”))
wherein the parameters of the base model are trained using converted values by executing the data conversion model to perform one or more conversion operations using the conversion logic. (POTHULA, para. 0037: “In some embodiments, the datasets 14 may go through an ingestion process in the nodes 17, which in some cases, may be different processes in different nodes (e.g., data may be binned differently, like on a weekly or monthly basis, and data may be updated differently), where some of the following functions may take place: data and schema drift may be controlled, file headers may be checked, version numbers may be added to incoming files, data may be routed into clean/error queues, data files may be archived in their raw format, error records may be cleaned, column types may be changed from string to specific data types, incremental data may be processed, data normalization may be done through primary surrogate keys added, de-duplication, referential integrity may be checked, data quality may be checked (DQM) through value thresholds and value format, client specific column names may be formed, data may be encoded in dimensional star schema, column names may be changed from user specific to domain specific, extension tables as key value stores may be added for user specific attributes, data may be changed from dimensional star schema to denormalized flat table, and granularity of data may be adjusted for events, customer-product pairs, and customers.”
POTHULA, para. 0038: “In some embodiments, the sub-models 15 are configured to ingest data from a dataset 14, which in some cases may include tokens, telemetry data, risk data, third-party data, social data, and customer data. The datasets 14 may be input as a batch or a stream into an extract, transform, and load (ETL) pipeline, which may load the data in a standardized, normalized, validated, and cleansed format in data store.”;
Examiner’s Note: POTHULA discloses an ETL pipeline that standardizes, normalizes, cleans, and validates data (see instant specification, para. 0006, which similarly describes an ETL model that converts client data into a common data model), and where for example “column names may be changed from user specific to domain specific” to convert column names specific to a particular client entity to the domain utilized by the federated machine-learning model 19, such that this converted data is used to train the individual sub-models 15)
However, POTHULA fails to explicitly teach:
based on the received statistics and abstract information on the client data for the client node
However, in a related field of endeavor (managing large amounts of data, including data with privacy concerns and regulations such as medical data, see paras. 0003-0005), VIRKAR teaches:
based on the received statistics and abstract information on the client data for the client node (VIRKAR, para. 0007: “The computer system may include a plurality of data converters executed by the processing resources, each data converter configured to convert the obtained data from a corresponding data adaptor in the respective data format to a common data format including metadata based on the obtained data. The computer system may include a data repository for storing data in the common data format.”;
VIRKAR, para. 0090: “In an aspect, metadata describing data may be referred to as first metadata and data tags that describe metadata may be referred to as second metadata.”;
VIRKAR, para. 0102: “In block 1030, the method 1000 may include converting the obtained data from a corresponding data adaptor in the respective data format to a common data format including first metadata based on the obtained data. In an aspect, for example, the data converters 4 may convert the obtained data from a corresponding data adaptor in the respective data format to a common data format including the first metadata based on the obtained data.”;
Examiner’s Note: the broadest reasonable interpretation of “received statistics and abstract information” includes received metadata as disclosed by para. 0036 of the instant specification; VIRKAR discloses converting data from one data format to a common data format based on obtained data, including first metadata that describes the stored data)
The POTHULA-VIRKAR combination makes obvious:
generating, for each client node, conversion logic based on the received statistics and abstract information on the client data for the client node, wherein the conversion logic includes instructions for converting values of the client data for one or more data fields to match a target schema for training the base model (Examiner’s Note: the data conversion teachings of POTHULA (paras. 0037-0038) now convert data, including metadata such as column types, to a common format based on such metadata, as taught by VIRKAR)
Before the effective filing date of the present application, it would have been obvious to one of ordinary skill in the art to combine the teachings of POTHULA with VIRKAR as explained above. As disclosed by VIRKAR, one of ordinary skill would have been motivated to do so in order to “support medical research” and provide medical researchers with access to data across systems that are “not compatible with each other.” (paras. 0004-0005).
Regarding Claim 2
POTHULA and VIRKAR disclose the method of claim 1 as explained above. POTHULA further teaches and makes obvious:
wherein conversion logic generated for a first client node in the set of client nodes is different from conversion logic generated for a second client node. (POTHULA, para. 0037: “In some embodiments, the datasets 14 may go through an ingestion process in the nodes 17, which in some cases, may be different processes in different nodes (e.g., data may be binned differently, like on a weekly or monthly basis, and data may be updated differently), where some of the following functions may take place: data and schema drift may be controlled, file headers may be checked, version numbers may be added to incoming files, data may be routed into clean/error queues, data files may be archived in their raw format, error records may be cleaned, column types may be changed from string to specific data types, incremental data may be processed, data normalization may be done through primary surrogate keys added, de-duplication, referential integrity may be checked, data quality may be checked (DQM) through value thresholds and value format, client specific column names may be formed, data may be encoded in dimensional star schema, column names may be changed from user specific to domain specific, extension tables as key value stores may be added for user specific attributes, data may be changed from dimensional star schema to denormalized flat table, and granularity of data may be adjusted for events, customer-product pairs, and customers.”;
Examiner’s Note: POTHULA discloses that the data conversion process for each node 17 may be different, and therefore the conversion logic for a first node and a second node will be different if the underlying data in each node is different (e.g., if the data is binned differently))
Regarding Claim 3
POTHULA and VIRKAR disclose the method of claim 1 as explained above. POTHULA further teaches and makes obvious:
wherein the conversion logic includes logic for one or a combination of data field conversion, a conversion of measurements, a value bucketing conversion, and vocabulary conversion from a first vocabulary to a second vocabulary. (POTHULA, para. 0037: “In some embodiments, the datasets 14 may go through an ingestion process in the nodes 17, which in some cases, may be different processes in different nodes (e.g., data may be binned differently, like on a weekly or monthly basis, and data may be updated differently), where some of the following functions may take place: data and schema drift may be controlled, file headers may be checked, version numbers may be added to incoming files, data may be routed into clean/error queues, data files may be archived in their raw format, error records may be cleaned, column types may be changed from string to specific data types, incremental data may be processed, data normalization may be done through primary surrogate keys added, de-duplication, referential integrity may be checked, data quality may be checked (DQM) through value thresholds and value format, client specific column names may be formed, data may be encoded in dimensional star schema, column names may be changed from user specific to domain specific, extension tables as key value stores may be added for user specific attributes, data may be changed from dimensional star schema to denormalized flat table, and granularity of data may be adjusted for events, customer-product pairs, and customers.”;
Examiner’s Note: POTHULA discloses that “column names may be changed from user specific to domain specific” which corresponds to the recited “data field conversion” option, where the broadest reasonable interpretation of this limitation only requires one of the alternative options in this limitation to be present)
Regarding Claim 4
POTHULA and VIRKAR disclose the method of claim 1 as explained above. POTHULA further teaches and makes obvious:
the statistics and abstract information received from a client node including one or a combination of metadata on data fields of the client data or statistics on values of the client data, without exposing actual values of the client data. (POTHULA, para. 0018: “To mitigate some of these challenges, and in some embodiments all of them and others, some embodiments determine, and dynamically re-assess during run time (or a single time during model design), the relevance of data (e.g., some or all) upon which inputs to the federated machine learning model are based (e.g., the inputs themselves, or parameters of engineered features that are inputs to the model) to the performance (e.g., accuracy, precision, or F2 score) federated machine learning model or to suitability of that model or sub-models (e.g., at different federated processes) to transfer learning. Relevance may be determined with a variety of measures, including various statistical tests like T-student, Chi square, Mahalanobis distance, Shapley values, local interpretable model-agnostic explanations (LIME), cross-entropy, and the like, to assess the relevance of inputs to the federated machine learning model to performance or transfer learning.”;
POTHULA, para. 0021: “In some incarnations, federated learning is a distributed machine learning framework that allows a collective model to be constructed from data that is distributed across (and in some cases, not shared between) data owners. In some cases, data may be scattered across different organizations and may not be easily integrated under many legal and practical constraints. Federated transfer learning (FTL) may be used to improve statistical models under a data federation that allow knowledge to be shared without compromising user privacy and facilitate complementary knowledge to be transferred in the network. As a result, a target-domain party (e.g., one without access to the data of some or all of the participants in the federated machine learning model) may build more flexible and powerful models by leveraging rich labels from a source-domain party.”;
POTHULA, para. 0027: “In some embodiments, the entities may pre-process and filter their information in a data federation to allow knowledge to be shared with other entities without compromising user privacy and enable complementary knowledge to be transferred in the network. In some cases, the datasets 14 have the properties discussed above that make some other forms of federated learning difficult, e.g., heterogenous update frequencies, schemas, granularity, ETL processes, etc., and the datasets 14 in some cases are not shared between the nodes 17 (e.g., in part or in whole).”;
Examiner’s Note: POTHULA discloses assessing the relevance of data, including statistical tests of data (corresponding to recited “statistics on values of the client data”), without actually sharing the client data, to avoid compromising user privacy)
Regarding Claim 7
POTHULA and VIRKAR disclose the method of claim 1 as explained above. POTHULA further teaches and makes obvious:
further comprising combining the trained parameters of the base models from the set of client nodes to form a trained master model. (POTHULA, para. 0020: “Some embodiments include a federated learning platform that can be used by multiple entities that do not wish to share raw data (e.g., different businesses or other enterprises) to collaborate mutually and build robust models. Data sharing may be achieved either by building a meta-model from the sub-models each party (e.g. an enterprise) builds so that only model parameters are transferred or by using encryption techniques to allow safe communications in between different parties.”;
Examiner’s Note: the combined “meta-model” built from the sub-models corresponds to the recited “trained master model”)
Regarding Claim 8
POTHULA teaches:
A non-transitory computer readable medium comprising stored instructions, the stored instructions when executed by at least one processor of one or more computing devices, cause the one or more computing devices to: (POTHULA, para. 0007: “Some aspects include a tangible, non-transitory, machine-readable medium storing instructions that, when executed by a data processing apparatus, cause the data processing apparatus to perform operations including the above-mentioned process.”)
The remaining limitations of claim 8 correspond to the method of claim 1 and therefore claim 8 is rejected for the same reasons explained above with respect to claim 1.
Claim 9 depends from claim 8 and claims a non-transitory computer readable medium that corresponds to the method of claim 2, and is therefore rejected for the same reasons explained with respect to claims 2 and 8.
Claim 10 depends from claim 8 and claims a non-transitory computer readable medium that corresponds to the method of claim 3, and is therefore rejected for the same reasons explained with respect to claims 3 and 8.
Claim 11 depends from claim 8 and claims a non-transitory computer readable medium that corresponds to the method of claim 4, and is therefore rejected for the same reasons explained with respect to claims 4 and 8.
Claim 14 depends from claim 8 and claims a non-transitory computer readable medium that corresponds to the method of claim 7, and is therefore rejected for the same reasons explained with respect to claims 7 and 8.
Regarding Claim 15
POTHULA teaches:
A computer system comprising: one or more computer processors; and one or more computer readable mediums storing instructions that, when executed by the one or more computer processors, cause the computer system to: (POTHULA, para. 0007: “Some aspects include a tangible, non-transitory, machine-readable medium storing instructions that, when executed by a data processing apparatus, cause the data processing apparatus to perform operations including the above-mentioned process.”)
The remaining limitations of claim 15 correspond to the method of claim 1 and therefore claim 15 is rejected for the same reasons explained above with respect to claim 1.
Claim 16 depends from claim 15 and claims a system that corresponds to the method of claim 2, and is therefore rejected for the same reasons explained with respect to claims 2 and 15.
Claim 17 depends from claim 15 and claims a system that corresponds to the method of claim 3, and is therefore rejected for the same reasons explained with respect to claims 3 and 15.
Claim 18 depends from claim 15 and claims a system that corresponds to the method of claim 4, and is therefore rejected for the same reasons explained with respect to claims 4 and 15.
Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over POTHULA in view of VIRKAR and further in view of US 20200151030 A1, hereinafter referenced as SCHROEDER.
Regarding Claim 5
POTHULA and VIRKAR disclose the method of claim 1 as explained above. However, POTHULA and VIRKAR fail to explicitly teach:
the statistics and abstract information for a client node including local vocabulary used to encode one or more values in the client data, and the method further comprising:
generating a mapping from the local vocabulary to a standardized vocabulary; and
providing the mapping to the data conversion module.
However, in a related field of endeavor (data and device integration, see para. 0001), SCHROEDER teaches and makes obvious:
the statistics and abstract information for a client node including local vocabulary used to encode one or more values in the client data, and the method further comprising: (SCHROEDER, para. 0003: “In one example method, application-specific vocabularies for each of a plurality of applications are identified at design time, wherein each of the applications is associated with a corresponding vocabulary.”;
Examiner’s Note: the POTHULA-VIRKAR-SCHROEDER combination now modifies the system of POTHULA such that application-specific vocabularies of SCHROEDER are used for the dataset at each client node 17 of POTHULA, such that each application-specific vocabulary is used to organize and understand the data in such dataset)
generating a mapping from the local vocabulary to a standardized vocabulary; and (SCHROEDER, para. 0006: “In some instances, computing the implicit mappings between the first application and the second application includes identifying the first application and the second application to be integrated, identifying the application-specific vocabulary and predefined mapping to the common vocabulary for each of the first application and the second application, and computing the implicit mapping between the vocabularies of the first and second application based on the identified mappings between the identified application-specific vocabulary and the common vocabulary.”;
Examiner’s Note: the POTHULA-VIRKAR-SCHROEDER combination now modifies the system of POTHULA such that application-specific vocabularies of SCHROEDER are mapped to a common vocabulary as in SCHROEDER)
providing the mapping to the data conversion module. (SCHROEDER, para. 0006: “In some instances, computing the implicit mappings between the first application and the second application includes identifying the first application and the second application to be integrated, identifying the application-specific vocabulary and predefined mapping to the common vocabulary for each of the first application and the second application, and computing the implicit mapping between the vocabularies of the first and second application based on the identified mappings between the identified application-specific vocabulary and the common vocabulary.”;
Examiner’s Note: the POTHULA-VIRKAR-SCHROEDER combination now modifies the system of POTHULA such that application-specific vocabularies of SCHROEDER are mapped to a common vocabulary as in SCHROEDER, and then the logic for performing data conversion of POTHULA (see paras. 0037-0038) utilizes such data mapping when converting data to a different format)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the teachings of POTHULA with VIRKAR and SCHROEDER as explained above. As disclosed by SCHROEDER, one of ordinary skill would have been motivated to do so because in “order to allow for communication between different systems, a specific 1:1 mapping of terms and syntax must be defined to allow the disparate systems communicate throughout a process.” (para. 0002).
Claim 12 depends from claim 8 and claims a non-transitory computer readable medium that corresponds to the method of claim 5, and is therefore rejected for the same reasons explained with respect to claims 5 and 8.
Claim 19 depends from claim 15 and claims a system that corresponds to the method of claim 5, and is therefore rejected for the same reasons explained with respect to claims 5 and 15.
Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over POTHULA in view of VIRKAR and further in view of US 20200034351 A1, hereinafter referenced as MATSUGATANI.
Regarding Claim 6
POTHULA and VIRKAR disclose the method of claim 1 as explained above. However, POTHULA fails to explicitly teach:
receiving, from a client node, an indication that the statistics and abstract information for the client data has been updated;
generating updated conversion logic reflecting the changes to the statistics and abstract information; and
providing the updated conversion logic to the data conversion module of the client node.
However, in a related field of endeavor (managing large amounts of data, including data with privacy concerns and regulations such as medical data, see paras. 0003-0005), VIRKAR teaches and makes obvious:
generating updated conversion logic ... (VIRKAR, para. 0034: “As new external data providers 1 utilize the research management system 100, a simple modification to configure a new data adapter 2 and data converter 4 may be implemented. Once the data adaptor 2 and data converter 4 are changed or updated ... The modification of the data adaptor 2 and/or data converter 4 can happen on the fly and with little or no changes to the underlying processes, procedures or data storage services.”)
providing the updated conversion logic to the data conversion module of the client node. (VIRKAR, para. 0033: “The data converters 4 may be lightweight interfaces that can take a specific data format (e.g., ODM, HL7, CSV or TXT) and convert each transaction record received from the data providers 1 into a standard JSON document. Since external data providers 1 may provide data in different categories of data, the data converters 4 may use JSON as a standard data format.”;
Examiner’s Note: VIRKAR discloses that the data converters 4 are lightweight interfaces; the POTHULA-VIRKAR combination now has the controller 12 of POTHULA implement the conversion logic (of paras. 0037-0038 of POTHULA), and provides the software (including the data converter 4 of VIRKAR) to the individual nodes 17 in order for such nodes to convert respective datasets 14 to a format that can be ingested by the local sub-model 15)
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the teachings of POTHULA with VIRKAR as explained above. As disclosed by VIRKAR, one of ordinary skill would have been motivated to do so in order to “support medical research” and provide medical researchers with access to data across systems that are “not compatible with each other.” (paras. 0004-0005).
However, POTHULA and VIRKAR fail to explicitly teach:
receiving, from a client node, an indication that the statistics and abstract information for the client data has been updated;
... reflecting the changes to the statistics and abstract information; and
However, in a related field of endeavor (collecting and consolidating data, see para. 0008), MATSUGATANI teaches:
receiving, from a client node, an indication that the statistics and abstract information for the client data has been updated; (MATSUGATANI, para. 0013: “In some embodiments, the edge computing module is configured to receive (i) a verification notice from the cloud-based server indicating whether the changed primitive and corresponding raw data has been verified with a plurality of other detected changed primitives and raw data provided to the cloud-based server from one or more sources other than the node, and (ii) the updated data from the cloud-based server and associated with the changed primitive.”;
Examiner’s Note: the POTHULA-VIRKAR-MATSUGATANI combination now has the controller 12 of POTHULA receive a notification that data has changed (as taught by MATSUGATANI), where such changed data relates to the data relevance statistics of POTHULA)
generating updated conversion logic reflecting the changes to the statistics and abstract information; and (MATSUGATANI, para. 0013: “In some embodiments, the edge computing module is configured to receive (i) a verification notice from the cloud-based server indicating whether the changed primitive and corresponding raw data has been verified with a plurality of other detected changed primitives and raw data provided to the cloud-based server from one or more sources other than the node, and (ii) the updated data from the cloud-based server and associated with the changed primitive.”;
Examiner’s Note: the POTHULA-VIRKAR-MATSUGATANI combination now has the controller 12 of POTHULA receive a notification that data has changed (as taught by MATSUGATANI), where such changed data relates to the data relevance statistics of POTHULA, and based on such changed data, updates the data converter 4 of VIRKAR which is utilized by POTHULA to perform data conversion (as discussed in paras. 0037-0038))
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to combine the teachings of POTHULA with VIRKAR and MATSUGATANI as explained above. As disclosed by MATSUGATANI, one of ordinary skill would have been motivated to do so in order to know whether to “update the master database based on the changed” information. (para. 0016). One of ordinary skill would further have been motivated to update software to ensure that up-to-date metadata (such as column headers) are used in the data conversion process.
Claim 13 depends from claim 8 and claims a non-transitory computer readable medium that corresponds to the method of claim 6, and is therefore rejected for the same reasons explained with respect to claims 6 and 8.
Claim 20 depends from claim 15 and claims a system that corresponds to the method of claim 6, and is therefore rejected for the same reasons explained with respect to claims 6 and 15.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20230021563 A1 (Narayanam). “FIG. 1 is a diagram illustrating system architecture, according to an embodiment of the invention. By way of illustration, FIG. 1 depicts data standardization operation input 102, which can include, for example, identification of a particular data standardization operation selected by at least one data scientist and/or other user. Such input 102 is provided to server 106. Also depicted in FIG. 1 are client devices 104-1, 104-2, 104-3, . . . 104-N, collectively referred to herein as client devices 104. Metadata are shared (without violating one or more privacy constraints) by client devices 104 to server 106, and server 106, using federated learning infrastructure, executes the data standardization operation from input 102 using at least a portion of the metadata provided by client devices 104. Subsequently, based at least in part on the execution of the data standardization operation, server 106 outputs standardized versions of data, residing in each client device 104, to the given client devices 104, while also preserving data privacy in accordance with one or more privacy constraints. Regarding the arrow from the server 106 that indicates “new global model,” this refers to the model learned at the server by making use of the inputs and/or information shared by the client devices 104. Because clients cannot share the data directly, the clients can only share some higher-order statistics (e.g., such as embedding vectors). The server 106 makes use of these informative embedding vectors to learn a model at the server-side to derive and/or generate a global model that has collective information from the client devices 104. Additionally, the different shapes illustrated in FIG. 1 in connection with the lines from the client devices 104 to the server 106 merely indicate that the information shared by each client device 104 to the sever 106 (for the data standardization problem) is different.” (para. 0024).
US 20230177113 A1 (Witherspoon). “The present application generally relates to information technology and, more particularly, to data processing. For example, consider the setting of federated learning (FL), wherein different clients possess their own private data which cannot be shared with others (e.g., due to privacy constraints). In this setting, a need may arise to carry out a machine learning-based task or a business intelligence task which can require consistent class labels across nodes such that the model can train effectively by taking diverse examples from the different nodes belonging to the same class. As used herein, a node refers to a participating client in a federated learning setting, while a class refers to a set of labels in supervised machine learning.” (para. 0001).
US 20230229640 A1 (Peng). “Embodiments of the present disclosure are directed to a collaborative data schema management system for federated learning, referred to herein as “federated data manager” (FDM). Among other things, FDM enables the members of a federated learning alliance (e.g., organizations, data scientists, etc.) to (1) propose data schemas for use by the alliance, (2) identify and bind local datasets to proposed schemas, (3) create, based on the proposed schemas, training datasets for addressing various ML tasks, and (4) control, for each training dataset, which of the local datasets bound to that training dataset (and thus, which alliance members) will actually participate in the training of a particular ML model. Significantly, FDM enables these features while ensuring that the contents (i.e., data samples) of the members' local datasets remain hidden from each other, thereby preserving the privacy of that data.” (para. 0013).
US 20210326698 A1 (Bukharev). “By contrast, data from data source(s) of one domain may not be in the same form as data from data source(s) of another domain. One hospital or hospital system may have data that is heterogeneous relative to data from another hospital or hospital system. As noted in the background, training and applying artificial intelligence and/or machine learning models across different domains—and therefore using heterogeneous data—can be challenging. Techniques described here may facilitate training of model(s) in model-to-data (e.g., federated learning) environments by training both global model weights and, for each domain in the model-to-data environment, local model weights. Consequently, each domain may be equipped with what can be referred to as an “adaptor”—e.g., a local set of machine learning model weights or an entire local model—that transforms or otherwise converts data in a form that is specific to that domain to a form that is “global,” “normalized,” or more generally, domain-independent across the entire model-to-data environment.” (para. 0006).
US 20220405651 A1 (Knuesel). “For federated learning, the data at the multiple clients should ideally represent a similar relationship between input and output variables. Under this condition, having more federation members is like having more data. Having more training data usually improves model performance until convergence. However, this ideal is often difficult to realize in practice. Although clients typically have somewhat similar data, e.g., because being face with similar prediction needs, the local data is dissimilar having different formats and/or encodings. One way to address this problem is to apply a mapping rule before applying a model, to map local data to a form that is better suited for federated learning. Various aspects of such mapping rules are illustrated herein.” (para. 0130).
US 20230025754 A1 (Hassanzadeh). “The present disclosure relates generally to training of machine learning (ML) models using encrypted data that preserves privacy in an untrusted environment. The techniques described herein may leverage third party cloud services to support training of a diagnosis prediction ML model by multiple medical institutions using distributed executable file packages.” (para. 0001).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL C LEE whose telephone number is (571)272-4933. The examiner can normally be reached M-F 12:00 pm - 8:00 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas, can be reached at 571-272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL C. LEE/Examiner, Art Unit 2128