DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
2. The Amendment filed on October 1, 2025 has been entered. Claims 1, 6 and 20 have been amended, claims 3, 9, 10, 18 and 19 have been cancelled, and new claims 21 – 25 have been added. Claims 1, 2, 4 – 8, 11 – 17 and 20 – 25 are currently pending.
Response to Arguments
35 U.S.C. §103
3. Applicant's arguments, see Remarks pp. 1 – 3, filed October 1, 2025, with respect to the rejections of claims 1, 2, 4, 5 and 20 under 35 U.S.C. §103 have been fully considered but are not persuasive.
The gravamen of applicant’s arguments is that the amendments to the independent claims, which recite data classifications of columns of the datasets and a further determination of similarity of the columns, are not taught by the cited references singly or in combination.
Examiner respectfully disagrees and submits that the Bui reference teaches similarity scores and percentages of categorized fields of datasets in the Abstract, Figs. 4 – 7 and paragraphs [0013] – [0014], [0029], [0051] – [0053], [0058], [0062], [0065] and [0068]. Subsequent dependent claims inherit such defects.
Claim Rejections – 35 U.S.C. §103
4. The following is a quotation of 35 U.S.C. 103 which forms the basis for all
obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
5. The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ
459 (1966), that are applied for establishing a background for determining obviousness
under 35 U.S.C. 103 are summarized as follows:
a. Determining the scope and contents of the prior art
b. Ascertaining the differences between the prior art and the claims at issue
c. Resolving the level of ordinary skill in the pertinent art
d. Considering objective evidence present in the application indicating
obviousness or nonobviousness
Claims 1, 2, 4, 5, 20, 21, 22, 23 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Achin et al. (United States Patent Publication Number 20180060738) hereinafter Achin, in view of Bui et al., (United States Patent Publication Number 20210365344) hereinafter Bui and in further view of Darji et al., (United States Patent Publication Number 20240427645) hereinafter Darji
Regarding claim 1, Achin teaches a computing system (distributed computing system [0042]) for dataset consolidation, comprising: at least one processor; (any suitable processor or collection of processors, [0339]) a communication interface (user interface [0221]) communicatively coupled to the at least one processor; (any suitable processor or collection of processors, [0339]) and a memory device (sufficient memory [0222]) storing executable code that, (executable Javascript code, [0228]) when executed, (executed [0339]) causes the at least one processor to: (any suitable processor or collection of processors, [0339]) facilitate saving data storage through data storage management of one or more data storage locations (saves observations and predictions in a local system or back to an instance of the data services layer. [0182]); (saves it to file storage 830 [0255]) by training, (training [0105]) via an iterative training (recursively [0069]) and testing loop, (because it applies machine learning recursively to predict which techniques are most likely to succeed for the prediction problem at issue. [0087]) a predictive model (predictive model [0073]) using training data (training datasets [0102]) to detect data redundancies (the system 100 may prune
"less important" features [0318]) from two or more datasets (any two of wide datasets, tall datasets, sparse datasets, dense datasets, datasets that do or do not include text, datasets that include variables of various data types (e.g., numerical, ordinal, categorical, interpreted (e.g., date, time, text), etc.), datasets that include variables with various statistical properties ( e.g., statistical properties relating to the variable's missing values, cardinality, distribution, etc.), etc. [0050]) stored to the one or more data storage locations, (file storage 830 [0255]) the training(training [0105]) including testing the predictive model (testing a predictive model includes cross-validating the model using different folds of training datasets associated with the prediction problem. [0102]) by predicting a target variable (the prediction problem includes a categorical target variable [0083]) and iteratively adjusting weights and calculations during each subsequent iteration (the number of modeling procedures executed in an iteration of steps 330 and 340 may tend to decrease as the number of iterations increases, and the amount of data used for training and/or testing the generated models may tend to increase as the number of iterations increases. Thus, the earlier iterations may "cast a wide net" by executing a relatively large number of modeling procedures on relatively small datasets, and the later iterations may perform more rigorous testing of the most promising modeling procedures identified during the earlier iterations. Alternatively or in addition, the earlier iterations may implement a more coarse-grained evaluation of the search space, and the later iterations may implement more fine-grained evaluations of the portions of the search space determined to be most promising. 
[0109]) in order to improve predictability (modify one or more of the modeling techniques to improve accuracy (e.g., by returning to step 430), alter the dataset (e.g., by returning to step 402), etc [0149]) of the target variable, (target variable [0083]) wherein the predictive model (predictive model [0102]) is trained to (trained to [0087]) identify data similarities among the two or more datasets (wide datasets, tall datasets, sparse datasets, dense datasets, datasets that do or do not include text, datasets that include variables of various data types (e.g., numerical, ordinal, categorical, interpreted (e.g., date, time, text), etc.), datasets that include variables with various statistical properties ( e.g., statistical properties relating to the variable's missing values, cardinality, distribution, etc.), etc. [0050]) stored to the one or more data storage locations; (file storage 830 [0255]) deploy, (deploy [0153]) based on any error (an error in the preparation of the dataset [0098]) in predicting (predicting the target [0123]) the target variable (target variable [0083]) being less than a predetermined level, (if the predictive value of the feature is less than a threshold value, if the feature has one of the M lowest predictive values among the features in the dataset, if the feature does not have one of the N highest predictive values among the features in the dataset, etc. [0318]) the predictive model; (Fig. 
3 predictive model [0034]) apply the deployed predictive model (deployed predictive models) [0045]) to at least two datasets (any two of wide datasets, tall datasets, sparse datasets, dense datasets, datasets that do or do not include text, datasets that include variables of various data types (e.g., numerical, ordinal, categorical, interpreted (e.g., date, time, text), etc.), datasets that include variables with various statistical properties ( e.g., statistical properties relating to the variable's missing values, cardinality, distribution, etc.), etc. [0050]) to quantify a percentage of similarity (the library 130 of modeling techniques includes tools for assessing the similarities ( or differences) between predictive modeling techniques. Such tools may express the similarity between two predictive modeling techniques as a score (e.g., on a predetermined scale), a classification (e.g., "highly similar", "somewhat similar", "somewhat dissimilar", "highly dissimilar"), a binary determination (e.g., "similar" or "not similar"), etc. Such tools may determine the similarity between two predictive modeling techniques based on the processing steps that are common to the modeling techniques, based on the data indicative of the results of applying the two predictive modeling techniques to the same or similar prediction problems, etc. [0055])among the at least two datasets; (any two of wide datasets, tall datasets, sparse datasets, dense datasets, datasets that do or do not include text, datasets that include variables of various data types (e.g., numerical, ordinal, categorical, interpreted (e.g., date, time, text), etc.), datasets that include variables with various statistical properties ( e.g., statistical properties relating to the variable's missing values, cardinality, distribution, etc.), etc. 
[0050]) determine, based on the applying, that the percentage of similarity (a large number ( or high percentage) of characteristics in common [0061]) surpasses a predefined threshold percentage; (exceeding a threshold similarity value with respect to the modeling procedure at issue [0085])
Achin does not fully disclose apply data classification to two or more datasets, wherein the data classification comprises categorizing each structured column of a plurality of structured columns of two or more datasets into a particular category; derive semantic logic from the at least two datasets to interpret importance of retaining the at least two datasets; transmit, to a user device and based on the percentage of similarity surpassing the predefined threshold percentage, one or more electronic notifications that indicate the percentage of similarity among the at least two datasets, wherein the percentage of similarity increases based upon the particular category of at least one of the structured columns being of a same category; and the interpreted importance of retaining the at least two datasets; and receive an indication from the user device to consolidate the at least two datasets; initiate display, via a graphical user interface of the user device, a user interface (UI) dashboard that includes (a) the at least two datasets, (b) the one or more electronic notifications, and (c) an indication that one dataset of the at least two datasets likely includes sensitive data requiring security measures to protect the sensitive data; receive, from the user device and in response to selection of one or more control inputs, an indication to consolidate the at least two datasets; and consolidate, in response to receiving the indication, the at least two datasets by deleting a dataset of the at least two datasets from the one or more storage locations, the consolidating including applying a security measure to the sensitive data, the security measure including data masking.
Bui teaches apply (implement [0064]) data classification (semantic encoder 310, [0036]) such as “data classification” to two or more datasets, (new dataset, existing dataset [0013]; a first dataset, Dataset A, a second dataset, Dataset B [0014]); wherein the data classification (any one of classifications from a schema encoder 306, numerical distribution encoder 308, semantic encoder 310, value-format encoder 311, and update pattern encoder 312 [0036]) such as “data classification” comprises categorizing (an encoder module that is operable to generate "semantic encode values," which the data monitoring system 102 may use to compare the semantic content of string-type data in a particular field of new dataset 110 to the semantic content of string-type data in one or more fields of previously analyzed datasets. [0018]) such as “categorizing” each structured column (each field [0044]) of a plurality of structured columns (in one or more fields [0041]) of two or more datasets(new dataset, existing dataset [0013]; a first dataset, Dataset A, a second dataset, Dataset B [0014]); into a particular category; (semantic content category [0039]) derive semantic logic from the at least two datasets (Additionally, in some embodiments, the data monitoring system 102 includes an encoder module that is operable to generate "semantic encode values," which the data monitoring system 102 may use to compare the semantic content of string-type data in a particular field of new dataset 110 to the semantic content of string-type data in one or more fields of previously analyzed datasets. [0019]) to interpret importance of retaining the at least two datasets; (the data encoder module 104 may generate various encode values 112 (such as encode values EAlEAS, in the example above) and store these encode values 112 in encode value data store 106 for subsequent use in determining whether a new dataset 110 matches Dataset A. 
[0035]) and (it may be desirable to validate the manner in which the data in dataset 110 is being updated with reference to previous versions of the same dataset [0042]) transmit, to a user device (the data monitoring dashboard module 204 is operable to send notifications (e.g., via email, SMS messaging, etc.) to various users associated with the data monitoring system 102 or the new dataset 110, according to some embodiments [0027]) and based on the percentage of similarity surpassing the predefined threshold percentage, (match determination module 404 may determine whether two datasets "match" by determining whether the similarity score(s) for those two datasets satisfy a "similarity criterion," which may vary according to different embodiments. For example, when comparing a given pair of encode values, match determination module 404 may determine whether the similarity score exceeds a particular threshold value. [0052]) one or more electronic notifications (send notifications (e.g., via email, SMS messaging, etc.) to various users associated with the data monitoring system 102 or the new dataset 110, according to some embodiments [0027]) that indicate the percentage of similarity among the at least two datasets (For example, for each of the comparisons, comparator 402 may generate an output value (also referred to as a "similarity score") that indicates a similarity between the two encode values being compared. The format of the similarity scores may vary depending on the particular comparison algorithm(s) used by comparator 402. In the depicted embodiment, assume that comparator 402 generates similarity scores in the range of 0.0-1.0, with a higher value indicating a higher degree of similarity between the two encode values (e.g., such that an exact match would be denoted by a similarity score of 1.0). 
[0051]) and the interpreted importance of retaining the at least two datasets; (the data encoder module 104 may generate various encode values 112 (such as encode values EAlEAS, in the example above) and store these encode values 112 in encode value data store 106 for subsequent use in determining whether a new dataset 110 matches Dataset A. [0035]) and (it may be desirable to validate the manner in which the data in dataset 110 is being updated with reference to previous versions of the same dataset [0042]) wherein the percentage of similarity (degree of similarity [0051]) increases based upon (satisfy a similarity criterion [0053]) the particular category (semantic content category [0039]) of at least one of the structured columns (in one or more fields [0041]) being of a same (matches [0053]) category; (semantic content category [0039])
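For illustration only, the relationship recited in the limitation above (a percentage of similarity that rises when structured columns fall into the same category, compared against a predefined threshold percentage) can be sketched as follows. The sketch is not taken from Bui, Achin or the application. The function names, the category labels, the Jaccard-style overlap measure and the 80 percent threshold are all assumptions chosen for the example.

```python
from typing import Dict


def similarity_percentage(cols_a: Dict[str, str], cols_b: Dict[str, str]) -> float:
    """Return the percentage of distinct column categories shared by two datasets.

    Each argument maps a column name to an assumed, pre-assigned category.
    The measure is a Jaccard overlap of the category sets (an assumption for
    this example), so every additional shared category raises the percentage.
    """
    cats_a, cats_b = set(cols_a.values()), set(cols_b.values())
    union = cats_a | cats_b
    if not union:
        return 0.0
    return 100.0 * len(cats_a & cats_b) / len(union)


def exceeds_threshold(pct: float, threshold_pct: float = 80.0) -> bool:
    """True when the similarity percentage surpasses the (hypothetical) threshold."""
    return pct > threshold_pct


# Hypothetical datasets: column name -> category assigned by a classifier.
dataset_a = {"cust_name": "person_name", "cust_email": "email", "zip": "postal_code"}
dataset_b = {"name": "person_name", "email_addr": "email", "signup": "date"}

pct = similarity_percentage(dataset_a, dataset_b)
notify_user = exceeds_threshold(pct)
```

Here two of the four distinct categories are shared, so the percentage does not surpass the assumed threshold and no notification would be triggered.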
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Achin to incorporate the teachings of Bui wherein apply data classification to two or more datasets, wherein the data classification comprises categorizing each structured column of a plurality of structured columns of two or more datasets into a particular category; derive semantic logic from the at least two datasets to interpret importance of retaining the at least two datasets; wherein the percentage of similarity increases based upon the particular category of at least one of the structured columns being of a same category; transmit, to a user device and based on the percentage of similarity surpassing the predefined threshold percentage, one or more electronic notifications that indicate the percentage of similarity among the at least two datasets and the interpreted importance of retaining the at least two datasets. By doing so in addition to various multi-dimensional
vector comparison techniques, various other suitable techniques may be used to contribute to the semantic classification of the data values in a dataset. For example, in
some embodiments, topic modeling techniques (such as latent Dirichlet allocation) or nearest neighbor/most-similar search techniques may also be used to compare semantic encode values for new dataset 110 and existing datasets. Bui [0040].
Darji teaches initiate display,(to a display for presentation to a user [0207]) via a graphical user interface (graphical user interface [0207]) of the user device, (computing devices (also referred to as "client devices" herein [0025]) a user interface (UI) dashboard (a display (e.g., a display screen), [0207]) such as “user interface (UI) dashboard” that includes (a) the at least two datasets, (user’s datasets [0218]) (b) the one or more electronic notifications, (notification of electronic imbalance [0310]) and (c) an indication that one dataset of the at least two datasets likely includes sensitive data requiring security measures to protect the sensitive data; (Consider an example in which the storage service includes a service that, when selected and applied, causes personally identifiable information ('PII') contained in a dataset to be obfuscated when the dataset is accessed. In such an example, the storage systems 374a, 374b, 374c, 374d, 374n may be configured to obfuscate PII when servicing read requests directed to the dataset. Alternatively, the storage systems 374a, 374b, 374c, 374d, 374n may service reads by returning data that includes the PII, but the edge management service 382 itself may obfuscate the PII as the data is passed through the edge management service 382 on its way from the storage systems 374a, 374b, 374c, 374d, 374n to the host devices 378a, 378b, 378c, 378d, 378n. 
[0214]) receive, (receive [0101]) from the user device (computing devices (also referred to as "client devices" herein [0025]) and in response to selection of one or more control inputs, (selecting 802 one of the at least one rebalancing proposals based on a rebalancing proposal selection policy [0327]) SEE ALSO selection of archiving policy [0228] an indication to consolidate (to allow the user to consolidate data [0259]) the at least two datasets; (user’s datasets [0218]) and consolidate, (consolidate [0259]) in response to receiving the indication, (indications that a workload imbalance is occurring or predicted to occur [0309]) the at least two datasets (user’s datasets [0218]) by deleting a dataset (a portion of a user's dataset that have been invalidated (e.g., the portion has been replaced with an updated portion, the portion has been deleted) are archived within 24 hours of the data being invalidated. [0228]) of the at least two datasets (user’s datasets [0218]) from the one or more storage locations, (underlying storage systems [0299]) the consolidating (consolidate [0259]) including applying a security measure to the sensitive data, the security measure including data masking (Modifying 518 data to be sent to a host device in response to a request to access the dataset may be carried out, for example, by the edge management service 382 masking or removing PII to adhere to a requirement that PII not be shared with accessors of the dataset, by encrypting/decrypting data to enforce a requirement that the dataset be accessed and stored using end-to-end encryption, and so on [0299])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Achin to incorporate the teachings of Darji wherein initiate display, via a graphical user interface of the user device, a user interface (UI) dashboard that includes (a) the at least two datasets, (b) the one or more electronic notifications, and (c) an indication that one dataset of the at least two datasets likely includes sensitive data requiring security measures to protect the sensitive data; receive, from the user device and in response to selection of one or more control inputs, an indication to consolidate the at least two datasets; and consolidate, in response to receiving the indication, the at least two datasets by deleting a dataset of the at least two datasets from the one or more storage locations, the consolidating including applying a security measure to the sensitive data, the security measure including data masking. By doing so, personally identifiable information ('PII') contained in a dataset can be obfuscated when the dataset is accessed. Darji [0214].
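For illustration only, the consolidating limitation discussed above (deleting one of two datasets from storage while applying data masking to sensitive data) can be sketched as follows. The sketch is not the method of Darji or of the application. The in-memory store, the dataset names, the email pattern used to stand in for PII and the mask string are all assumptions for the example.

```python
import re
from typing import Dict, List

# Assumed stand-in for sensitive data: anything shaped like an email address.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def mask_value(value: str) -> str:
    """Replace any email-shaped substring with a fixed mask string."""
    return EMAIL_RE.sub("***MASKED***", value)


def consolidate(store: Dict[str, List[Dict[str, str]]], keep: str, drop: str) -> None:
    """Mask sensitive values in the retained dataset, then delete the other one."""
    store[keep] = [
        {column: mask_value(value) for column, value in row.items()}
        for row in store[keep]
    ]
    del store[drop]


# Hypothetical storage location holding two redundant datasets.
store = {
    "customers_v1": [{"name": "Ann", "contact": "ann@example.com"}],
    "customers_v2": [{"name": "Ann", "contact": "ann@example.com"}],
}
consolidate(store, keep="customers_v1", drop="customers_v2")
```

After the call, only the retained dataset remains in the store and its email-shaped values have been replaced by the mask string.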
Claim 20 corresponds to claim 1 and is rejected accordingly.
Regarding claim 2 Achin in view of Bui and Darji teaches the computing system of claim 1,
Achin as modified further teaches wherein the applying the deployed predictive model (deployed predictive models [0045]) to the at least two datasets (any two of wide datasets, tall datasets, sparse datasets, dense datasets, datasets that do or do not include text, datasets that include variables of various data types (e.g., numerical, ordinal, categorical, interpreted (e.g., date, time, text), etc.), datasets that include variables with various statistical properties (e.g., statistical properties relating to the variable's missing values, cardinality, distribution, etc.), etc. [0050]) is based on receiving an indication indicating (Fig. 3 (330) transmit instructions [0095]) that the deployed predictive model (deployed predictive models [0045]) is to be applied to the at least two datasets. (any two of wide datasets, tall datasets, sparse datasets, dense datasets, datasets that do or do not include text, datasets that include variables of various data types (e.g., numerical, ordinal, categorical, interpreted (e.g., date, time, text), etc.), datasets that include variables with various statistical properties (e.g., statistical properties relating to the variable's missing values, cardinality, distribution, etc.), etc. [0050])
Regarding claim 4 Achin in view of Bui and Darji teaches the computing system of claim 1,
Achin as modified does not fully disclose wherein the consolidating the at least two datasets includes merging one dataset of the at least two datasets with another dataset of the at least two datasets.
Darji teaches wherein the consolidating (to allow the user to consolidate data [0259]) the at least two datasets (user’s datasets [0218]) includes merging one dataset (replicating a dataset through snapshot based replication from a replication source such as a first storage system to a replication target such as a second storage system … based on (e.g., time, number of operations, an RPO setting), or in some other way. [0273]) of the at least two datasets (user’s datasets [0218]) with another dataset (entire dataset or a subset of the dataset [0273]) of the at least two datasets. (user’s datasets [0218]) SEE REPLICATION SCENARIO [0271] – [0272]
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Achin to incorporate the teachings of Darji wherein the consolidating the at least two datasets includes merging one dataset of the at least two datasets with another dataset of the at least two datasets. By doing so two or more of the storage systems may synchronously replicate a dataset between each other. In synchronous replication, distinct copies of a particular dataset. Darji [0271].
Regarding claim 5 Achin in view of Bui and Darji teaches the computing system of claim 1,
Achin as modified further discloses wherein the percentage of similarity(a large number ( or high percentage) of characteristics in common [0061]) is quantified based on interpreting meaning of words (interpreted (e.g., date, time, text), etc.), [0050]) included in the at least two datasets. (any two of wide datasets, tall datasets, sparse datasets, dense datasets, datasets that do or do not include text, datasets that include variables of various data types (e.g., numerical, ordinal, categorical, interpreted (e.g., date, time, text), etc.), datasets that include variables with various statistical properties ( e.g., statistical properties relating to the variable's missing values, cardinality, distribution, etc.), etc. [0050])
Regarding claim 21 Achin in view of Bui and Darji teaches the computing system of claim 1
Achin does not fully disclose wherein the executable code, when executed, further causes the at least one processor receive, from the user device, one or more inputs indicating that one or more datasets should be stored to the one or more data storage locations.
Darji teaches wherein the executable code, (program instructions [0`40]) when executed, further causes the at least one processor (processor [0160] – [0161]) receive, (received [0028]) from the user device, (Computing devices (also referred to as "client devices" herein) may be embodied, for example, a server in a data center, a workstation, a personal computer, a notebook, or the like. [0025]) one or more inputs (In fact, the determination as to which storage system should be identified 508 as being the storage system to store the dataset may be based on multiple criteria. [0290]) indicating that one or more datasets should be stored to the one or more data storage locations (In such an example, once one or more storage systems have been identified as being candidates for storing the dataset, each of the candidates may be evaluated to identify a best fit based on some criteria. For example, the storage system that can store the dataset at the lowest cost may be identified 508 as being the storage system to store the dataset. [0290])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Achin to incorporate the teachings of Darji wherein the executable code, when executed, further causes the at least one processor receive, from the user device, one or more inputs indicating that one or more datasets should be stored to the one or more data storage locations. By doing so, after a finish operation, however, the zone may not be opened or written to further without first performing a zone reset operation. Darji [0051]
Regarding claim 22 Achin in view of Bui and Darji teaches the computing system of claim 1
Achin does not fully disclose wherein the semantic logic incorporates natural language processing.
Bui teaches wherein the semantic logic incorporates natural language processing (For example, in some embodiments, semantic encoder 310 is operable to use one or more natural language processing (NLP) language models [0039])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Achin to incorporate the teachings of Bui wherein the semantic logic incorporates natural language processing. By doing so vector word-embedding representations can be calculated of data values in a field of the datasets (e.g., existing datasets retrieved from the live data store 216, new dataset 110, etc.) that contains string-type data. Bui [0039]
Regarding claim 23 Achin in view of Bui and Darji teaches the computing system of claim 1
Achin does not fully disclose wherein a data cleaning process is performed prior to applying the data classification.
Bui teaches wherein a data cleaning process (cleaning empty data records, removing non-supported data type columns, removing quasi-numerical variables, etc. [0046]) such as “cleaning process” is performed prior to (prior to [0059]) applying the data classification. (semantic encoder 310, [0036]) such as “data classification”
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Achin to incorporate the teachings of Bui wherein a data cleaning process is performed prior to applying the data classification. By doing so data processor 304 is operable to process the data in new dataset 110 and convert it into one or more formats for input into the various sub-modules of data encoder module 104. Bui [0046]
Regarding claim 24 Achin in view of Bui and Darji teaches the computing system of claim 23,
Achin does not fully disclose wherein the data masking comprises auditing the at least two datasets in accordance with predefined rules to correct errors that would render the data values incongruent with the data classification.
Darji teaches wherein the data masking (masking [0299]) comprises auditing the at least two datasets (datasets [0298]) in accordance with predefined rules (managed in a way so as to adhere to some non-governmental guidance (e.g., to adhere to best practices for auditing purposes), the one or more data compliance services may be offered to a user to ensure that the user's datasets are managed in a way so as to adhere to a particular clients or organizations requirements, [0218]) to correct errors that would render the data values incongruent with the data classification. (correcting any errors (if present) [0085])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Achin to incorporate the teachings of Darji wherein the data masking comprises auditing the at least two datasets in accordance with predefined rules to correct errors that would render the data values incongruent with the data classification. By doing so the read data can be reassembled. Darji [0085]
Claims 6 – 8, 11, 12 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Jennifer Laeitia Prendki (United States Patent Publication Number 20220138561) hereinafter Prendki, in view of Achin et al. (United States Patent Publication Number 20180060738) hereinafter Achin, in view of Bui et al., (United States Patent Publication Number 20210365344) hereinafter Bui and in further view of Darji et al., (United States Patent Publication Number 20240427645) hereinafter Darji
Regarding claim 6 Prendki teaches a computing system facilitating data redundancy consolidation, (Fig. 10, computer system [0125]) the computing system (Fig. 10, computer system [0125]) comprising: at least one processor; (Fig. 10 processor [0127]) a communication interface communicatively coupled (Fig. 10, communication interface [0127]) to the at least one processor; (Fig. 10 processor [0127]) and a memory device (Fig. 10 memory [0127]) storing executable code that, when executed, (executable instructions [0128]) causes the at least one processor to: (Fig. 10 processor [0127]) display, via a graphical user interface (Fig 10 an input/output (I/O) interface 1008, [0127]) of a computing device, (computing device [0125]) a user interface (UI) dashboard depicting: (monitor [0131]) at least two separate datasets (Fig. 1B (150) receive input data set of training data comprising a plurality of records and
previously used to train a second machine learning model [0053]; (158) receive second dataset of prospective training data [0057]) determined, by a backend system, (a Deep Convolutional Neural Network [0042]) such as “backend system” to likely be redundant; (Since filters are built on historical data, data that has been seen as useful in the original training dataset will be predicted as useful well it might in fact be redundant [0049])
Prendki does not fully disclose one or more notifications that include a percentage of similarity among the at least two separate datasets that are stored to one or more storage locations of the backend system and were determined to likely be redundant at least in part by determining that a particular category of at least one structured column of a plurality of structured columns of two or more datasets being of a same category; interpreted importance of retaining the at least two datasets; an indication that one dataset of the at least two separate datasets likely includes sensitive data requiring security measures to protect the sensitive data; and one or more control inputs the selection of which initiates consolidation of a dataset of the at least two separate datasets; receive, via the computing device, a user input selecting a control input of the one or more control inputs; and transmit to the one or more storage locations of the backend system, a control signal to consolidate the dataset of the at least two separate datasets by deleting the dataset of the at least two separate datasets from the one or more storage locations, the consolidating including applying a security measure to the sensitive data, the security measure including data masking.
Achin teaches one or more notifications (offers notification [0234]) that include a percentage of similarity (the library 130 of modeling techniques includes tools for assessing the similarities (or differences) between predictive modeling techniques. Such tools may express the similarity between two predictive modeling techniques as a score (e.g., on a predetermined scale), a classification (e.g., "highly similar", "somewhat similar", "somewhat dissimilar", "highly dissimilar"), a binary determination (e.g., "similar" or "not similar"), etc. Such tools may determine the similarity between two predictive modeling techniques based on the processing steps that are common to the modeling techniques, based on the data indicative of the results of applying the two predictive modeling techniques to the same or similar prediction problems, etc. [0055]) among the at least two separate datasets (any two of wide datasets, tall datasets, sparse datasets, dense datasets, datasets that do or do not include text, datasets that include variables of various data types (e.g., numerical, ordinal, categorical, interpreted (e.g., date, time, text), etc.), datasets that include variables with various statistical properties (e.g., statistical properties relating to the variable's missing values, cardinality, distribution, etc.), etc. [0050]) stored to one or more storage locations of the backend system; (file storage 830 [0255]) and were determined to likely be redundant; (When a template T is invoked on a dataset sample S, the template checks the storage structure to determine whether the results of executing that template on that dataset sample are already stored. If so, rather than reprocessing the dataset sample to obtain the same results, the template simply retrieves the corresponding results from the storage structure, returns those results, and terminates. [0116])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Prendki to incorporate the teachings of Achin of one or more notifications that include a percentage of similarity among the at least two separate datasets that are stored to one or more storage locations of the backend system and were determined to likely be redundant. Doing so would provide a tool that can measure feature importance for an arbitrary predictive model or for a diverse set of predictive models. Achin [0012].
Darji teaches interpreted importance of retaining the at least two datasets; (Another example of storage services that may be presented to a user, selected by a user, and ultimately applied to a dataset associated with the user can include one or more data archiving services (including data offloading services). Such data archiving services may be embodied, for example, as services that may be provided to consumers (i.e., a user) of the data archiving services to ensure that the user's datasets are archived in a certain way, such as according to a certain set of preferences, parameters, and the like. [0227]) an indication (some data (e.g., data that has personally identifiable information) in a dataset [0283]) that one dataset (source dataset [0195]) of the at least two separate datasets (source dataset; replicated dataset [0195]) likely includes sensitive data (personally identifiable information ('PII') contained in a dataset [0214]) requiring security measures to protect (to be obfuscated [0214]) the sensitive data; (personally identifiable information ('PII') contained in a dataset [0214]) and one or more control inputs (data portability services [0259]) the selection (selected by a user [0259]) of which initiates consolidation (consolidate [0259]) of a dataset (user’s datasets [0259]) of the at least two separate datasets; (source dataset; replicated dataset [0195]) receive, via the computing device, (computing devices (also referred to as "client devices" herein) [0025]) a user input selecting a control input (user selecting may be provided to consumers (i.e., a user) of the data portability services to allow the user to perform various data movement, data conversion, or similar processes on the user's datasets. For example, one or more data portability services may be offered to a user to allow the user to migrate their datasets from one storage resource to another storage resources, to allow the user to convert their dataset from one format (e.g., block data) to another format (e.g., object data), to allow the user to consolidate data, to allow the user to transfer their datasets from one data controller (e.g., a first cloud-services vendor) to another data controller (e.g., a second cloud-services vendor), to allow the users to convert their dataset from being compliant with a first set of regulations to being compliant with a second set of regulations, and so on [0259]) of the one or more control inputs; (data portability services [0259]) and transmit (migrate [0259]) to the one or more storage locations (storage locations [0075]) of the backend system, (backend storage systems, [0251]) a control signal (various data movement, data conversion, or similar processes on the user's datasets. [0259]) to consolidate (consolidate [0259]) the dataset of the at least two separate datasets (source dataset; replicated dataset [0195]) by deleting the dataset (a portion of a user's dataset that have been invalidated (e.g., the portion has been replaced with an updated portion, the portion has been deleted) are archived within 24 hours of the data being invalidated. [0228]) of the at least two separate datasets (source dataset; replicated dataset [0195]) from the one or more storage locations, (storage locations [0075]) the consolidating (consolidate [0259]) including applying a security measure (to be obfuscated [0214]) to the sensitive data, (personally identifiable information ('PII') contained in a dataset [0214]) the security measure (to be obfuscated [0214]) including data masking (Modifying 518 data to be sent to a host device in response to a request to access the dataset may be carried out, for example, by the edge management service 382 masking or removing PII to adhere to a requirement that PII not be shared with accessors of the dataset, by encrypting/decrypting data to enforce a requirement that the dataset be accessed and stored using end-to-end encryption, and so on [0299])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Prendki to incorporate the teachings of Darji of an indication that one dataset of the at least two separate datasets likely includes sensitive data requiring security measures to protect the sensitive data; and one or more control inputs the selection of which initiates consolidation of a dataset of the at least two separate datasets; receive, via the computing device, a user input selecting a control input of the one or more control inputs; and transmit to the one or more storage locations of the backend system, a control signal to consolidate the dataset of the at least two separate datasets by deleting the dataset of the at least two separate datasets from the one or more storage locations, the consolidating including applying a security measure to the sensitive data, the security measure including data masking. By doing so, personally identifiable information ('PII') contained in a dataset can be obfuscated when the dataset is accessed. Darji [0214].
Bui teaches to likely be redundant at least in part (an exact duplicate [0014]) by determining (determining [0025]) that a particular category (semantic content category [0039]) of at least one structured column (each field [0044]) of a plurality of structured columns (in one or more fields [0041]) of two or more datasets (new dataset, existing dataset [0013]; a first dataset, Dataset A, a second dataset, Dataset B [0014]) being of a same (matches [0053]) category (semantic content category [0039]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Prendki to incorporate the teachings of Bui wherein the datasets are determined to likely be redundant at least in part by determining that a particular category of at least one structured column of a plurality of structured columns of two or more datasets being of a same category. Doing so, in turn, saves both time for the users (e.g., data scientists) and storage space that would otherwise be wasted storing redundant or overlapping datasets, thereby improving the functioning of the system as a whole. Bui [0015].
Regarding claim 7 Prendki in view of Achin, Bui and Darji teaches the computing system of claim 6,
Prendki as modified further teaches wherein the two or more datasets (multiple different types of datasets [0093]) are determined to likely be redundant (to make the learning process much faster by injecting the most valuable data first, in order to faster reach the point where the information contained in the remaining of the data is redundant with the rest, or useless (or even harmful to the model) [0091]) based on a prediction performed (prediction of each sample [0087]) by a predictive model (predictive model [0096]).
Regarding claim 8 Prendki in view of Achin, Bui and Darji teaches the computing system of claim 6,
Prendki as modified further teaches wherein the UI dashboard (monitor [0131]) such as “UI dashboard” further depicts one or more prompts indicating that the dataset (dataset [0089]) of the at least two separate datasets (multiple different types of datasets [0093]) is likely a subset (Filter refers to a classifier (in most cases, binary) that separates a first subset of data having high information value from a second subset of data having less or no information value. [0021]) of another dataset of the at least two separate datasets. (multiple different types of datasets [0093])
Regarding claim 11 Prendki in view of Achin, Bui and Darji teaches the computing system of claim 6,
Prendki as modified further teaches wherein the security measure comprises data tokenization (FIG. 2 illustrates another view of the flow of the proposed procedures described herein. Note that the trimming step, which consists of hashing the data in order to provide more security [0059])
Regarding claim 12 Prendki in view of Achin, Bui and Darji teaches the computing system of claim 6,
Prendki as modified does not fully disclose wherein the sensitive data comprises personally identifiable customer information.
Darji teaches wherein the sensitive data (sensitive data [0119]) comprises personally identifiable customer information (personally identifiable information ('PII') contained in a dataset [0214]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Prendki to incorporate the teachings of Darji wherein the sensitive data comprises personally identifiable customer information. By doing so, security concerns may be addressed. Darji [0119].
Regarding claim 25 Prendki in view of Achin, Bui and Darji teaches the computing system of claim 6,
Prendki does not fully disclose wherein the one or more electronic notifications further include a second indication on which particular dataset of the at least two datasets has been modified most recently.
Bui teaches wherein the one or more electronic notifications (notifications (e.g., via email, SMS messaging, etc.) [0027]) further include a second indication (the data monitoring dashboard module 204 is operable to send notifications (e.g., via email, SMS messaging, etc.) to various users associated with the data monitoring system 102 or the new dataset 110, according to some embodiments. [0027]) on which particular dataset (new dataset 110, [0027]) of the at least two datasets (various datasets [0026]) has been modified most recently (periodically updated. [0029])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Prendki to incorporate the teachings of Bui wherein the one or more electronic notifications further include a second indication on which particular dataset of the at least two datasets has been modified most recently. Doing so accounts for the fact that, as the records in a dataset are updated, the value distributions (e.g., the numerical distribution, etc.) of the data in that dataset may also change such that the previously generated encode values 112 for that dataset no longer accurately correspond to the data in that dataset. Bui [0029].
Claims 13 - 17 are rejected under 35 U.S.C. 103 as being unpatentable over Jennifer Laeitia Prendki (United States Patent Publication Number 20220138561) hereinafter Prendki, in view of Achin et al. (United States Patent Publication Number 20180060738) hereinafter Achin, in view of Bui et al., (United States Patent Publication Number 20210365344) hereinafter Bui, in view of Darji et al., (United States Patent Publication Number 20240427645) hereinafter Darji and in further view of Moshyedi et al. (United States Publication Number 20240233010), hereinafter referred to as Moshyedi.
Regarding claim 13 Prendki in view of Achin, Bui and Darji teaches the computing system of claim 6,
Prendki as modified does not fully disclose wherein the UI dashboard further depicts a detailed control input for accessing data details about the at least two separate datasets, wherein selection of the detailed control input facilitates displaying data content of each of the at least two separate datasets.
Moshyedi teaches wherein the UI dashboard (input/output device [0076]) further depicts a detailed control input for accessing data details (a display for displaying digital images, [0076]) about the at least two separate datasets, (provide visualizations of datasets and data models on the user device 402. [0077]) wherein selection of the detailed control input facilitates displaying data content (a display for displaying digital images, [0076]) of each of the at least two separate datasets. (provide visualizations of datasets and data models on the user device 402. [0077])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Prendki to incorporate the teachings of Moshyedi wherein the UI dashboard further depicts a detailed control input for accessing data details about the at least two separate datasets, wherein selection of the detailed control input facilitates displaying data content of each of the at least two separate datasets. By doing so, the account determination system 320 may also be configured to display properties of data models and data model training results including, for example, architecture, loss functions, cross entropy, activation function values, embedding layer structure and/or outputs, convolution results, node outputs, or the like on the user device 402. Moshyedi [0077].
Regarding claim 14 Prendki in view of Achin, Bui and Darji teaches the computing system of claim 6,
Prendki as modified does not fully disclose, wherein the executable code, when executed, further causes the at least one processor to: display, via the graphical user interface, an authentication page for receiving authentication information of a user; receive, via the authentication page, the authentication information of the user; verify the authentication information of the user; and provide, based on the authentication information being verified, access to the UI dashboard.
Moshyedi teaches wherein the executable code, (program code [0113]) when executed, further causes the at least one processor to: (one or more processor [0114]) display, via the graphical user interface, an authentication page for receiving authentication information of a user; (web server 410 may host a financial service provider website that a user device may access by providing an attempted login that are authenticated by the account determination system 320 [0081]) receive, via the authentication page, the authentication information of the user; (web server 410 may include software tools, similar to those described with respect to user device 402 above, that may allow web server 410 to obtain network identification data from user device 402. [0081]) verify the authentication information of the user; (authenticated by the account determination system 320 [0081]) and provide, based on the authentication information being verified, access to the UI dashboard. (Web server 410 may include a computer system configured to generate and provide one or more websites accessible to customers, as well as any other individuals involved in accessing account consolidation system 408's normal operations. [0081])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Prendki to incorporate the teachings of Moshyedi wherein the executable code, when executed, further causes the at least one processor to: display, via the graphical user interface, an authentication page for receiving authentication information of a user; receive, via the authentication page, the authentication information of the user; verify the authentication information of the user; and provide, based on the authentication information being verified, access to the UI dashboard. By doing so, the account consolidation system 408 may include one or more servers and computer systems for performing one or more functions associated with products and/or services that the organization provides. Moshyedi [0080].
Regarding claim 15 Prendki in view of Achin, Bui and Darji teaches the computing system of claim 6,
Prendki as modified further teaches wherein depicting the detailed control input is restricted to user accounts of credentialed users that are permitted to access sensitive data. (Fig. 8 Data and Model Privacy – your model and your data are never accessed by the Alecito system and remain fully private [0015])
Prendki as modified does not fully disclose wherein the UI dashboard further depicts a detailed control input for accessing data details about the at least two separate datasets.
Moshyedi teaches wherein the UI dashboard further depicts a detailed control input (user device to display, via a GUI, [0036]) for accessing data details about the at least two separate datasets, (a listing of the plurality of accounts and one or more selectable user input objects proximate each of the plurality of accounts, wherein the user device is associated with the user [0036])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Prendki to incorporate the teachings of Moshyedi wherein the UI dashboard further depicts a detailed control input for accessing data details about the at least two separate datasets. By doing so, the system may then normalize the terms and benefits associated with the second credit card in comparison to those associated with the first credit card, such that the second credit card may be consolidated into the first credit card. Moshyedi [0090].
Regarding claim 16 Prendki in view of Achin, Bui and Darji teaches the computing system of claim 6,
Prendki as modified does not fully disclose wherein the UI dashboard further depicts a scanning control input to initiate a review of the at least two separate datasets, wherein the at least two separate datasets are depicted based on a user selecting the scanning control input.
Moshyedi teaches wherein the UI dashboard further depicts a scanning control input (The account determination system 320 may include programs (scripts, functions, algorithms) to configure data for visualizations [0077]) to initiate a review of the at least two separate datasets, wherein the at least two separate datasets are depicted based on a user selecting the scanning control input. (and provide visualizations of datasets and data models on the user device 402. This may include programs to generate graphs and display graphs. The account determination system 320 may include programs to generate histograms, scatter plots, time series, or the like on the user device 402. [0077])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Prendki to incorporate the teachings of Moshyedi wherein the UI dashboard further depicts a scanning control input to initiate a review of the at least two separate datasets, wherein the at least two separate datasets are depicted based on a user selecting the scanning control input. By doing so, the account determination system 320 may also be configured to display properties of data models and data model training results. Moshyedi [0077].
Regarding claim 17 Prendki in view of Achin, Bui and Darji teaches the computing system of claim 6,
Prendki as modified does not fully disclose wherein the executable code, when executed, further causes the at least one processor to receive, via selection of a scanning control input, an indication to initiate comparing data of at least two separate datasets, and based on receiving the indication, transmit an initiation signal to the backend system to perform a comparison of the at least two separate datasets.
Moshyedi teaches wherein the executable code, when executed, (machine learning model [0093]) such as “executable code” further causes the at least one processor to receive, (one or more processors [0093]) via selection of a scanning control input, (user device [0093]) such as “scanning control input” an indication (a first response to the first notification; responsive to receiving the first response, [0093]) to initiate comparing data of at least two separate datasets, (responsive to receiving the first response, normalize, via a second MLM, one or more respective features of one or more second accounts of the plurality of accounts in comparison to the one or more respective features of the first account; [0093]) and based on receiving the indication, (a first response to the first notification; responsive to receiving the first response, [0093]) transmit an initiation signal (execute software programs that perform processes [0113]) such as “transmit an initiation signal” to the backend system (The database 416 may also serve as a back-up storage device and may contain data and information that is also stored on, for example, database 360, as discussed with reference to FIG. 3. [0084]) to perform a comparison of the at least two separate datasets. (normalize, via a second MLM, one or more respective features of one or more second accounts of the plurality of accounts in comparison to the one or more respective features of the first account; [0093])
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Prendki to incorporate the teachings of Moshyedi wherein the executable code, when executed, further causes the at least one processor to receive, via selection of a scanning control input, an indication to initiate comparing data of at least two separate datasets, and based on receiving the indication, transmit an initiation signal to the backend system to perform a comparison of the at least two separate datasets. By doing so, responsive to receiving the second response, the system can consolidate the one or more second accounts into the first account based on the second recommendation. Moshyedi [0093].
Conclusion
6. THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire
THREE MONTHS from the mailing date of this action. In the event a first reply is
filed within TWO MONTHS of the mailing date of this final action and the advisory action
is not mailed until after the end of the THREE-MONTH shortened statutory
period, then the shortened statutory period will expire on the date the advisory
action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be
calculated from the mailing date of the advisory action. In no event, however, will
the statutory period for reply expire later than SIX MONTHS from the date of this
final action.
Examiner interviews are available via telephone, in-person, and video
conferencing using a USPTO supplied web-based collaboration tool. To schedule an
interview, applicant is encouraged to use the USPTO Automated Interview Request
(AIR) at http://www.uspto.gov/interviewpractice.
7. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Kweku Halm whose telephone number is
(469) 295-9144. The examiner can normally be reached on 9:00AM - 5:30PM Mon - Thur. If
attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Sanjiv Shah, can be reached on (571) 272-4098. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.
Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for published
applications may be obtained from either Private PAIR or Public PAIR. Status information
for unpublished applications is available through Private PAIR only. For more
information about the PAIR system, see http://pair-direct.uspto.gov. Should you have
questions on access to the Private PAIR system, contact the Electronic Business Center
(EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer
Service Representative or access to the automated information system, call
800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/KWEKU WILLIAM HALM/Examiner, Art Unit 2166
/SANJIV SHAH/Supervisory Patent Examiner, Art Unit 2166