DETAILED ACTION
This action is in response to the claims filed 01/26/2023 for Application number 18/159,902. Claims 1-20 are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 01/26/2023, 07/18/2023, and 11/05/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 12-14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over David et al. ("US 20190370665 A1", hereinafter "David") in view of Cao et al. ("Towards Making Systems Forget with Machine Unlearning", hereinafter "Cao").
Regarding claim 1, David teaches An electronic device (¶0067), comprising:
circuitry (¶0048) configured to:
receive a data subset of a first dataset associated with a user (“Local endpoint device(s) 214 may include a randomization engine for generating fully or semi-random input data for probing the target model. Local endpoint device(s) 214 may include one or more input device(s) 222 for receiving input from a user (e.g., neural network parameters, such as, numbers, sizes, dimensions and configurations of neurons, synapses, and layers, accuracy or training thresholds, etc.).” [¶0046; randomly probing input data from an original dataset corresponds to a data subset of a first dataset]), wherein a first machine learning model is trained based on the first dataset associated with the user (“A local device (e.g., 214 of FIG. 2) cannot directly copy or retrain target neural network 100 using conventional methods because the target neural network 100 itself as well as the original training dataset (“first dataset”) used to train target neural network 100 are inaccessible to the local device.” [¶0039; target neural network corresponds to a first machine learning model.]);
train a second machine learning model based on the received data subset (“Local endpoint device(s) 214 may use the random probe dataset (received data subset) to train a new model to mimic the target model, in various embodiments, memory 220 may store the entire random probe dataset used to train the new model at once (corresponds to training a second machine learning model), or may incrementally store on-the-fly each single or set of multiple training samples used in the current iteration or epoch, after which the subset is deleted (e.g., by active deletion or replacing the least recently used sample by a new sample).” [¶0045]);
and
update the trained first machine learning model, [based on the application of the transformation function on the trained first machine learning model], wherein the update of the trained first machine learning model corresponds to an unlearning of at least one of the received data subset or a set of features associated with the second machine learning model (“Because the random probe data acts as a placeholder for the original training dataset, adding or deleting data therefrom will effect substantially the same change in the model as if the data were added or deleted to/from the original training dataset. Thus, the target model may be modified or improved (“updating”) without ever accessing the original training dataset.” [¶0030; deleting data corresponds to “unlearning”]).
However, David fails to explicitly teach: apply a transformation function on the trained first machine learning model based on the trained second machine learning model; and
updating… based on the application of the transformation function on the trained first machine learning model.
Cao teaches apply a transformation function on the trained first machine learning model based on the trained second machine learning model (“Fig. 1: Unlearning idea. Instead of making a model directly depend on each training data sample (left), we convert the learning algorithm into a summation form (right). Specifically, each summation is the sum of transformed data samples, where the transformation functions gi are efficiently computable.” [pg. 464, Fig. 1])
updating… based on the application of the transformation function on the trained first machine learning model (“To prepare for unlearning, we transform learning algorithms in a system to a form consisting of a small number of summations [33]. Each summation is the sum of some efficiently computable transformation of the training data samples. The learning algorithms depend only on the summations, not individual data. These summations are saved together with the trained model. (The rest of the system may still ask for individual data and there is no injected noise as there is in differential privacy.) Then, in the unlearning process, we subtract the data to forget from each summation, and then update the model.” [pg. 464, bottom right para – pg. 465, top left para])
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify David’s data unlearning method by implementing the transformation function of Cao’s machine unlearning method. One would have been motivated to make this modification as it would lead to efficient unlearning without retraining from scratch. [pg. 464, B. Machine Unlearning, ¶1, Cao]
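For illustration of the cited rationale, Cao's "summation form" idea can be sketched as follows. This is a hypothetical simplification for explanatory purposes only, not the applicant's claimed method or Cao's actual implementation: the model depends only on summations of transformed training samples, so forgetting a sample means subtracting its transformed contribution from each summation and recomputing the model, without retraining from scratch.

```python
# Illustrative sketch (hypothetical names) of summation-form unlearning:
# the model is derived solely from summations over transformed samples.

def g(sample):
    """Hypothetical per-sample transformation (here: identity on the feature vector)."""
    return sample

def train(samples):
    # Accumulate the summation of transformed samples.
    total = [0.0] * len(samples[0])
    for s in samples:
        total = [a + b for a, b in zip(total, g(s))]
    n = len(samples)
    # The "model" here is just the per-feature mean, computed from the summation.
    return {"sum": total, "n": n, "model": [v / n for v in total]}

def unlearn(state, sample):
    # Subtract the forgotten sample's transformed contribution from the
    # summation, then update the model from the summations alone.
    state["sum"] = [a - b for a, b in zip(state["sum"], g(sample))]
    state["n"] -= 1
    state["model"] = [v / state["n"] for v in state["sum"]]
    return state

data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
state = train(data)
state = unlearn(state, [5.0, 6.0])
# After unlearning, the model matches one trained on only the remaining data.
```

Under this sketch, the result of unlearning a sample is identical to retraining on the dataset with that sample removed, which is the efficiency rationale quoted above.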
Regarding claim 2, David/Cao teaches The electronic device according to claim 1, David further teaches wherein the circuitry is further configured to receive a first user input indicative of a time duration associated with the data subset, wherein the data subset is received based on the received first user input. (“Using different probe datasets may increase the diversity of training data, which typically increases the accuracy with which the new model mimics the target model in the same amount of training time or yields a similar accuracy in a faster training time.” [¶0026])
Regarding claim 3, David/Cao teaches The electronic device according to claim 2, Cao teaches wherein the trained first machine learning model corresponds to a recommendation model, and the updated first machine learning model is configured to output personalized recommendations, based on the received first user input. (“To support unlearning in LensKit, we converted its recommendation algorithm into the summation form.” [pg. 471, B. Analytical Results, ¶1; See further “First, for each data subset, we randomly chose a rating to forget, ran both unlearning and retraining, and compared the recommendation results for each user and the item-item similarity matrices computed” [pg. 472, C. Empirical Results, ¶1]])
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify David’s machine learning models by substituting the recommendation model/algorithm of Cao. One would have been motivated to use a recommendation model within the field of endeavor of machine unlearning in order to provide a user with useful recommendations while removing noise and bogus recommendations. [pg. 478-479, Conclusion and Future work, Cao]
Regarding claim 4, David/Cao teaches The electronic device according to claim 1, David teaches wherein the circuitry is further configured to:
compare an output of the trained first machine learning model with a threshold (“An above threshold or asymptotically levelling measure of error may trigger the training process to end.” [¶0057]);
determine the output as faulty based on the comparison of the output of the trained first machine learning model with the threshold (“outputting a result of the NN applied to the dataset, calculating errors between the expected (e.g., target) and actual outputs, and adjusting NN weights to minimize errors. Training may be repeated until the error is minimized or converges.” [¶0004]);
transmit a notification based on the determination that the output is faulty (“Local endpoint device(s) 214 may include one or more input device(s) 222 for receiving input from a user (e.g., neural network parameters, such as, numbers, sizes, dimensions and configurations of neurons, synapses, and layers, accuracy or training thresholds, etc.). Local endpoint device(s) 214 may include one or more output device(s) 216 (e.g., a monitor or screen) for displaying data to a user generated by device 214 or 202.” [¶0046]); and
receive a second user input based on the transmitted notification indicative of the faulty output, wherein the data subset is received based on the second user input. (“Local endpoint device(s) 214 may send remote server 202 requests to make model predictions for a set of one or more inputs. Remote server 202 may run those inputs through the target model to generate corresponding outputs, and send those outputs to the local endpoint device(s) 214.” [¶0043; user input is inherent given outputs are being sent to the local endpoint devices])
Regarding claim 12, it is substantially similar to claim 1 and is rejected in the same manner, with the same art and reasoning applying.
Regarding claims 13 and 14, they are substantially similar to claims 2 and 3, respectively, and are rejected in the same manner, with the same art and reasoning applying.
Claim 20 recites features similar to claim 1 and is rejected for at least the same reasons therein. Claim 20 additionally requires A non-transitory computer-readable medium having stored thereon, computer-executable instructions that when executed by an electronic device, causes the electronic device to execute operations, the operations comprising (David, ¶0060, “Embodiments of the invention may include an article such as a non-transitory computer or processor readable medium, or a computer or processor non-transitory storage medium”)
Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over David in view of Cao and further in view of Warnecke et al. ("Machine Unlearning of Features and Labels", hereinafter "Warnecke").
Regarding claim 5, David/Cao teaches The electronic device according to claim 1, Cao teaches extract a first set of labels associated with the first dataset;
extract a second set of labels associated with the received data subset;
(“The system extracts the values of the selected features from each training data sample into a feature vector. It feeds the feature vectors and the malicious or benign labels of all training data samples into some machine learning algorithm to construct a succinct model” [pg. 466, Model training; each data sample would include a first dataset and subset])
however fails to explicitly teach wherein the circuitry is further configured to:
remove the extracted second set of labels associated with the received data subset from the extracted first set of labels associated with the first dataset; and
determine a third set of labels based on the removal of the extracted second set of labels associated with the data subset from the extracted first set of labels associated with the first dataset.
Warnecke teaches
remove the extracted second set of labels associated with the received data subset from the extracted first set of labels associated with the first dataset (“As the second type of unlearning, we focus on correcting labels. This form of unlearning is necessary if the labels captured in a model contain unwanted information. For example, in generative language models, the training text is used as input features (preceding characters) and labels (target characters) [27, 48]. Hence, defects can only be eliminated if the labels are unlearned as well.” [pg. 7, Replacing labels]); and
determine a third set of labels based on the removal of the extracted second set of labels associated with the data subset from the extracted first set of labels associated with the first dataset (“The new labels Y (“third set of labels”) can be individually selected for each data point, as long as they come from the domain Y, that is, Y ⊂ Y. Note that the replaced labels and features can be easily combined in one set of perturbations Z˜, so that defects affecting both can be corrected in a single update. In Section 6.2, we demonstrate that this combination can be used to remove unintended memorization from generative language models with high efficiency.” [pg. 7, Replacing labels]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify David’s/Cao’s teachings by removing labels and determining additional new labels as part of the machine unlearning process as taught by Warnecke. One would have been motivated to make this modification in order to efficiently update the model while enabling the removal of features and labels. [pg. 2, ¶3, Warnecke]
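The label-handling steps at issue in claim 5 can be illustrated with the following minimal sketch. All names are hypothetical and the logic is a simplification for explanation only, not the claimed method or Warnecke's implementation: labels are extracted for the full dataset and for the subset to be forgotten, one occurrence of each forgotten label is removed, and the remainder constitutes the "third set" of labels.

```python
# Illustrative sketch (hypothetical names) of the label extraction/removal
# steps discussed above for claim 5.

def extract_labels(dataset):
    # Each sample is assumed to be a (features, label) pair.
    return [label for _, label in dataset]

first_dataset = [([0.1], "cat"), ([0.2], "dog"), ([0.3], "cat"), ([0.4], "bird")]
data_subset = [([0.2], "dog")]  # the data to be unlearned

first_labels = extract_labels(first_dataset)   # first set of labels
second_labels = extract_labels(data_subset)    # second set of labels

# Remove one occurrence per forgotten label (a multiset difference),
# leaving the third set of labels.
third_labels = list(first_labels)
for lbl in second_labels:
    third_labels.remove(lbl)
```

In this sketch `third_labels` is the first label set with the subset's labels removed, corresponding to the "third set of labels" recited in the claim.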
Regarding claim 15, it is substantially similar to claim 5 and is rejected in the same manner, with the same art and reasoning applying.
Claims 8, 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over David in view of Cao and further in view of Lu et al. ("Anomaly Detection Method for Substation Equipment Based on Feature Matching and Multi-Semantic Classification", hereinafter "Lu").
Regarding claim 8, David/Cao teaches The electronic device according to claim 1, however fails to explicitly teach wherein the circuitry is further configured to:
construct a stack layer associated with the transformation function, wherein the stack layer is configured to stack the trained first machine learning model and the trained second machine learning model to update the trained first machine learning model.
Lu teaches construct a stack layer associated with the transformation function, wherein the stack layer is configured to stack the trained first machine learning model and the trained second machine learning model to update the trained first machine learning model. (“In ResNet, the identity mapping can be constructed by superimposing a y = x layer on the basis of a shallow network. Then, the identity mapping is performed by jumping connections, which can skip one or more network layers, and the output is superimposed with the output of the stack layer. A non-linear transformation function H(x) is defined for the underlying mapping, and the stacked layers fit another mapping function F(x)” [pg. 110, A. Basic Feature Network, ¶1])
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of David/Cao in order to implement the stack layer as taught by Lu. One would have been motivated to make this modification in order to solve the problem of network gradient vanishing or gradient explosion with deepening of depth. [pg. 110, A. Basic Feature Network, ¶1, Lu]
Regarding claim 9, David/Cao/Lu teaches The electronic device according to claim 8, wherein the transformation function includes the constructed stack layer and a set of deep neural network (DNN) layers. (“Then, the identity mapping is performed by jumping connections, which can skip one or more network layers, and the output is superimposed with the output of the stack layer. A non-linear transformation function H(x) is defined for the underlying mapping, and the stacked layers fit another mapping function F(x)” [pg. 110, A. Basic Feature Network, ¶1])
Regarding claim 18, it is substantially similar to claim 8 and is rejected in the same manner, with the same art and reasoning applying.
Claims 10, 11, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over David in view of Cao and Lu and further in view of Xu et al. ("Understanding and Improving Layer Normalization", hereinafter "Xu").
Regarding claim 10, David/Cao/Lu teaches The electronic device according to claim 8, however fails to explicitly teach wherein the transformation function corresponds to a dot product of a first output of the trained first machine learning model with a second output of the trained second machine learning model.
Xu teaches wherein the transformation function corresponds to a dot product of a first output of the trained first machine learning model with a second output of the trained second machine learning model. (“AdaNorm adopts a new transformation function which can adaptively control scaling weights towards different inputs… where z = (z1, z2, . . . , zH) is the output of AdaNorm and ʘ is a dot product operation.” [pg. 7, 4. AdaNorm])
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify David’s/Cao’s/Lu’s teachings in order to use a dot product as a transformation function as taught by Xu. One would have been motivated to make this modification in order to use an adaptive transformation function that can adaptively scale weights. [6. Conclusion, pg. 9, Xu]
Regarding claim 11, David/Cao/Lu/Xu teaches The electronic device according to claim 10, Xu further teaches wherein the transformation function further corresponds to a normalization of the dot product based on the first output. (“To address the over-fitting problem, we propose a new normalization method, Adaptive Normalization (AdaNorm), by replacing the bias and gain with a new transformation function” [Abstract, See 4. AdaNorm for “dot product”])
Same motivation to combine the teachings of David/Cao/Lu/Xu as in claim 10.
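For illustration of the claim 10/11 mappings, the transformation can be sketched as an elementwise ("dot") product of two model outputs followed by a normalization based on the first output. This is a hypothetical simplification for explanatory purposes, not the claimed method and not AdaNorm's formula verbatim.

```python
# Illustrative sketch (hypothetical simplification) of a transformation
# combining two model outputs via an elementwise product, then normalizing
# the result by the first output's L2 norm.
import math

def transform(first_output, second_output):
    # Elementwise product of the two models' output vectors.
    product = [a * b for a, b in zip(first_output, second_output)]
    # Normalize the product by the L2 norm of the first output.
    norm = math.sqrt(sum(a * a for a in first_output))
    return [p / norm for p in product]

out = transform([3.0, 4.0], [1.0, 0.5])
# The L2 norm of [3.0, 4.0] is 5.0, so out = [0.6, 0.4].
```

The product step corresponds to the "dot product" limitation of claim 10, and the division by the first output's norm corresponds to the "normalization of the dot product based on the first output" limitation of claim 11.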
Regarding claim 19, it is substantially similar to claim 10 and is rejected in the same manner, with the same art and reasoning applying.
Allowable Subject Matter
Claims 6, 7, 16, and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. None of the prior art, either alone or in combination, fairly discloses the limitations of claims 6, 7, 16, and 17, in particular:
Claim 6:
determine whether each label of the determined third set of labels corresponds to a categorical label;
determine a count of the determined third set of labels based on the determination that each of the determined third set of labels corresponds to the categorical label; and
determine a fourth label based on the determined count of the determined third set of labels, wherein the fourth label corresponds to a maximum count in the determined third set of labels, and
the second machine learning model is further trained based on the determined fourth label.
Claim 7:
determine whether each label of the determined third set of labels corresponds to a numerical label;
determine a mean of the determined third set of labels based on the determination that each of the determined third set of labels corresponds to the numerical label; and
determine a fifth label based on the determined mean of the determined third set of labels, wherein the second machine learning model is further trained based on the determined fifth label.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Golatkar et al. (“Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks”) discloses the concept of selectively forgetting a particular subset of data used for training a deep neural network. (Abstract)
Ullah et al. (“US 20230118785 A1”) discloses a method for machine unlearning and retraining (Abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL H HOANG/Examiner, Art Unit 2122