DETAILED ACTION
This action is in response to the claims filed 01/26/2023 for Application number 18/159,902. Claims 1-20 are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 01/26/2023, 07/18/2023, and 11/05/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 12-14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over David et al. ("US 20190370665 A1", hereinafter "David") in view of Cao et al. ("Towards Making Systems Forget with Machine Unlearning", hereinafter "Cao").
Regarding claim 1, David teaches An electronic device (¶0067), comprising:
circuitry (¶0048) configured to:
receive a data subset of a first dataset associated with a user (“Local endpoint device(s) 214 may include a randomization engine for generating fully or semi-random input data for probing the target model. Local endpoint device(s) 214 may include one or more input device(s) 222 for receiving input from a user (e.g., neural network parameters, such as, numbers, sizes, dimensions and configurations of neurons, synapses, and layers, accuracy or training thresholds, etc.).” [¶0046; randomly probing input data from an original dataset corresponds to a data subset of a first dataset]), wherein a first machine learning model is trained based on the first dataset associated with the user (“A local device (e.g., 214 of FIG. 2) cannot directly copy or retrain target neural network 100 using conventional methods because the target neural network 100 itself as well as the original training dataset (“first dataset”) used to train target neural network 100 are inaccessible to the local device.” [¶0039; target neural network corresponds to a first machine learning model.]);
train a second machine learning model based on the received data subset (“Local endpoint device(s) 214 may use the random probe dataset (received data subset) to train a new model to mimic the target model, in various embodiments, memory 220 may store the entire random probe dataset used to train the new model at once (corresponds to training a second machine learning model), or may incrementally store on-the-fly each single or set of multiple training samples used in the current iteration or epoch, after which the subset is deleted (e.g., by active deletion or replacing the least recently used sample by a new sample).” [¶0045]);
and
update the trained first machine learning model, [based on the application of the transformation function on the trained first machine learning model], wherein the update of the trained first machine learning model corresponds to an unlearning of at least one of the received data subset or a set of features associated with the second machine learning model (“Because the random probe data acts as a placeholder for the original training dataset, adding or deleting data therefrom will effect substantially the same change in the model as if the data were added or deleted to/from the original training dataset. Thus, the target model may be modified or improved (“updating”) without ever accessing the original training dataset.” [¶0030; deleting data corresponds to “unlearning”]).
However, David fails to explicitly teach: apply a transformation function on the trained first machine learning model based on the trained second machine learning model; and
updating… based on the application of the transformation function on the trained first machine learning model.
Cao teaches apply a transformation function on the trained first machine learning model based on the trained second machine learning model (“Fig. 1: Unlearning idea. Instead of making a model directly depend on each training data sample (left), we convert the learning algorithm into a summation form (right). Specifically, each summation is the sum of transformed data samples, where the transformation functions gi are efficiently computable.” [pg. 464, Fig. 1])
updating… based on the application of the transformation function on the trained first machine learning model (“To prepare for unlearning, we transform learning algorithms in a system to a form consisting of a small number of summations [33]. Each summation is the sum of some efficiently computable transformation of the training data samples. The learning algorithms depend only on the summations, not individual data. These summations are saved together with the trained model. (The rest of the system may still ask for individual data and there is no injected noise as there is in differential privacy.) Then, in the unlearning process, we subtract the data to forget from each summation, and then update the model.” [pg. 464, bottom right para – pg. 465, top left para])
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify David’s data unlearning method by implementing the transformation function of Cao’s machine unlearning method. One would have been motivated to make this modification as it would lead to efficient unlearning without retraining from scratch. [pg. 464, B. Machine Unlearning, ¶1, Cao]
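For illustration of the cited rationale, Cao's "summation form" idea can be sketched as follows. This is a hypothetical simplification for explanatory purposes only, not the applicant's claimed method or Cao's actual implementation: the model depends only on summations of transformed training samples, so forgetting a sample means subtracting its transformed contribution from each summation and recomputing the model, without retraining from scratch.

```python
# Illustrative sketch (hypothetical names) of summation-form unlearning:
# the model is derived solely from summations over transformed samples.

def g(sample):
    """Hypothetical per-sample transformation (here: identity on the feature vector)."""
    return sample

def train(samples):
    # Accumulate the summation of transformed samples.
    total = [0.0] * len(samples[0])
    for s in samples:
        total = [a + b for a, b in zip(total, g(s))]
    n = len(samples)
    # The "model" here is just the per-feature mean, computed from the summation.
    return {"sum": total, "n": n, "model": [v / n for v in total]}

def unlearn(state, sample):
    # Subtract the forgotten sample's transformed contribution from the
    # summation, then update the model from the summations alone.
    state["sum"] = [a - b for a, b in zip(state["sum"], g(sample))]
    state["n"] -= 1
    state["model"] = [v / state["n"] for v in state["sum"]]
    return state

data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
state = train(data)
state = unlearn(state, [5.0, 6.0])
# After unlearning, the model matches one trained on only the remaining data.
```

Under this sketch, the result of unlearning a sample is identical to retraining on the dataset with that sample removed, which is the efficiency rationale quoted above.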
Regarding claim 2, David/Cao teaches The electronic device according to claim 1, David further teaches wherein the circuitry is further configured to receive a first user input indicative of a time duration associated with the data subset, wherein the data subset is received based on the received first user input. (“Using different probe datasets may increase the diversity of training data, which typically increases the accuracy with which the new model mimics the target model in the same amount of training time or yields a similar accuracy in a faster training time.” [¶0026])
Regarding claim 3, David/Cao teaches The electronic device according to claim 2, Cao teaches wherein the trained first machine learning model corresponds to a recommendation model, and the updated first machine learning model is configured to output personalized recommendations, based on the received first user input. (“To support unlearning in LensKit, we converted its recommendation algorithm into the summation form.” [pg. 471, B. Analytical Results, ¶1; See further “First, for each data subset, we randomly chose a rating to forget, ran both unlearning and retraining, and compared the recommendation results for each user and the item-item similarity matrices computed” [pg. 472, C. Empirical Results, ¶1]])
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify David’s machine learning models by substituting the recommendation model/algorithm of Cao. One would have been motivated to use a recommendation model within the field of endeavor of machine unlearning in order to provide a user with useful recommendations while removing noise and bogus recommendations. [pg. 478-479, Conclusion and Future work, Cao]
Regarding claim 4, David/Cao teaches The electronic device according to claim 1, David teaches wherein the circuitry is further configured to:
compare an output of the trained first machine learning model with a threshold (“An above threshold or asymptotically levelling measure of error may trigger the training process to end.” [¶0057]);
determine the output as faulty based on the comparison of the output of the trained first machine learning model with the threshold (“outputting a result of the NN applied to the dataset, calculating errors between the expected (e.g., target) and actual outputs, and adjusting NN weights to minimize errors. Training may be repeated until the error is minimized or converges.” [¶0004]);
transmit a notification based on the determination that the output is faulty (“Local endpoint device(s) 214 may include one or more input device(s) 222 for receiving input from a user (e.g., neural network parameters, such as, numbers, sizes, dimensions and configurations of neurons, synapses, and layers, accuracy or training thresholds, etc.). Local endpoint device(s) 214 may include one or more output device(s) 216 (e.g., a monitor or screen) for displaying data to a user generated by device 214 or 202.” [¶0046]); and
receive a second user input based on the transmitted notification indicative of the faulty output, wherein the data subset is received based on the second user input. (“Local endpoint device(s) 214 may send remote server 202 requests to make model predictions for a set of one or more inputs. Remote server 202 may run those inputs through the target model to generate corresponding outputs, and send those outputs to the local endpoint device(s) 214.” [¶0043; user input is inherent given outputs are being sent to the local endpoint devices])
Regarding claim 12, it is substantially similar to claim 1 and is rejected in the same manner, with the same art and reasoning applying.
Regarding claims 13 and 14, they are substantially similar to claims 2 and 3, respectively, and are rejected in the same manner, with the same art and reasoning applying.
Claim 20 recites features similar to claim 1 and is rejected for at least the same reasons therein. Claim 20 additionally requires A non-transitory computer-readable medium having stored thereon, computer-executable instructions that when executed by an electronic device, causes the electronic device to execute operations, the operations comprising (David, ¶0060, “Embodiments of the invention may include an article such as a non-transitory computer or processor readable medium, or a computer or processor non-transitory storage medium”)
Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over David in view of Cao and further in view of Warnecke et al. ("Machine Unlearning of Features and Labels", hereinafter "Warnecke").
Regarding claim 5, David/Cao teaches The electronic device according to claim 1, Cao teaches extract a first set of labels associated with the first dataset;
extract a second set of labels associated with the received data subset;
(“The system extracts the values of the selected features from each training data sample into a feature vector. It feeds the feature vectors and the malicious or benign labels of all training data samples into some machine learning algorithm to construct a succinct model” [pg. 466, Model training; each data sample would include a first dataset and subset])
however fails to explicitly teach wherein the circuitry is further configured to:
remove the extracted second set of labels associated with the received data subset from the extracted first set of labels associated with the first dataset; and
determine a third set of labels based on the removal of the extracted second set of labels associated with the data subset from the extracted first set of labels associated with the first dataset.
Warnecke teaches
remove the extracted second set of labels associated with the received data subset from the extracted first set of labels associated with the first dataset (“As the second type of unlearning, we focus on correcting labels. This form of unlearning is necessary if the labels captured in a model contain unwanted information. For example, in generative language models, the training text is used as input features (preceding characters) and labels (target characters) [27, 48]. Hence, defects can only be eliminated if the labels are unlearned as well.” [pg. 7, Replacing labels]); and
determine a third set of labels based on the removal of the extracted second set of labels associated with the data subset from the extracted first set of labels associated with the first dataset (“The new labels Y (“third set of labels”) can be individually selected for each data point, as long as they come from the domain Y, that is, Y ⊂ Y. Note that the replaced labels and features can be easily combined in one set of perturbations Z˜, so that defects affecting both can be corrected in a single update. In Section 6.2, we demonstrate that this combination can be used to remove unintended memorization from generative language models with high efficiency.” [pg. 7, Replacing labels]).
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify David’s/Cao’s teachings by removing labels and determining additional new labels as part of the machine unlearning process as taught by Warnecke. One would have been motivated to make this modification in order to efficiently update the model while enabling the removal of features and labels. [pg. 2, ¶3, Warnecke]
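The label-handling steps at issue in claim 5 can be illustrated with the following minimal sketch. All names are hypothetical and the logic is a simplification for explanation only, not the claimed method or Warnecke's implementation: labels are extracted for the full dataset and for the subset to be forgotten, one occurrence of each forgotten label is removed, and the remainder constitutes the "third set" of labels.

```python
# Illustrative sketch (hypothetical names) of the label extraction/removal
# steps discussed above for claim 5.

def extract_labels(dataset):
    # Each sample is assumed to be a (features, label) pair.
    return [label for _, label in dataset]

first_dataset = [([0.1], "cat"), ([0.2], "dog"), ([0.3], "cat"), ([0.4], "bird")]
data_subset = [([0.2], "dog")]  # the data to be unlearned

first_labels = extract_labels(first_dataset)   # first set of labels
second_labels = extract_labels(data_subset)    # second set of labels

# Remove one occurrence per forgotten label (a multiset difference),
# leaving the third set of labels.
third_labels = list(first_labels)
for lbl in second_labels:
    third_labels.remove(lbl)
```

In this sketch `third_labels` is the first label set with the subset's labels removed, corresponding to the "third set of labels" recited in the claim.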
Regarding claim 15, it is substantially similar to claim 5 and is rejected in the same manner, with the same art and reasoning applying.
Claims 8, 9 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over David in view of Cao and further in view of Lu et al. ("Anomaly Detection Method for Substation Equipment Based on Feature Matching and Multi-Semantic Classification", hereinafter "Lu").
Regarding claim 8, David/Cao teaches The electronic device according to claim 1, however fails to explicitly teach wherein the circuitry is further configured to:
construct a stack layer associated with the transformation function, wherein the stack layer is configured to stack the trained first machine learning model and the trained second machine learning model to update the trained first machine learning model.
Lu teaches construct a stack layer associated with the transformation function, wherein the stack layer is configured to stack the trained first machine learning model and the trained second machine learning model to update the trained first machine learning model. (“In ResNet, the identity mapping can be constructed by superimposing a y = x layer on the basis of a shallow network. Then, the identity mapping is performed by jumping connections, which can skip one or more network layers, and the output is superimposed with the output of the stack layer. A non-linear transformation function H(x) is defined for the underlying mapping, and the stacked layers fit another mapping function F(x)” [pg. 110, A. Basic Feature Network, ¶1])
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify the teachings of David/Cao in order to implement the stack layer as taught by Lu. One would have been motivated to make this modification in order to solve the problem of network gradient vanishing or gradient explosion with deepening of depth. [pg. 110, A. Basic Feature Network, ¶1, Lu]
Regarding claim 9, David/Cao/Lu teaches The electronic device according to claim 8, wherein the transformation function includes the constructed stack layer and a set of deep neural network (DNN) layers. (“Then, the identity mapping is performed by jumping connections, which can skip one or more network layers, and the output is superimposed with the output of the stack layer. A non-linear transformation function H(x) is defined for the underlying mapping, and the stacked layers fit another mapping function F(x)” [pg. 110, A. Basic Feature Network, ¶1])
Regarding claim 18, it is substantially similar to claim 8 and is rejected in the same manner, with the same art and reasoning applying.
Claims 10, 11, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over David in view of Cao and Lu and further in view of Xu et al. ("Understanding and Improving Layer Normalization", hereinafter "Xu").
Regarding claim 10, David/Cao/Lu teaches The electronic device according to claim 8, however fails to explicitly teach wherein the transformation function corresponds to a dot product of a first output of the trained first machine learning model with a second output of the trained second machine learning model.
Xu teaches wherein the transformation function corresponds to a dot product of a first output of the trained first machine learning model with a second output of the trained second machine learning model. (“AdaNorm adopts a new transformation function which can adaptively control scaling weights towards different inputs… where z = (z1, z2, . . . , zH) is the output of AdaNorm and ʘ is a dot product operation.” [pg. 7, 4. AdaNorm])
It would have been obvious to one of ordinary skill in the art before the effective filing date to modify David’s/Cao’s/Lu’s teachings in order to use a dot product as a transformation function as taught by Xu. One would have been motivated to make this modification in order to use an adaptive transformation function that can adaptively scale weights. [6. Conclusion, pg. 9, Xu]
Regarding claim 11, David/Cao/Lu/Xu teaches The electronic device according to claim 10, Xu further teaches wherein the transformation function further corresponds to a normalization of the dot product based on the first output. (“To address the over-fitting problem, we propose a new normalization method, Adaptive Normalization (AdaNorm), by replacing the bias and gain with a new transformation function” [Abstract, See 4. AdaNorm for “dot product”])
Same motivation to combine the teachings of David/Cao/Lu/Xu as in claim 10.
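For illustration of the claim 10/11 mappings, the transformation can be sketched as an elementwise ("dot") product of two model outputs followed by a normalization based on the first output. This is a hypothetical simplification for explanatory purposes, not the claimed method and not AdaNorm's formula verbatim.

```python
# Illustrative sketch (hypothetical simplification) of a transformation
# combining two model outputs via an elementwise product, then normalizing
# the result by the first output's L2 norm.
import math

def transform(first_output, second_output):
    # Elementwise product of the two models' output vectors.
    product = [a * b for a, b in zip(first_output, second_output)]
    # Normalize the product by the L2 norm of the first output.
    norm = math.sqrt(sum(a * a for a in first_output))
    return [p / norm for p in product]

out = transform([3.0, 4.0], [1.0, 0.5])
# The L2 norm of [3.0, 4.0] is 5.0, so out = [0.6, 0.4].
```

The product step corresponds to the "dot product" limitation of claim 10, and the division by the first output's norm corresponds to the "normalization of the dot product based on the first output" limitation of claim 11.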
Regarding claim 19, it is substantially similar to claim 10 and is rejected in the same manner, with the same art and reasoning applying.
Allowable Subject Matter
Claims 6, 7, 16, and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. None of the prior art, either alone or in combination, fairly discloses the limitations of claims 6, 7, 16, and 17, in particular:
Claim 6:
determine whether each label of the determined third set of labels corresponds to a categorical label;
determine a count of the determined third set of labels based on the determination that each of the determined third set of labels corresponds to the categorical label; and
determine a fourth label based on the determined count of the determined third set of labels, wherein the fourth label corresponds to a maximum count in the determined third set of labels, and
the second machine learning model is further trained based on the determined fourth label.
Claim 7:
determine whether each label of the determined third set of labels corresponds to a numerical label;
determine a mean of the determined third set of labels based on the determination that each of the determined third set of labels corresponds to the numerical label; and
determine a fifth label based on the determined mean of the determined third set of labels, wherein the second machine learning model is further trained based on the determined fifth label.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Golatkar et al. (“Eternal Sunshine of the Spotless Net: Selective Forgetting in Deep Networks”) discloses the concept of selectively forgetting a particular subset of data used for training a deep neural network. (Abstract)
Ullah et al. (“US 20230118785 A1”) discloses a method for machine unlearning and retraining (Abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL H HOANG/Examiner, Art Unit 2122