Prosecution Insights
Last updated: May 29, 2026
Application No. 18/089,513

SOURCE-FREE ACTIVE ADAPTATION TO DISTRIBUTIONAL SHIFTS FOR MACHINE LEARNING

Final Rejection §103
Filed: Dec 27, 2022
Examiner: HWANG, MEGAN ELIZABETH
Art Unit: 2143
Tech Center: 2100 - Computer Architecture & Software
Assignee: Intel Corporation
OA Round: 2 (Final)
Grant Probability: 48% (Moderate)
OA Rounds: 3-4
Est. Remaining: 4m
With Interview: 99%

Examiner Intelligence

Career Allowance Rate: 48% (11 granted / 23 resolved), -7.2% vs TC avg
Interview Lift: +60.3% (resolved cases with interview vs. without)
Avg Prosecution: 3y 9m (typical timeline), 9 currently pending
Total Applications: 44 across all art units (career history)

Statute-Specific Performance

§101: 10.6% (-29.4% vs TC avg)
§103: 79.8% (+39.8% vs TC avg)
§102: 8.7% (-31.3% vs TC avg)
§112: 1.0% (-39.0% vs TC avg)
Based on career data from 23 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims 1-25 are pending. Claims 26-33 have been canceled. This Office Action is responsive to the amendment filed on 12/19/2025, which has been entered into the above identified application.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 8-9, 12-16, 19-20, and 23-25 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar et al. (“IITK at SemEval-2021 Task 10: Source-Free Unsupervised Domain Adaptation using Class Prototypes”, published August 5, 2021), hereinafter Kumar; in view of Peterson et al. (US 20210073677 A1, filed 03/25/2020), hereinafter Peterson; in further view of Wang et al. (“TENT: Fully Test-Time Adaptation by Entropy Minimization”, published 03/18/2021), hereinafter DWang.
Regarding Claim 1, Kumar teaches a system comprising: steps to train a neural network based on baseline data samples (Kumar: “In SFDA, instead of sharing the source domain data, only a model that has been trained on the source domain data is shared.” [Section 1. Introduction]);

compare new data samples received after the training to a threshold uncertainty value, wherein the threshold uncertainty value is associated with a distributional shift between the baseline data samples and the new data samples (Kumar: “For solving this subtask our strategy is to make use of high-confidence prototypes from the target domain to reinforce the target specific features of the source model. We propose a simple augmentation technique that makes use of these high-confidence prototypes to generate labelled artificial datapoints.” [Section 1. Introduction]; “One way to approach this would be through a concept from self-learning, i.e., by finding the most reliable samples from the target data over which the model S is sufficiently confident and using these predictions as the corresponding ground truth. In order to find reliable target samples, self-entropy H can be used to quantify the prediction uncertainty: [Equation 1].” [Section 3. System Overview]; “one crucial hyperparameter to consider is the self-entropy threshold below which the residing target samples will be identified as prototypes. The key issue faced while determining the value of this hyperparameter is the disparities and highly imbalanced data distribution between the classes.” [Section 3. System Overview]);

select a subgroup of the data samples that satisfy the threshold uncertainty value (Kumar: “For now, the self-entropy threshold value is defined by the 50th percentile (median) self-entropy value of the minority class i.e. negated class.” [Section 3. System Overview]; “The samples with smaller self-entropy indicate that the classifier is more confident over them, and these are referred to as Prototypes.” [Section 3. System Overview]);

and determine updated parameters for the neural network based on the subgroup of data samples (Kumar: “We propose a simple augmentation technique that makes use of these high-confidence prototypes to generate labelled artificial datapoints. These augmented samples are then used to perform supervised fine-tuning of our source model.” [Section 1. Introduction]; “Now both the prototypes and their augmented samples with their respective labels are used with cross entropy loss to update the weights of the feature extractor module F of the pre-trained network S, with classifier module C of the pre-trained network being frozen.” [Section 3. System Overview]).

However, Kumar fails to expressly disclose A system comprising: interface circuitry; programmable circuitry; and instructions to cause the programmable circuitry to: determine updated batch-normalization parameters for a neural network; and cause transmission of the updated batch-normalization parameters to a remote system to update a copy of the neural network to the remote system.

In the same field of endeavor, Peterson teaches A system comprising: interface circuitry (Peterson: “Computer system 500 also includes a communication interface 518 coupled to bus 502. Communication interface 518 provides a two-way data communication coupling to a network link 520 that is connected to a local network 522.” [0111]); programmable circuitry (Peterson: “The special-purpose computing devices may be hard-wired to perform the techniques, or may include digital electronic devices such as one or more application-specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) that are persistently programmed to perform the techniques, or may include one or more general purpose hardware processors programmed to perform the techniques pursuant to program instructions in firmware, memory, other storage, or a combination.” [0102]); and instructions to cause the programmable circuitry to: cause transmission of updated parameters to a remote system to update a copy of a neural network to the remote system (Peterson: “A central server periodically aggregates the gradients it receives from many parties, and applies them to the general model. The updated general model's aggregated gradients are then pushed (i.e. published) back to all the parties, which subsequently replace their locally retrained models with the retrained general model.” [0024]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated interface circuitry; programmable circuitry; and instructions to cause the programmable circuitry to: cause transmission of updated parameters to a remote system to update a copy of the neural network to the remote system, as taught by Peterson to the system of Kumar because both of these systems are directed towards domain adaptation of a machine learning model. In making this combination, it would provide the system of Kumar the hardware on which to operate, as well as allow for federated learning, enabling “multiple parties to jointly retrain a shared model, such that all parties contribute to a benefit from the large scale distributed retraining facilitated by FL” (Peterson: [0004]), while also preserving the privacy of all independent parties (Peterson: [0006]-[0008]).

Kumar and Peterson still fail to expressly disclose determining updated batch-normalization parameters for a neural network.

In the same field of endeavor, DWang teaches determining updated batch-normalization parameters for a neural network (DWang: “We propose to adapt by test entropy minimization (tent): we optimize the model for confidence as measured by the entropy of its predictions. Our method estimates normalization statistics and optimizes channel-wise affine transformations to update online on each batch.” [Abstract]; “Our networks are equipped with batch normalization. For the source model without adaptation, the normalization statistics are estimated during training on the source data. For all test-time adaptation methods, we estimate these statistics during testing on the target data, as done in concurrent work on adaptation by normalization.” [Section 4. Experiments]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated determining updated batch-normalization parameters for a neural network, as taught by DWang to the system of Kumar and Peterson because both of these systems are directed towards entropy minimization in source-free domain adaptation. In making this combination and using the training data subset for specifically updating the batch-normalization parameters, it would allow the system of Kumar and Peterson to reduce both entropy and error by adopting “low-dimensional, channel-wise feature modulation” (DWang: [Section 1. Introduction]).
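The thresholding step that the examiner maps from Kumar onto Claim 1 (score each new sample by self-entropy, derive a percentile-based threshold, keep the low-entropy samples) can be illustrated with a minimal sketch. It is not Kumar's implementation: the function names, the example predictions, and the choice to take the percentile over all samples (rather than over the minority class, as the quoted passage describes) are assumptions made for illustration.

```python
import math

def self_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def select_prototypes(pred_probs, q=50.0):
    """Return indices of samples whose entropy is at or below the q-th percentile.

    Assumption: the percentile is taken over every sample in the batch;
    the quoted Kumar passage uses the median of the minority class instead.
    """
    entropies = [self_entropy(p) for p in pred_probs]
    threshold = percentile(entropies, q)
    selected = [i for i, h in enumerate(entropies) if h <= threshold]
    return selected, threshold, entropies

# Hypothetical softmax outputs for four new (target-domain) samples.
preds = [[0.98, 0.02], [0.55, 0.45], [0.90, 0.10], [0.60, 0.40]]
selected, threshold, entropies = select_prototypes(preds, q=50.0)
print(selected, round(threshold, 3))
```

With these example predictions, the two most confident samples (indices 0 and 2) fall below the median entropy and form the subgroup that would be used for the parameter update.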
Regarding Claim 2, Kumar, Peterson, and DWang teach the system of Claim 1, wherein the programmable circuitry is to determine the updated batch-normalization parameters based on a first loss term and based on a second loss term (DWang: “To adapt during testing we minimize the entropy of model predictions. We call this objective the test entropy and name our method tent after it. We choose entropy for its connections to error and shift. Entropy is related to error, as more confident predictions are all-in-all more correct (Figure 1). Entropy is related to shifts due to corruption, as more corruption results in more entropy, with a strong rank correlation to the loss for image classification as the level of corruption increases (Figure 2).” [Section 1. Introduction]; “TTT and our setting adapt the model by optimizing an unsupervised loss during testing L(xt). During training, TTT jointly optimizes this same loss on source data L(xs) with a supervised loss L(xs,ys), to ensure the parameters θ are shared across losses for compatibility with adaptation by L(xt).” [Section 2. Setting: Fully Test-Time Adaptation]).

Regarding Claim 3, Kumar, Peterson, and DWang teach the system of Claim 1, wherein the programmable circuitry is to update a batch normalization layer of the neural network based on the new data samples (DWang: “Each step updates the normalization statistics and transformation parameters on a batch of data. The normalization statistics are estimated for each layer in turn, during the forward pass. The transformation parameters γ,β are updated by the gradient of the prediction entropy ∇H(ŷ), during the backward pass.” [Section 3.3 Algorithm]).
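The update quoted from DWang for Claim 3 has two parts: normalization statistics re-estimated from the current batch in the forward pass, and the scale and shift parameters γ, β moved along the gradient of the prediction entropy. The sketch below illustrates both under simplifying assumptions: a single batch-normalization layer feeding a frozen linear classifier, and central finite differences standing in for backpropagation. All names and numbers are hypothetical and are not taken from the reference.

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Normalize each channel with the current batch's mean/variance, then scale and shift."""
    n, channels = len(batch), len(batch[0])
    out = [[0.0] * channels for _ in range(n)]
    for c in range(channels):
        col = [row[c] for row in batch]
        mu = sum(col) / n
        var = sum((v - mu) ** 2 for v in col) / n
        for i in range(n):
            out[i][c] = gamma[c] * (col[i] - mu) / math.sqrt(var + eps) + beta[c]
    return out

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_entropy(batch, gamma, beta, weights):
    """Mean prediction entropy after batch norm and a frozen linear classifier."""
    feats = batch_norm(batch, gamma, beta)
    total = 0.0
    for f in feats:
        logits = [sum(f[c] * weights[c][k] for c in range(len(f)))
                  for k in range(len(weights[0]))]
        total += -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)
    return total / len(feats)

def entropy_step(batch, gamma, beta, weights, lr=0.01, h=1e-5):
    """One gradient-descent step on gamma and beta only (finite-difference gradient)."""
    c = len(gamma)
    params = list(gamma) + list(beta)
    grads = []
    for j in range(len(params)):
        up, down = params[:], params[:]
        up[j] += h
        down[j] -= h
        diff = (mean_entropy(batch, up[:c], up[c:], weights)
                - mean_entropy(batch, down[:c], down[c:], weights))
        grads.append(diff / (2 * h))
    new = [p - lr * g for p, g in zip(params, grads)]
    return new[:c], new[c:]

# Hypothetical target batch (4 samples, 2 channels) and frozen classifier weights.
batch = [[2.0, -1.0], [4.0, 0.5], [6.0, 1.0], [8.0, 2.5]]
weights = [[1.0, -1.0], [0.5, -0.5]]
gamma, beta = [1.0, 1.0], [0.0, 0.0]

before = mean_entropy(batch, gamma, beta, weights)
gamma2, beta2 = entropy_step(batch, gamma, beta, weights)
after = mean_entropy(batch, gamma2, beta2, weights)
print(before, after)
```

Only γ and β change in this sketch; the classifier weights stay fixed, which mirrors the low-dimensional, channel-wise modulation the office action cites from DWang.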
Regarding Claim 4, Kumar, Peterson, and DWang teach the system of Claim 1, wherein to determine the threshold uncertainty value, the programmable circuitry is to: assign uncertainty values to items of the baseline data set (Kumar: “One way to approach this would be through a concept from self-learning, i.e., by finding the most reliable samples from the target data over which the model S is sufficiently confident and using these predictions as the corresponding ground truth. In order to find reliable target samples, self-entropy H can be used to quantify the prediction uncertainty: [Equation 1].” [Section 3. System Overview]; “To ascertain the relationship between self-entropy and a prediction’s reliability, we analyse the baseline performance scores for the data points within the varying self-entropy percentile threshold as shown in figure 1. For the baseline, we observe a direct correlation between a lower self-entropy and a higher prediction score on the respective data points. This further supports the use of low self-entropy data points as class prototypes in our proposed approaches.” [Section 5. Results]); and set the threshold uncertainty value to be greater than a majority of the assigned uncertainty values of the items of the baseline data set (Kumar: “For prototype selection, instead of using an absolute threshold value, we have chosen a percentile based entropy threshold as it adapts relatively well across different domains. This follows from the fact that confidence of the model may vary from domain to domain due to which a threshold chosen for one domain might not be a good criteria for another domain.” [Section 5. Results]; “Analysis of the practice data (figure 2) further showed that the lowest self-entropy achieved by the negated class is far higher than that of non-negated class. For now, the self-entropy threshold value is defined by the 50th percentile (median) self-entropy value of the minority class i.e. negated class.” [Section 3. System Overview]).

Regarding Claim 5, Kumar, Peterson, and DWang teach the system of Claim 4, wherein the uncertainty values assigned to the items of the baseline data set are predictive entropy values (Kumar: “To ascertain the relationship between self-entropy and a prediction’s reliability, we analyse the baseline performance scores for the data points within the varying self-entropy percentile threshold as shown in figure 1. For the baseline, we observe a direct correlation between a lower self-entropy and a higher prediction score on the respective data points. This further supports the use of low self-entropy data points as class prototypes in our proposed approaches.” [Section 5. Results]).

Regarding Claim 8, Kumar, Peterson, and DWang teach the system of Claim 1, wherein the programmable circuitry is to update at least one of a scale parameter or a shift parameter of a batch normalization layer of the neural network (DWang: “Tent modulates features during testing by estimating normalization statistics µ,σ and optimizing transformation parameters γ,β. Normalization and transformation apply channel-wise scales and shifts to the features.” [Fig. 4]).

Regarding Claim 9, Kumar, Peterson, and DWang teach the system of Claim 1, wherein the threshold uncertainty value is determined based on predictive entropy (Kumar: “For prototype selection, instead of using an absolute threshold value, we have chosen a percentile based entropy threshold as it adapts relatively well across different domains. This follows from the fact that confidence of the model may vary from domain to domain due to which a threshold chosen for one domain might not be a good criteria for another domain.” [Section 5. Results]).

Regarding Claims 12-16, 19-20, and 23-25, they are NTCRM (non-transitory computer-readable medium) and method claims that correspond with the system of Claims 1-5 and 8-9. Therefore, they are rejected for the same reasons as Claims 1-5 and 8-9 above.

Claims 6-7, 10, 17-18, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar in view of DWang and Peterson, as applied to Claims 1, 2, 12 and 13, in further view of Wang et al. ("Active Source Free Domain Adaptation", published 05/22/2022), hereinafter FWang. FWang was cited in the previous Office Action.

Regarding Claim 6, Kumar, Peterson, and DWang teach the system of Claim 2, wherein the first loss term is a cross entropy loss (Kumar: “Now both the prototypes and their augmented samples with their respective labels are used with cross entropy loss to update the weights of the feature extractor module F of the pre-trained network S, with classifier module C of the pre-trained network being frozen.” [Section 3. System Overview]).

However, they fail to expressly disclose wherein the second loss term is a cosine similarity of feature embeddings of the neural network before and after model adaptation.

In the same field of endeavor, FWang teaches wherein the second loss term is a cosine similarity of feature embeddings of the neural network before and after model adaptation (FWang: “Given a target sample x, its neighbor ambient uncertainty NAU(x) is defined by multiplying the neighbor purity NP(x) and the neighbor affinity NA(x): NAU(x) = NP(x) * NA(x).” [Section 2.2 Minimum Happy Points Exploration]; “Neighbor affinity describes how close a sample is to its neighbors. To measure the close degree, we first define the neighbor similarity space SsN for each sample by [Equation 5], where SN1 represents the cosine similarity between x and its neighbor N1. Further, the neighbor affinity is measured by the average similarity between x and its neighbors: [Equation 6]. The more low is the neighbor affinity, the more far is the sample to its neighbors, and such that it is more likely to be outliers because outliers do not have compact neighbors.” [Section 2.2 Minimum Happy Points Exploration]).
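The neighbor affinity quoted from FWang is described as the average cosine similarity between a sample and its neighbors, with low values flagging likely outliers. A minimal sketch of that quantity follows. The neighbor count `k`, the choice to pick neighbors by highest cosine similarity, and the example vectors are assumptions; neighbor purity (the other factor in NAU) is not defined in the quoted text and is therefore omitted.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def neighbor_affinity(index, features, k=2):
    """Average cosine similarity between one sample and its k most similar neighbors.

    Assumption: neighbors are the k other samples with the highest cosine similarity.
    """
    sims = [cosine_similarity(features[index], f)
            for j, f in enumerate(features) if j != index]
    top_k = sorted(sims, reverse=True)[:k]
    return sum(top_k) / len(top_k)

# Hypothetical feature vectors: three clustered samples and one outlier.
features = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [-1.0, 0.5]]
affinities = [neighbor_affinity(i, features, k=2) for i in range(len(features))]
print([round(a, 3) for a in affinities])
```

In this toy data the last sample points away from the cluster, so its affinity is the lowest of the four, which is the outlier behaviour the quoted passage describes.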
It would have been obvious to one of skill in the art before the effective filing date of the invention to have incorporated wherein the first loss term is a cross entropy loss and the second loss term is a cosine similarity of feature embeddings of the neural network before and after model adaptation, as taught by FWang to the system of Kumar, Peterson, and DWang because both of these systems are directed towards source-free domain adaptation of a trained source model to an unlabeled target domain. In making this combination and accounting for cosine similarity as a factor for the loss calculations, it would allow the system of Kumar, Peterson, and DWang to determine neighbor affinity between samples, allowing the system to account for outliers by sampling how close a sample is to its neighbors in the probability distribution space (FWang: [Section 2.2 Minimum Happy Points Exploration]).

Regarding Claim 7, Kumar, Peterson, and DWang teach the system of Claim 2, wherein the first loss term is a cross entropy loss (Kumar: “Now both the prototypes and their augmented samples with their respective labels are used with cross entropy loss to update the weights of the feature extractor module F of the pre-trained network S, with classifier module C of the pre-trained network being frozen.” [Section 3. System Overview]).

However, they fail to expressly disclose wherein the second loss term is a Kullback-Leibler divergence of feature embeddings of the neural network before and after model adaptation.

In the same field of endeavor, FWang teaches wherein the second loss term is a Kullback-Leibler divergence of feature embeddings of the neural network before and after model adaptation (FWang: “Entropy loss and KL divergence are introduced to guarantee the unambiguous and balanced classes [39, 40], which has been widely used in clustering [41, 42], and several DA works [7, 37, 43, 44]: [Equation 10].” [Section 2.3 Minimum Happy Points Exploitation]).
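Claims 6 and 7 recite a two-term loss: a cross entropy term plus a second term that compares feature embeddings before and after adaptation, using cosine similarity (Claim 6) or Kullback-Leibler divergence (Claim 7). The sketch below shows one way such a combined loss could be written. It is not drawn from the application or from any cited reference: the weighting factor, the use of `1 - cosine similarity` as the penalty, and the softmax normalization of embeddings before the KL term are all assumptions.

```python
import math

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-likelihood of the labelled class."""
    return -math.log(probs[label])

def cosine_distance(a, b):
    """1 - cosine similarity; zero when the two embeddings point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def kl_divergence(p, q):
    """KL(p || q) for two strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def combined_loss(probs, label, emb_before, emb_after, weight=0.1, kind="cosine"):
    """First term: cross entropy. Second term: drift of the embedding during adaptation.

    `weight` and `kind` are hypothetical knobs for this illustration.
    """
    first = cross_entropy(probs, label)
    if kind == "cosine":
        second = cosine_distance(emb_before, emb_after)
    else:
        # Assumption: embeddings are turned into distributions with softmax.
        second = kl_divergence(softmax(emb_before), softmax(emb_after))
    return first + weight * second

probs = [0.7, 0.2, 0.1]          # hypothetical prediction for one labelled sample
emb_before = [1.0, 2.0, 0.5]     # hypothetical embedding from the source model
emb_after = [1.2, 1.7, 0.6]      # hypothetical embedding after adaptation

loss_cos = combined_loss(probs, 0, emb_before, emb_after, kind="cosine")
loss_kl = combined_loss(probs, 0, emb_before, emb_after, kind="kl")
print(loss_cos, loss_kl)
```

Either variant penalizes the adapted model for moving a sample's embedding away from where the source model placed it, while the cross entropy term fits the labelled subgroup.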
It would have been obvious to one of skill in the art before the effective filing date of the invention to have incorporated wherein the first loss term is a cross entropy loss and the second loss term is a Kullback-Leibler divergence of feature embeddings of the neural network before and after model adaptation, as taught by FWang to the system of Kumar, Peterson, and DWang because both of these systems are directed towards source-free domain adaptation of a trained source model to an unlabeled target domain. In making this combination and accounting for KL divergence as a factor for the loss calculation, it would allow the system of Kumar, Peterson, and DWang to “guarantee the unambiguous and balanced classes” (FWang: [Section 2.3 Minimum Happy Points Exploitation]) and “promote model learning” (FWang: [Section 3.2 Analysis]).

Regarding Claim 10, Kumar, Peterson, and DWang teach the system of Claim 1, wherein new data samples are ranked based on entropy (Kumar: “The samples with smaller self-entropy indicate that the classifier is more confident over them, and these are referred to as Prototypes.” [Section 3. System Overview]).

However, they fail to expressly disclose wherein new data samples are ranked to identify samples for active labeling.

In the same field of endeavor, FWang teaches wherein new data samples are ranked to identify samples for active labeling (FWang: “we introduce a more practical scenario called active source free domain adaptation (ASFDA) that permits actively selecting a few target data to be labeled by experts.” [Abstract]; “Focusing on informative MH points can increase the importance of active labeled samples and further avoid the model to overfit on the samples with wrong pseudo-labels.” [Section 2.3 Minimum Happy Points Exploitation]).
It would have been obvious to one of skill in the art before the effective filing date of the invention to have incorporated wherein new data samples are ranked to identify samples for active labeling, as taught by FWang to the system of Kumar, Peterson, and DWang because both of these systems are directed towards source-free domain adaptation of a trained source model to an unlabeled target domain. In making this combination and adopting an active labeling strategy, it would allow the system of Kumar, Peterson, and DWang to “avoid the model to overfit on the samples with wrong pseudo-labels” (FWang: [Section 2.3 Minimum Happy Points Exploitation]).

Regarding Claims 17-18 and 21, they are NTCRM claims that correspond with the system of Claims 6-7 and 10. Therefore, they are rejected for the same reasons as Claims 6-7 and 10 above.

Claims 11 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Kumar in view of DWang and Peterson, as applied to Claims 1 and 12 13, in further view of Klaß et al. (“Uncertainty-aware Evaluation of Time-Series Classification for Online Handwriting Recognition with Domain Shift”, published 06/17/2022), hereinafter Klaß.

Regarding Claim 11, Kumar, Peterson, and DWang teach the system of Claim 1.

However, they fail to expressly disclose wherein the threshold uncertainty value is an epistemic uncertainty value based on feature dissimilarity.

In the same field of endeavor, Klaß teaches wherein the threshold uncertainty value is an epistemic uncertainty value based on feature dissimilarity (Klaß: “We further highlight the trade-off when using information theory-based measures to decide whether a sample is too uncertain to classify correctly. This is depicted by Figure 5a showing the relationship between classification accuracies and different threshold values. We choose the entropy as the target metric for uncertainty evaluation (MI would work analogously). On the x-axis is the accuracy of the samples above the threshold, i.e., samples our model feels confident about classifying correctly. On the y-axis is the accuracy for the samples below the threshold. These values would be considered as too inaccurate to confidently classify.” [Section 5.2 Uncertainty based on Information Theory]; “For the epistemic part, the diagonal contains the squared difference to the mean softmax outputs (over T samples). The off-diagonal has positive values when the softmax values coincide and negative values if the softmax values display an inverse relationship.” [Section 3.3.1 Uncertainty Decomposition]).

It would have been obvious to one of skill in the art before the effective filing date of the invention to have incorporated wherein the threshold uncertainty value is an epistemic uncertainty value based on feature dissimilarity, as taught by Klaß to the system of Kumar, Peterson, and DWang because both of these systems are directed towards utilizing uncertainty metrics to characterize domain shifts for machine learning. In making this combination and accounting for epistemic uncertainty when determining a sampling threshold, it would allow the system of Kumar, Peterson, and DWang to account for “information gain about the model parameters that would be obtained when observing the true outcome” [Section 3.3 Uncertainty Decomposition].

Response to Arguments

Examiner acknowledges the amendments to Claims 1-3, 10, 12-14, and 23-25.

Applicant's arguments, filed 12/19/2025, regarding the rejection of Claims 1-33 under 35 U.S.C. § 101 have been fully considered and are persuasive. The rejection has been withdrawn.

Applicant's arguments, filed 12/19/2025, regarding the rejection of Claims 1-33 under 35 U.S.C. § 103 have been fully considered and are found moot in light of the new grounds of rejection (see rejection above).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Prabhu et al. (“Active Domain Adaptation via Clustering Uncertainty-weighted Embeddings”) discusses active domain adaptation using uncertainty-weighted clustering as a label acquisition strategy for identifying target instances.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MEGAN E HWANG whose telephone number is (703)756-1377. The examiner can normally be reached Monday-Thursday 10:00-7:30 ET.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch can be reached at (571) 272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.E.H./
Examiner, Art Unit 2143

/JENNIFER N WELCH/
Supervisory Patent Examiner, Art Unit 2143

Prosecution Timeline

Dec 27, 2022: Application Filed
Feb 24, 2023: Response after Non-Final Action
Sep 19, 2025: Non-Final Rejection mailed (§103)
Dec 18, 2025: Applicant Interview (Telephonic)
Dec 18, 2025: Examiner Interview Summary
Dec 19, 2025: Response Filed
Apr 27, 2026: Final Rejection mailed (§103, current)
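
The office action sets its reply periods relative to the mailing date: two months for a first reply that preserves the advisory-action provision, a three-month shortened statutory period, and a six-month outer limit. The sketch below works out those calendar dates from the Apr 27, 2026 mailing date shown in the timeline. It is plain date arithmetic under stated assumptions: the `add_months` helper is hypothetical, and the calculation ignores weekend and holiday rollover, extension fees, and any later advisory action.

```python
import calendar
from datetime import date

def add_months(start, months):
    """Same day-of-month `months` later, clamped to the last day of shorter months."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

# Mailing date of the final rejection, taken from the prosecution timeline.
mailed = date(2026, 4, 27)

two_month_mark = add_months(mailed, 2)        # first-reply window in the action
shortened_period_end = add_months(mailed, 3)  # shortened statutory period
statutory_limit = add_months(mailed, 6)       # outer statutory limit

print(two_month_mark, shortened_period_end, statutory_limit)
```

The clamping in `add_months` only matters for mailing dates late in a month (for example, the 31st), which is not the case here.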

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12619854
NEURAL NETWORK INFERENCE QUANTIZATION
4y 2m to grant Granted May 05, 2026
Patent 12456093
Corporate Hierarchy Tagging
4y 1m to grant Granted Oct 28, 2025
Patent 12437514
VIDEO DOMAIN ADAPTATION VIA CONTRASTIVE LEARNING FOR DECISION MAKING
3y 11m to grant Granted Oct 07, 2025
Patent 12437517
VIDEO DOMAIN ADAPTATION VIA CONTRASTIVE LEARNING FOR DECISION MAKING
1y 12m to grant Granted Oct 07, 2025
Patent 12437518
VIDEO DOMAIN ADAPTATION VIA CONTRASTIVE LEARNING FOR DECISION MAKING
1y 12m to grant Granted Oct 07, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 48%
With Interview: 99% (+60.3%)
Median Time to Grant: 3y 9m (~4m remaining)
PTA Risk: Moderate
Based on 23 resolved cases by this examiner. Grant probability derived from career allowance rate.
