Prosecution Insights
Last updated: April 19, 2026
Application No. 17/534,340

ELECTRONIC DEVICE AND METHOD FOR TRAINING NEURAL NETWORK MODEL

Non-Final OA §103
Filed
Nov 23, 2021
Examiner
HICKS, AUSTIN JAMES
Art Unit
2142
Tech Center
2100 — Computer Architecture & Software
Assignee
Industrial Technology Research Institute
OA Round
3 (Non-Final)
76%
Grant Probability
Favorable
3-4
OA Rounds
3y 4m
To Grant
99%
With Interview

Examiner Intelligence

Grants 76% — above average
76%
Career Allow Rate
308 granted / 403 resolved
+21.4% vs TC avg
Strong +25% interview lift
+25.1%
Interview Lift
(based on resolved cases with interview)
Typical timeline
3y 4m
Avg Prosecution
54 currently pending
Career history
457
Total Applications
across all art units
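Taken at face value, the headline figures above are simple ratios. A minimal sketch of the arithmetic; the 99% cap on the "with interview" figure is an assumption about how the dashboard combines the numbers, not something the source states:

```python
# Quick check of the dashboard's headline arithmetic.
# Assumption (not stated by the source): the "with interview" figure is
# the career allow rate plus the interview lift, capped at 99%.
granted, resolved = 308, 403
allow_rate = 100 * granted / resolved        # career allow rate, in percent
interview_lift = 25.1                        # percentage points

with_interview = min(99.0, allow_rate + interview_lift)

print(f"allow rate: {allow_rate:.1f}%")      # ~76.4%, shown as 76%
print(f"with interview: {with_interview}%")  # capped at 99.0%
```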

Statute-Specific Performance

§101: 13.9% (-26.1% vs TC avg)
§103: 46.3% (+6.3% vs TC avg)
§102: 17.3% (-22.7% vs TC avg)
§112: 19.2% (-20.8% vs TC avg)
Deltas are versus the estimated Tech Center average • Based on career data from 403 resolved cases
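The per-statute deltas shown above all back out to the same baseline, which suggests the Tech Center average is a single flat estimate. A quick check (the flat-baseline reading is an inference from the numbers, not something the source states):

```python
# Reconstruct the Tech Center baseline from each statute's rate and delta.
# If the dashboard used per-statute baselines these would differ; they don't.
rates  = {"101": 13.9, "103": 46.3, "102": 17.3, "112": 19.2}
deltas = {"101": -26.1, "103": 6.3, "102": -22.7, "112": -20.8}

tc_avg = {s: round(rates[s] - deltas[s], 1) for s in rates}
print(tc_avg)   # every statute reconstructs to 40.0
```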

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed 8/22/2025 have been fully considered but they are not persuasive. Applicant argues, "[t]here is nowhere in the teaching of Sohn that mentions the concept of a first neural network model includes a first sub-neural network model and a second sub-neural network model, and determine whether the second pseudo-label matches the first pseudo-label according to the probability vector." Remarks 15. The submodels are shown in figure 1 below as the upper model and the lower model. The "according to" language pushes the interpretation of the claim to include any matching determination that includes a probability vector. In fig. 1 below, the matching happens at H(p,q), and it is done according to the two probability/prediction vectors output from the two models.

[Image: Sohn fig. 1]

Allowable Subject Matter

Claims 5-7 and 17-19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The prior art of record does not teach or make obvious:

"calculate an average probability of a first maximum probability in the first probability vector and a second maximum probability in the second probability vector; and in response to the average probability being greater than a probability threshold, determine whether the second pseudo-label matches the first pseudo-label…" (claims 5 and 17);

"in response to a first maximum probability in the first probability vector being greater than a probability threshold and a second maximum probability in the second probability vector being greater than the probability threshold, determine whether the second pseudo-label matches the first pseudo-label…" (claims 6 and 18); or

"the second pseudo-labeled data comprises a first probability vector corresponding to the first sub-neural network model and a second probability vector corresponding to the second sub-neural network model, wherein the processor is further configured to: calculate an average probability vector of the first probability vector and the second probability vector; and determine the second pseudo-label according to the average probability vector." (claims 7 and 19).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 8-16 and 20-24 are rejected under 35 U.S.C. 103 as being unpatentable over "Self-training with Noisy Student improves ImageNet classification" by Xie et al., US 20150095017 A1 to Mnih et al., and "FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence" by Sohn et al.

Xie teaches claims 1 and 13.

An… (Xie fig. 1: the student model is the first NN, see below.)

[Image: Xie fig. 1]

obtain a first pseudo-labeled data with a first pseudo-label; (Xie fig. 1 shows the pseudo-labeled dataset used by the student model.)

input the first pseudo-labeled data into the first neural network model to obtain a second pseudo-labeled data, (Xie abs.: "we first train an EfficientNet model on labeled images and use it as a teacher to generate pseudo labels for 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo labeled images. We iterate this process by putting back the student as the teacher." The second dataset is the pseudo labels generated by the "new teacher," which was the student model that trained on the last round of pseudo-labeled data.)

train the first neural network model according to the pseudo-labeled dataset; and (Xie abs.: "we first train an EfficientNet model on labeled images and use it as a teacher to generate pseudo labels for 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo labeled images. We iterate this process by putting back the student as the teacher.")

Xie doesn't teach an electronic device.
However, Mnih teaches: an electronic device adaptable for training a neural network model, comprising: a storage medium, storing… a processor, coupled to the storage medium, wherein the processor is configured to… (Mnih para. 15: "a system and method of predicting a word association between words in a word dictionary, comprising processor implemented steps of storing data defining a word association matrix including a plurality of vectors…" Mnih para. 74: "As will be appreciated, removable storage unit 1018 includes a computer usable storage medium having stored therein computer software and/or data.")

The claims, Mnih and Xie are all machine learning algorithms. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to execute Xie on a computer because computers are the machine in the phrase "machine learning."

Xie doesn't teach matching the pseudo-labeled data. However, Sohn teaches: wherein the second pseudo-label data includes a probability vector;… determine whether a second pseudo-label corresponding to the second pseudo-labeled data matches the first pseudo-label corresponding to the first pseudo-labeled data, wherein the first neural network model includes a first sub-neural network model and a second sub-neural network model, and determine whether the second pseudo-label matches the first pseudo-label according to the probability vector; (Sohn pg. 3, above eq. 4: "To obtain an artificial label, we first compute the model's predicted class distribution given a weakly-augmented version of a given unlabeled image… Then, we use [highest probability prediction] as a pseudo-label…" The second pseudo-labeled data "matches" a first pseudo-label when one of the predictions is over a prediction threshold; see Sohn fig. 1 below.)
[Image: Sohn fig. 1]

in response to that the second pseudo-label matches the first pseudo-label, add the second pseudo-labeled data to a pseudo-labeled dataset; and (Sohn abs.: "FixMatch first generates pseudo-labels using the model's predictions on weakly augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed a strongly-augmented version of the same image.")

train a final neural network model according to a third pseudo-labeled data generated by the trained first neural network model. (Sohn fig. 1: "The model is trained to make its prediction on the strongly-augmented version match the pseudo-label via a cross-entropy loss." Sohn fig. 1's training method is done at least "300" times, according to the 300 training epochs taught on page 18. When a pseudo-label is "retained" (Sohn abs.) then it becomes a fourth pseudo-label data.)

The claims, Sohn and Xie are all machine learning algorithms. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to use FixMatch to inject noise into Xie's "Noisy Student" because of "its simplicity, [and] state-of-the-art performance across a variety of standard semi-supervised learning benchmarks…" Sohn abs.

Sohn teaches claims 2 and 14.

The electronic device according to claim 1, wherein the processor is further configured to: in response to a maximum probability in the probability vector being greater than a probability threshold, determine whether the second pseudo-label matches the first pseudo-label. (Sohn pg. 3, above eq. 4: "To obtain an artificial label, we first compute the model's predicted class distribution given a weakly-augmented version of a given unlabeled image… Then, we use [highest probability prediction] as a pseudo-label…" The second pseudo-labeled data "matches" a first pseudo-label when one of the predictions is over a prediction threshold; see Sohn fig. 1 below. Specifically, fig. 1 states: "When the model assigns a probability to any class which is above a threshold (dotted line), the prediction is converted to a one-hot pseudo-label. Then, we compute the model's prediction for a strong augmentation of the same image (bottom). The model is trained to make its prediction on the strongly-augmented version match the pseudo-label via a cross-entropy loss.")

[Image: Sohn fig. 1]

Sohn teaches claims 3 and 15.

The electronic device according to claim 1, wherein the processor is further configured to: in response to the second pseudo-label matching the first pseudo-label, calculate a first cross-entropy loss between the first pseudo-labeled data and the second pseudo-labeled data; and (Sohn fig. 1 states: "When the model assigns a probability to any class which is above a threshold (dotted line), the prediction is converted to a one-hot pseudo-label. Then, we compute the model's prediction for a strong augmentation of the same image (bottom). The model is trained to make its prediction on the strongly-augmented version match the pseudo-label via a cross-entropy loss.")

train the first neural network model according to a loss function associated with the first cross-entropy loss. (Sohn pg. 3, above eq. 4: "To obtain an artificial label, we first compute the model's predicted class distribution given a weakly-augmented version of a given unlabeled image… Then, we use [highest probability prediction] as a pseudo-label…" The second pseudo-labeled data "matches" a first pseudo-label when one of the predictions is over a prediction threshold; see Sohn fig. 1 above.)

Xie teaches claims 4 and 16.

The electronic device according to claim 3, wherein the processor is further configured to: obtain a first labeled data; (labeled data in fig. 1 of Xie.)

input the first labeled data to the first neural network model to obtain a second labeled data; (Xie fig. 1 shows a teacher model, which is the first iteration of the student model, and it is fed labeled data, see below. The second labeled data in Xie is Xie's first set of pseudo-labels.)

[Image: Xie fig. 1]

calculate a second cross-entropy loss between the first labeled data and the second labeled data; and (Xie sec. 2: "We then train a student model which minimizes the combined cross entropy loss on both labeled images and unlabeled images.")

train the first neural network model according to the loss function associated with the second cross-entropy loss. (Xie sec. 2: "We then train a student model which minimizes the combined cross entropy loss on both labeled images and unlabeled images." The first student model is "equal-or-larger," which means that the first student model may just be the teacher model.)

Sohn teaches claims 8 and 20.
The electronic device according to claim 1, the second pseudo-label comprises a first sub-pseudo-label corresponding to the first sub-neural network model and a second sub-pseudo-label corresponding to the second sub-neural network model, (Sohn's first weakly-augmented image is the first sub-pseudo-label. The second strongly-augmented image comprises the first weakly-augmented image and the augments, and the strongly-augmented image is the second pseudo-label.)

wherein the processor is further configured to: in response to the first sub-pseudo-label matching the first pseudo-label and the second sub-pseudo-label matching the first pseudo-label, determine that the second pseudo-label matches the first pseudo-label. (Sohn fig. 1 states: "When the model assigns a probability to any class which is above a threshold (dotted line), the prediction is converted to a one-hot pseudo-label. Then, we compute the model's prediction for a strong augmentation of the same image (bottom). The model is trained to make its prediction on the strongly-augmented version match the pseudo-label via a cross-entropy loss." The branching nature of the different inputs makes the model with the weakly-augmented image a first subnetwork, and the heavily augmented image goes into the second subnetwork. Sohn sec. 2.1; Sohn fig. 1 matches the strongly and weakly augmented images.)

[Image: Sohn fig. 1]

Sohn teaches claims 9 and 21.

The electronic device according to claim 1, wherein the processor is further configured to: train a second neural network model according to a labeled dataset; (Sohn sec. 2.2: "The loss function for FixMatch consists of two cross-entropy loss terms: a supervised loss ls applied to labeled data and an unsupervised loss lu. Specifically, ls is just the standard cross-entropy loss on weakly augmented labeled examples…" The cross-entropy loss is used to train the model.)
input an unlabeled dataset into the second neural network model to obtain a highly trusted pseudo-labeled dataset and a partially trusted pseudo-labeled dataset; and (The augmented images in Sohn fig. 1 are the unlabeled dataset. The pseudo-label from the weakly-augmented images is the highly trusted pseudo-labeled dataset. The prediction from the strongly-augmented image is the partially trusted pseudo-labeled dataset.)

train the first neural network model according to the partially trusted pseudo-labeled dataset, wherein the partially trusted pseudo-labeled dataset comprises the first pseudo-labeled data. (Sohn fig. 1: "The model is trained to make its prediction on the strongly-augmented version match the pseudo-label via a cross-entropy loss.")

Sohn teaches claims 10 and 22.

The electronic device according to claim 9, wherein the processor is further configured to: train the final neural network model according to the labeled dataset, the highly trusted pseudo-labeled dataset, and the partially trusted pseudo-labeled dataset. (Sohn fig. 1: "The model is trained to make its prediction on the strongly-augmented version match the pseudo-label via a cross-entropy loss." The partially trusted dataset is the predictions from the strongly-augmented images. The highly trusted and labeled dataset are the pseudo-label from the weakly-augmented images, which includes labeled data; Sohn sec. 2.2: "The loss function for FixMatch consists of two cross-entropy loss terms: a supervised loss ls applied to labeled data and an unsupervised loss lu. Specifically, ls is just the standard cross-entropy loss on weakly augmented labeled examples…")

Sohn teaches claims 11 and 23.

The electronic device according to claim 10, wherein the processor is further configured to: input the third pseudo-labeled data in the partially trusted pseudo-labeled dataset into the first neural network model to obtain a fourth pseudo-labeled data; and (Sohn fig. 1's training method is done at least "300" times, according to the 300 training epochs taught on page 18. When a pseudo-label is "retained" (Sohn abs.) then it becomes a third pseudo-label data.)

in response to a fourth pseudo-label of the fourth pseudo-labeled data matching a third pseudo-label of the third pseudo-labeled data, update the partially trusted pseudo-labeled dataset according to the fourth pseudo-labeled data. (Sohn pg. 5, first paragraph below the table 1 description: "Pseudo-labeling refers to a specific variant where model predictions are converted to hard labels [25], which is often used along with a confidence-based thresholding that retains unlabeled examples only when the classifier is sufficiently confident…")

Sohn teaches claims 12 and 24.

The electronic device according to claim 10, wherein the processor is further configured to: input the third pseudo-labeled data in the partially trusted pseudo-labeled dataset into the first neural network model to obtain a fourth pseudo-labeled data; (Sohn fig. 1's training method is done at least "300" times, according to the 300 training epochs taught on page 18. When a pseudo-label is "retained" (Sohn abs.) then it becomes a fourth pseudo-label data.)

in response to a fourth pseudo-label of the fourth pseudo-labeled data not matching a third pseudo-label of the third pseudo-labeled data, output the fourth pseudo-labeled data and receive a fourth labeled data corresponding to the fourth pseudo-labeled data; and (Sohn pg. 5, first paragraph below the table 1 description: "Pseudo-labeling refers to a specific variant where model predictions are converted to hard labels [25], which is often used along with a confidence-based thresholding that retains unlabeled examples only when the classifier is sufficiently confident…")

update the labeled dataset according to the fourth labeled data. (Sohn pg. 5, first paragraph below the table 1 description: "Pseudo-labeling refers to a specific variant where model predictions are converted to hard labels [25], which is often used along with a confidence-based thresholding that retains unlabeled examples only when the classifier is sufficiently confident…" If they don't match, the hard label is not retained; this is the update.)

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Austin Hicks, whose telephone number is (571) 270-3377. The examiner can normally be reached Monday - Thursday, 8-4 PST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AUSTIN HICKS/
Primary Examiner, Art Unit 2124
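Stepping back from the claim mapping: the confidence-gating logic that separates the claims indicated as allowable (5-7 and 17-19) from the rejected ones can be sketched in plain Python. This is a minimal sketch; the function names are illustrative, the probability vectors stand in for the two sub-models' outputs, and 0.95 is a typical FixMatch-style threshold, none of which is claim language or code from the cited references.

```python
# Illustrative sketches of the confidence checks at issue; names and the
# 0.95 threshold are assumptions, not language from Sohn or the claims.

def fixmatch_pseudo_label(p_weak, threshold=0.95):
    """Sohn-style gate: keep the pseudo-label (argmax class index)
    only if the maximum predicted probability exceeds the threshold."""
    conf = max(p_weak)
    return p_weak.index(conf) if conf > threshold else None

def matches_avg_confidence(p1, p2, threshold=0.95):
    """Claims 5/17 flavor: gate matching on the average of the two maxima."""
    return (max(p1) + max(p2)) / 2 > threshold

def matches_both_confident(p1, p2, threshold=0.95):
    """Claims 6/18 flavor: each maximum must individually exceed the threshold."""
    return max(p1) > threshold and max(p2) > threshold

def label_from_avg_vector(p1, p2):
    """Claims 7/19 flavor: argmax of the element-wise average vector."""
    avg = [(a + b) / 2 for a, b in zip(p1, p2)]
    return avg.index(max(avg))
```

Note that the claims 6/18 gate is strictly stronger than the claims 5/17 gate: two vectors with maxima 0.99 and 0.93 pass the averaged check (0.96 > 0.95) but fail the per-vector check.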

Prosecution Timeline

Nov 23, 2021
Application Filed
Dec 14, 2024
Non-Final Rejection — §103
Mar 19, 2025
Response Filed
Apr 18, 2025
Final Rejection — §103
Aug 22, 2025
Request for Continued Examination
Aug 31, 2025
Response after Non-Final Action
Sep 04, 2025
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591767
NEURAL NETWORK ACCELERATION CIRCUIT AND METHOD
2y 5m to grant Granted Mar 31, 2026
Patent 12554795
REDUCING CLASS IMBALANCE IN MACHINE-LEARNING TRAINING DATASET
2y 5m to grant Granted Feb 17, 2026
Patent 12530630
Hierarchical Gradient Averaging For Enforcing Subject Level Privacy
2y 5m to grant Granted Jan 20, 2026
Patent 12524694
OPTIMIZING ROUTE MODIFICATION USING QUANTUM GENERATED ROUTE REPOSITORY
2y 5m to grant Granted Jan 13, 2026
Patent 12524646
VARIABLE CURVATURE BENDING ARC CONTROL METHOD FOR ROLL BENDING MACHINE
2y 5m to grant Granted Jan 13, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4
Expected OA Rounds
76%
Grant Probability
99%
With Interview (+25.1%)
3y 4m
Median Time to Grant
High
PTA Risk
Based on 403 resolved cases by this examiner. Grant probability derived from career allow rate.
