Prosecution Insights
Last updated: April 19, 2026
Application No. 18/439,294

METHOD FOR SELF-SUPERVISED REPRESENTATION LEARNING FOR VISION-BASED REINFORCEMENT LEARNING ROBUST TO VISUAL DISTRACTION, DATA INFERENCE APPARATUS USING SELF-SUPERVISED LEARNING MODEL, AND STORAGE MEDIUM STORING INSTRUCTIONS TO PERFORM METHOD FOR SELF-SUPERVISED REPRESENTATION LEARNING

Status: Final Rejection (§103)
Filed: Feb 12, 2024
Examiner: LIU, GORDON G
Art Unit: 2618
Tech Center: 2600 — Communications
Assignee: Research & Business Foundation Sungkyunkwan University
OA Round: 2 (Final)
Grant Probability: 83% (Favorable)
Predicted OA Rounds: 3-4
Predicted Time to Grant: 2y 4m
Grant Probability with Interview: 98%

Examiner Intelligence

Career Allowance Rate: 83% (above average) — 556 granted / 673 resolved, +20.6% vs Tech Center average
Interview Lift: +15.1% allowance rate in resolved cases with interview (strong)
Typical Timeline: 2y 4m average prosecution; 29 applications currently pending
Career History: 702 total applications across all art units

Statute-Specific Performance

§101: 6.7% (-33.3% vs TC avg)
§103: 73.3% (+33.3% vs TC avg)
§102: 3.0% (-37.0% vs TC avg)
§112: 5.7% (-34.3% vs TC avg)
Comparisons are against a Tech Center average estimate, based on career data from 673 resolved cases.

Office Action

§103
DETAILED ACTION

This Office Action is in response to the Applicant's communication filed on January 15, 2026, which amends independent claims 1, 9, and 17, adds new dependent claims 18-20, and presents arguments. Claims 1-20 are currently pending and have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

Applicant's arguments filed on January 15, 2026, have been fully considered. Applicant argues that independent claims 1, 9, and 17 are amended to add a new limitation, "training the generator and the discriminator to minimize at least one cost function with respect to the first latent state and the second latent state, the at least one cost function configured to perform optimization based on a difference between a first latent state value, determined by passing the first latent state through a classifier, and a second latent state value, determined by passing the second latent state through the classifier," in order to overcome the 35 U.S.C. §103 rejection.

Examiner replies that the amended claims with the new limitation may overcome the cited portions of the prior art. However, a newly found reference, Zhong et al. (US 20200349447 A1), teaches the at least one cost function configured to perform optimization based on a difference between a first latent state value, determined by passing the first latent state through a classifier, and a second latent state value, determined by passing the second latent state through the classifier (See Zhong: Figs. 5A-B, and [0061], "If the latent space point E(G(Z)), which is output from the encoder E, were itself input to the generator G, then the generator G would map the E(G(Z)) back to the ambient space G(E(G(Z)). If the ambient space point G(E(G(Z))) were itself to be input to the encoder E, then the encoding map E would map G(E(G(Z)) to the latent space point E(G(E(G(Z)))), which is the same latent space as that of Z. As such, E(G(Z)) and E(G(E(G(Z)))) are the latent space representations of G(Z) and G(E(G(Z))), respectively. According to implementations of this disclosure, the encoder E learns (i.e., is trained) to minimize the difference between E(G(E(G(Z)))) and E(G(Z))". Note that the difference is mapped to the cost function, and E(G(E(G(Z)))) and E(G(Z)) are mapped to the first latent state value and the second latent state value, respectively). The remaining arguments of the Applicant are moot in view of the newly found art.

Applicant's arguments have been fully considered, and new grounds of rejection are set forth below. Since the new grounds of rejection are necessitated by Applicant's amendments to the claims, the present action is made final.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 9-10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Babagholami et al. (US 20220301296 A1) in view of Gurevich (US 20230036068 A1), further in view of Zhong et al. (US 20200349447 A1).
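The latent-space regularization quoted from Zhong [0061] reduces to a concrete loss: train the encoder E so that E(G(E(G(Z)))) matches E(G(Z)). The following is a minimal sketch, assuming toy one-parameter stand-ins for G and E and a finite-difference update in place of backpropagation; none of these names or values come from Zhong.

```python
def G(z, w=1.5):
    """Stand-in generator: maps a latent point to the ambient space (toy linear map)."""
    return [w * zi for zi in z]

def make_E(v):
    """Stand-in encoder with a single trainable scale v (ambient -> latent)."""
    return lambda x: [v * xi for xi in x]

def latent_cycle_loss(z, E):
    """Squared difference between E(G(E(G(Z)))) and E(G(Z)) -- the quantity
    Zhong's encoder is trained to minimize."""
    first = E(G(z))        # E(G(Z)): latent representation of G(Z)
    second = E(G(first))   # E(G(E(G(Z)))): latent representation of G(E(G(Z)))
    return sum((a - b) ** 2 for a, b in zip(second, first))

# Tune the encoder scale v by finite-difference gradient descent
# (a crude stand-in for backpropagation).
z = [0.3, -0.7, 1.1]
v, lr, eps = 0.9, 0.05, 1e-6
for _ in range(200):
    base = latent_cycle_loss(z, make_E(v))
    grad = (latent_cycle_loss(z, make_E(v + eps)) - base) / eps
    v -= lr * grad

print(round(latent_cycle_loss(z, make_E(v)), 10))
```

Driving this difference toward zero is what the Office Action maps onto the claimed cost function, with the two compared terms as the first and second latent state values.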
Regarding claim 1, Babagholami teaches a self-supervised representation training method (See Babagholami: Figs. 1 and 5, and [0051], "FIG. 5 depicts an electronic device 500 that may be configured to train a neural network using the MEAR learning architecture according to the subject matter disclosed herein. The electronic device 500 may include a controller (or CPU) 510, an input/output device 520 such as, but not limited to, a keypad, a keyboard, a display, a touch-screen display, a 2D image sensor, a 3D image sensor, a memory 530, an interface 540, a GPU 550, an imaging-processing unit 560, a neural processing unit 570, a TOF processing unit 580 that are coupled to each other through a bus 590") for performing self-supervised representation training for vision-based reinforcement training robust to visual distractions using a self-supervised representation training program including a generator and a discriminator, the method comprising:

receiving an image to generate a first augmented image and a second augmented image using at least one predetermined augmentation algorithm (See Babagholami: Fig. 1, and [0033], "FIG. 1 depicts an overview of an example embodiment of a MEAR learning model architecture 100 according to the subject matter disclosed herein. The inputs 101 to the network may be weakly and strongly augmented examples in which x is a data point and y is a label. The MEAR learning model 100 may include a feature extractor f and two experts h.sub.1 and h.sub.2 (K=2) that may be trained to minimize a supervised loss custom-character.sub.s (Eq. 2) on weakly augmented samples and to minimize the diversity loss custom-character.sub.d on strongly augmented samples (Eq. 3)");

inputting the first augmented image and the second augmented image to the generator to output a first latent state from the first augmented image and output a second latent state from the second augmented image (See Babagholami: Fig. 1, and [0033], "The MEAR learning model 100 may include a feature extractor f and two experts h.sub.1 and h.sub.2 (K=2) that may be trained to minimize a supervised loss custom-character.sub.s (Eq. 2) on weakly augmented samples and to minimize the diversity loss custom-character.sub.d on strongly augmented samples (Eq. 3). The feature extractor f may be trained to minimize the supervised loss on weakly augmented samples, and to minimize the multi-expert consensus loss custom-character.sub.c (Eq. 6) on strongly augmented samples". The feature extractor extracts features for each of the weakly and strongly augmented images, and those extracted image features may be the latent states); and

training the generator and the discriminator to minimize at least one cost function with respect to the first latent state and the second latent state (See Babagholami: Fig. 1, and [0033], quoted above, noting further that "A backward path is provided for each supervised loss custom-character.sub.s to the corresponding expert and to the feature extractor. A backward path is provided for the diversity loss custom-character.sub.d to each of the experts, and a backward path is provided for the consensus loss custom-character.sub.c to the feature extractor"), the at least one cost function configured to perform optimization based on a difference between a first latent state value, determined by passing the first latent state through a classifier, and a second latent state value, determined by passing the second latent state through the classifier.

However, Babagholami fails to explicitly disclose performing self-supervised representation training for vision-based reinforcement training robust to visual distractions using a self-supervised representation training program including a generator and a discriminator, or the at least one cost function configured to perform optimization based on a difference between a first latent state value, determined by passing the first latent state through a classifier, and a second latent state value, determined by passing the second latent state through the classifier.

Gurevich teaches performing self-supervised representation training for vision-based reinforcement training robust to visual distractions using a self-supervised representation training program including a generator and a discriminator (See Gurevich: Figs. 5 and 13, and [0219], "FIG. 5 illustrates an exemplary GAN model, according to some examples. With reference to FIG. 5, an input image 502 (e.g., a fluorescence perfusion image of breast tissue) is provided to a trained GAN model 504. The GAN model 504 outputs one or more anomaly scores. The one or more anomaly scores can be a plurality of pixel-wise scores, an image-wise score, or a combination thereof. In the depicted example, the GAN model comprises a generator 504a, a discriminator 504b, and an encoder 504c. Exemplary implementations of the GAN model are described herein with reference to FIGS. 8A-8B and 9"; and [0284], "FIG. 13 illustrates training of the encoder and the model used in process 1200, in accordance with some examples. With reference to FIG. 13, at T1, self-supervised training 1302 is performed on the encoder model 1305 using a plurality of unlabeled training images 1310. In some examples, the plurality of training images comprises non-medical images 1311 and medical images 1312. The non-medical images can comprise one or more datasets of natural images. For example, images from a publicly available image dataset (e.g., ImageNet) can be used. In some examples, the encoder model 1305 is first trained using non-medical images 1311 before trained using medical images 1312").

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Babagholami to perform self-supervised representation training for vision-based reinforcement training robust to visual distractions using a self-supervised representation training program including a generator and a discriminator, as taught by Gurevich, in order to lead to better usage and management of computer memory and more efficient usage of compute processing power, thus improving functioning of a computer system (See Gurevich: Fig. 5, and [0010], "Collecting, labeling, and storing a large volume of training images can lead to inefficient usage of computer memory and processing power. By using unlabeled normal images to train the GAN model, the techniques can lead to better usage and management of computer memory and more efficient usage of compute processing power, thus improving functioning of a computer system. Further, the various machine-learning models described herein can result in more accurate diagnosis and/or treatment of diseases, as discussed herein").

Babagholami teaches a method and system that may train a multi-expert neural network using adversarial regularization for deep supervised learning, with the input image augmented weakly and strongly, to minimize various costs, while Gurevich teaches a system and method that may train a generative adversarial network (GAN) model implemented in a self-supervised training architecture with a generator and a discriminator to minimize cost functions. Therefore, it would have been obvious to one of ordinary skill in the art to modify Babagholami by Gurevich to implement the GAN with a generator and a discriminator in self-supervised training mode. The motivation to modify Babagholami by Gurevich is "Use of known technique to improve similar devices (methods, or products) in the same way".

However, Babagholami, modified by Gurevich, fails to explicitly disclose the at least one cost function configured to perform optimization based on a difference between a first latent state value, determined by passing the first latent state through a classifier, and a second latent state value, determined by passing the second latent state through the classifier.

Zhong teaches the at least one cost function configured to perform optimization based on a difference between a first latent state value, determined by passing the first latent state through a classifier, and a second latent state value, determined by passing the second latent state through the classifier (See Zhong: Figs. 5A-B, and [0061], quoted in the Response to Amendment above. Note that the difference is mapped to the cost function, and E(G(E(G(Z)))) and E(G(Z)) are mapped to the first latent state value and the second latent state value, respectively).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Babagholami to have the at least one cost function configured to perform optimization based on a difference between a first latent state value, determined by passing the first latent state through a classifier, and a second latent state value, determined by passing the second latent state through the classifier, as taught by Zhong, in order to improve performance such as accuracy (See Zhong: Fig. 1, and [0030], "The end result is that when both the generator 102 and the discriminator 104 converge, the discriminator 104 can no longer distinguish the generated sample G(Z) from a real data sample Y. At this point, the generator 102 can be regarded as having learned the distribution of the real data Y. By convergence is meant that additional training of either of the generator 102 and/or the discriminator 104 does not lead to improved (or sufficiently improved) performance").

Babagholami teaches a method and system that may train a multi-expert neural network using adversarial regularization for deep supervised learning, with the input image augmented weakly and strongly, to minimize various costs, while Zhong teaches a system and method that may optimize unsupervised generative adversarial networks via latent space regularizations by minimizing a cost function that measures the difference between the first latent state value and the second latent state value. Therefore, it would have been obvious to one of ordinary skill in the art to modify Babagholami by Zhong to optimize the self-supervised training model by minimizing the cost function. The motivation to modify Babagholami by Zhong is "Use of known technique to improve similar devices (methods, or products) in the same way".

Regarding claim 2, Babagholami, Gurevich, and Zhong teach all the features with respect to claim 1 as outlined above. Further, Gurevich teaches the self-supervised representation training method of claim 1, wherein the augmentation algorithm includes a spatial augmentation algorithm and a pixel-level augmentation algorithm (See Gurevich: Fig. 14, and [0286], "FIG. 14 illustrates training of an exemplary contrastive learning algorithm using non-medical images (e.g., images 1311 in FIG. 13), in accordance with some examples. During training, an original image X is obtained (e.g., one of the non-medical images 1311 in FIG. 13). Data transformation or augmentation 1402 can be applied to the original image X to obtain two augmented images X.sub.i and X.sub.j. For example, the system can randomly apply two separate data augmentation operators (e.g., crop, flip, color jitter, grayscale, blur) to obtain X.sub.i and X.sub.j"), and wherein the generating of the first augmented image and the second augmented image (See Gurevich: Fig. 14, and [0286], quoted above) comprises: generating the first augmented image using the spatial augmentation algorithm (See Gurevich: Fig. 14, and [0286], quoted above. Note that crop and flip are spatial-level augmentations); and generating the second augmented image using the spatial augmentation algorithm and the pixel-level augmentation algorithm (See Gurevich: Fig. 14, and [0286], quoted above. Note that color jitter, grayscale, and blur are pixel-level augmentations).

Regarding claim 9, Babagholami, Gurevich, and Zhong teach all the features with respect to claim 1 as outlined above.
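The examiner's reading of Gurevich's operator list ([0286]) splits it into two families: crop/flip as spatial augmentation and color jitter/grayscale/blur as pixel-level augmentation, with the first view built from spatial operators only and the second from spatial plus pixel-level operators. A minimal sketch of that split, assuming a toy single-channel integer image and simplified stand-in operators (a brightness jitter and a 1-D box blur rather than either reference's actual transforms):

```python
import random

def random_crop_flip(img, size):
    """Spatial augmentation: random crop to size x size, then a horizontal flip."""
    h, w = len(img), len(img[0])
    top = random.randrange(h - size + 1)
    left = random.randrange(w - size + 1)
    crop = [row[left:left + size] for row in img[top:top + size]]
    return [row[::-1] for row in crop]  # horizontal flip

def jitter_and_blur(img, strength=10):
    """Pixel-level augmentation: brightness jitter, then a 1-D box blur."""
    jittered = [[px + random.randint(-strength, strength) for px in row]
                for row in img]
    return [[sum(row[max(0, j - 1):j + 2]) // len(row[max(0, j - 1):j + 2])
             for j in range(len(row))]
            for row in jittered]

def make_views(img, size=3):
    """First view: spatial only; second view: spatial + pixel-level,
    mirroring the claim-2 split as the rejection reads it."""
    first = random_crop_flip(img, size)
    second = jitter_and_blur(random_crop_flip(img, size))
    return first, second

random.seed(0)
image = [[(r * 8 + c) % 256 for c in range(8)] for r in range(8)]
view1, view2 = make_views(image)
print(view1)
```

Both views end up the same shape, but only the second has had its pixel values perturbed, which is the distinction the claim draws between the two augmented images.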
Further, Babagholami, Gurevich, and Zhong teach that a non-transitory computer readable storage medium storing computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform a self-supervised representation training method for performing self-supervised representation training for vision-based reinforcement training robust to visual distractions using a self-supervised representation training program including a generator and a discriminator (See Gurevich: Figs. 5 and 13, and [0219], "FIG. 5 illustrates an exemplary GAN model, according to some examples. With reference to FIG. 5, an input image 502 (e.g., a fluorescence perfusion image of breast tissue) is provided to a trained GAN model 504. The GAN model 504 outputs one or more anomaly scores. The one or more anomaly scores can be a plurality of pixel-wise scores, an image-wise score, or a combination thereof. In the depicted example, the GAN model comprises a generator 504a, a discriminator 504b, and an encoder 504c. Exemplary implementations of the GAN model are described herein with reference to FIGS. 8A-8B and 9"; and [0284], "FIG. 13 illustrates training of the encoder and the model used in process 1200, in accordance with some examples. With reference to FIG. 13, at Tl, self-supervised training 1302 is performed on the encoder model 1305 using a plurality of unlabeled training images 1310. In some examples, the plurality of training images comprises non-medical images 1311 and medical images 1312. The non-medical images can comprise one or more datasets of natural images. For example, images from a publicly available image dataset (e.g., lmageNet) can be used. In some examples, the encoder model 1305 is first trained using non-medical images 1311 before trained using medical images 1312"), the method (See Babagholami: Figs. 1 and 5, and [0051], "FIG. 
5 depicts an electronic device 500 that may be configured to train a neural network using the MEAR learning architecture according to the subject matter disclosed herein. The electronic device 500 may include a controller (or CPU) 510, an input/output device 520 such as, but not limited to, a keypad, a keyboard, a display, a touch-screen display, a 2D image sensor, a 3D image sensor, a memory 530, an interface 540, a GPU 550, an imaging-processing unit 560, a neural processing unit 570, a TOF processing unit 580 that are coupled to each other through a bus 590") comprising: receiving an image to generate a first augmented image and a second augmented image using at least one augmentation algorithm (See Babagholami: Fig. 1, and [0033], "FIG. 1 depicts an overview of an example embodiment of a MEAR learning model architecture 100 according to the subject matter disclosed herein. The inputs 101 to the network may be weakly and strongly augmented examples in which xis a data point and y is a label. The MEAR learning model 100 may include a feature extractor f and two experts h.sub.1 and h.sub.2 (K=2) that may be trained to minimize a supervised loss custom-character.sub.s (Eq. 2) on weakly augmented samples and to minimize the diversity loss custom-character.sub.don strongly augmented samples (Eq. 3)"); inputting the first augmented image and the second augmented image to the generator to output a first latent state from the first augmented image and output a second latent state from the second augmented image (See Babagholami: Fig. 1, and [0033], "The MEAR learning model 100 may include a feature extractor f and two experts h.sub.1 and h.sub.2 (K=2) that may be trained to minimize a supervised loss custom-character.sub.s (Eq. 2) on weakly augmented samples and to minimize the diversity loss custom-character.sub.don strongly augmented samples (Eq. 3). 
The feature extractor f may be trained to minimize the supervised loss on weakly augmented samples, and to minimize the multi-expert consensus loss custom-character.sub.c (Eq. 6) on strongly augmented samples", The feature extractor extracts features for each weaky and strongly augmented images, and those extracted image features may be the latent states); and training the generator and the discriminator to minimize at least one cost function with respect to the first latent state and the second latent state (See Babagholami: Fig. 1, and [0033], "The MEAR learning model 100 may include a feature extractor f and two experts h.sub.1 and h.sub.2 (K=2) that may be trained to minimize a supervised loss custom-character.sub.s (Eq. 2) on weakly augmented samples and to minimize the diversity loss custom-character.sub.d on strongly augmented samples (Eq. 3). The feature extractor f may be trained to minimize the supervised loss on weakly augmented samples, and to minimize the multi-expert consensus loss custom-character.sub.c (Eq. 6) on strongly augmented samples. A backward path is provided for each supervised loss custom-character.sub.s to the corresponding expert and to the feature extractor. A backward path is provided for the diversity loss custom-character.sub.d to each of the experts, and a backward path is provided for the consensus loss custom-character.sub.c to the feature extractor"), the at least one cost function configured to perform optimization based on a difference between a first latent state value, determined by passing the first latent state through a classifier, and a second latent state value, determined by passing the second latent state through the classifier (See Zhong: Figs. 5A-B, and [0061], “If the latent space point E(G(Z)), which is output from the encoder E, were itself input to the generator G, then the generator G would map the E(G(Z)) back to the ambient space G(E(G(Z)). 
If the ambient space point G(E(G(Z))) were itself to be input to the encoder E, then the encoding map E would map G(E(G(Z)) to the latent space point E(G(E(G(Z)))), which is the same latent space as that of Z. As such, E(G(Z)) and E(G(E(G(Z)))) are the latent space representations of G(Z) and G(E(G(Z))), respectively. According to implementations of this disclosure, the encoder E learns (i.e., is trained) to minimize the difference between E(G(E(G(Z)))) and E(G(Z))”. Note that the difference is mapped to the cost function, and E(G(E(G(Z)))) and E(G(Z)) are mapped to first latent state value and second latent state value respectively). Regarding claim 10, Babagholami, Gurevich, and Zhong teach all the features with respect to claim 9 as outlined above. Further, Gurevich teaches that the non-transitory computer readable storage medium of claim 9, wherein the augmentation algorithm includes a spatial augmentation algorithm and a pixel-level augmentation algorithm (See Gurevich: Fig. 14, and [0286], "FIG. 14 illustrates training of an exemplary contrastive learning algorithm using non-medical images (e.g., images 1311 in FIG. 13), in accordance with some examples. During training, an original image Xis obtained (e.g., one of the non-medical images 1311 in FIG. 13). Data transformation or augmentation 1402 can be applied to the original image X to obtain two augmented images X.sub.i and X.sub.j. For example, the system can randomly apply two separate data augmentation operators (e.g., crop, flip, color jitter, grayscale, blur) to obtain X.sub.i and X.sub.j", and wherein the generating of the first augmented image and the second augmented image comprises generating the first augmented image using the spatial augmentation algorithm (See Gurevich: Fig. 14, and [0286], "FIG. 14 illustrates training of an exemplary contrastive learning algorithm using non-medical images (e.g., images 1311 in FIG. 13), in accordance with some examples. 
During training, an original image Xis obtained (e.g., one of the non-medical images 1311 in FIG. 13). Data transformation or augmentation 1402 can be applied to the original image X to obtain two augmented images X.sub.i and X.sub.j. For example, the system can randomly apply two separate data augmentation operators (e.g., crop, flip, color jitter, grayscale, blur) to obtain X.sub.i and X.sub.j". Note that crop and flip is spatial level augmentation); and generating the second augmented image using the spatial augmentation algorithm and the pixel-level augmentation algorithm (See Gurevich: Fig. 14, and [0286], "FIG. 14 illustrates training of an exemplary contrastive learning algorithm using non-medical images (e.g., images 1311 in FIG. 13), in accordance with some examples. During training, an original image X is obtained (e.g., one of the non-medical images 1311 in FIG. 13). Data transformation or augmentation 1402 can be applied to the original image X to obtain two augmented images X.sub.i and X.sub.j. For example, the system can randomly apply two separate data augmentation operators (e.g., crop, flip, color jitter, grayscale, blur) to obtain X.sub.i and X.sub.j". Note that the color jitter, grayscale, and blur are pixel level augmentation). Regarding claim 17, Babagholami, Gurevich, and Zhong teach all the features with respect to claim 1 as outlined above. Further, Babagholami, Gurevich, and Zhong teach that the device for inferring data using a self-supervised training model, the device (See Babagholami: Figs. 1 and 5, and [0051], "FIG. 5 depicts an electronic device 500 that may be configured to train a neural network using the MEAR learning architecture according to the subject matter disclosed herein. 
The electronic device 500 may include a controller (or CPU) 510, an input/output device 520 such as, but not limited to, a keypad, a keyboard, a display, a touch-screen display, a 2D image sensor, a 3D image sensor, a memory 530, an interface 540, a GPU 550, an imaging-processing unit 560, a neural processing unit 570, a TOF processing unit 580 that are coupled to each other through a bus 590") comprising: a memory configured to store one or more instructions (See Babagholami: Fig. 5, and [0051], "FIG. 5 depicts an electronic device 500 that may be configured to train a neural network using the MEAR learning architecture according to the subject matter disclosed herein. The electronic device 500 may include a controller (or CPU) 510, an input/output device 520 such as, but not limited to, a keypad, a keyboard, a display, a touch-screen display, a 2D image sensor, a 3D image sensor, a memory 530, an interface 540, a GPU 550, an imaging-processing unit 560, a neural processing unit 570, a TOF processing unit 580 that are coupled to each other through a bus 590. The controller 510 may include, for example, at least one microprocessor, at least one digital signal processor, at least one microcontroller, or the like. The memory 530 may be configured to store a command code to be used by the controller 510 and/or to store a user data. In one embodiment, the controller 510 may configure and control the neural processing unit 570 or a neural processing unit (not shown) that is external to the electronic device 500 to train a neural network using the MEAR learning architecture according to the subject matter disclosed herein"); and a processor configured to execute the one or more instructions stored in the memory, wherein the instructions, when executed by the processor, cause the processor to check input data, input the input data to the self-supervised training model, and check results inferred by the self-supervised training model (See Babagholami: Fig. 
5, and [0051], "FIG. 5 depicts an electronic device 500 that may be configured to train a neural network using the MEAR learning architecture according to the subject matter disclosed herein. The electronic device 500 may include a controller (or CPU) 510, an input/output device 520 such as, but not limited to, a keypad, a keyboard, a display, a touch-screen display, a 2D image sensor, a 3D image sensor, a memory 530, an interface 540, a GPU 550, an imaging-processing unit 560, a neural processing unit 570, a TOF processing unit 580 that are coupled to each other through a bus 590. The controller 510 may include, for example, at least one microprocessor, at least one digital signal processor, at least one microcontroller, or the like. The memory 530 may be configured to store a command code to be used by the controller 510 and/or to store a user data. In one embodiment, the controller 510 may configure and control the neural processing unit 570 or a neural processing unit (not shown) that is external to the electronic device 500 to train a neural network using the MEAR learning architecture according to the subject matter disclosed herein"), wherein the self-supervised learning model is trained by a self-supervised representation training method (See Babagholami: Fig. 2, and [0042], "Training involves training K experts and the feature extractor. The K experts may take inputs from the feature extractor and maximize custom-character(h.sub.1(·), ... , h.sub.K(·)) over strongly augmented samples, and the feature extractor tries to generate discriminative and robust features. This involves two operations") comprising: receiving an training image and generating a first augmented image and a second augmented image using at least one augmentation algorithm (See Babagholami: Fig. 1, and [0033], "FIG. 1 depicts an overview of an example embodiment of a MEAR learning model architecture 100 according to the subject matter disclosed herein. 
The inputs 101 to the network may be weakly and strongly augmented examples in which x is a data point and y is a label. The MEAR learning model 100 may include a feature extractor f and two experts h.sub.1 and h.sub.2 (K=2) that may be trained to minimize a supervised loss custom-character.sub.s (Eq. 2) on weakly augmented samples and to minimize the diversity loss custom-character.sub.d on strongly augmented samples (Eq. 3)"); inputting the first augmented image and the second augmented image to a generator to output a first latent state from the first augmented image and output a second latent state from the second augmented image (See Babagholami: Fig. 1, and [0033], "The MEAR learning model 100 may include a feature extractor f and two experts h.sub.1 and h.sub.2 (K=2) that may be trained to minimize a supervised loss custom-character.sub.s (Eq. 2) on weakly augmented samples and to minimize the diversity loss custom-character.sub.d on strongly augmented samples (Eq. 3). The feature extractor f may be trained to minimize the supervised loss on weakly augmented samples, and to minimize the multi-expert consensus loss custom-character.sub.c (Eq. 6) on strongly augmented samples". The feature extractor extracts features for each of the weakly and strongly augmented images, and those extracted image features may be the latent states); and training the generator and the discriminator to minimize at least one cost function with respect to the first latent state and the second latent state (See Babagholami: Fig. 1, and [0033], "The MEAR learning model 100 may include a feature extractor f and two experts h.sub.1 and h.sub.2 (K=2) that may be trained to minimize a supervised loss custom-character.sub.s (Eq. 2) on weakly augmented samples and to minimize the diversity loss custom-character.sub.d on strongly augmented samples (Eq. 3). 
The feature extractor f may be trained to minimize the supervised loss on weakly augmented samples, and to minimize the multi-expert consensus loss custom-character.sub.c (Eq. 6) on strongly augmented samples. A backward path is provided for each supervised loss custom-character.sub.s to the corresponding expert and to the feature extractor. A backward path is provided for the diversity loss custom-character.sub.d to each of the experts, and a backward path is provided for the consensus loss custom-character.sub.c to the feature extractor"), the at least one cost function configured to perform optimization based on a difference between a first latent state value, determined by passing the first latent state through a classifier, and a second latent state value, determined by passing the second latent state through the classifier (See Zhong: Figs. 5A-B, and [0061], “If the latent space point E(G(Z)), which is output from the encoder E, were itself input to the generator G, then the generator G would map the E(G(Z)) back to the ambient space G(E(G(Z)). If the ambient space point G(E(G(Z))) were itself to be input to the encoder E, then the encoding map E would map G(E(G(Z)) to the latent space point E(G(E(G(Z)))), which is the same latent space as that of Z. As such, E(G(Z)) and E(G(E(G(Z)))) are the latent space representations of G(Z) and G(E(G(Z))), respectively. According to implementations of this disclosure, the encoder E learns (i.e., is trained) to minimize the difference between E(G(E(G(Z)))) and E(G(Z))”. Note that the difference is mapped to the cost function, and E(G(E(G(Z)))) and E(G(Z)) are mapped to the first latent state value and the second latent state value, respectively).

Claims 3 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Babagholami et al. (US 20220301296 A1) in view of Gurevich (US 20230036068 A1), further in view of Zhong et al. (US 20200349447 A1) and Lang et al. (US 20190258953 A1). 
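The amended limitation mapped above — two augmented views encoded into latent states, each passed through a classifier, with a cost driven by the difference between the two resulting values — can be sketched in code. This is a minimal illustrative sketch only, not the applicant's or Zhong's actual implementation: the linear `generator`, the linear `classifier`, and the names `G`, `C`, and `difference_cost` are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a linear "generator" G mapping flattened images to
# latent states, and a linear "classifier" C mapping a latent state to a scalar.
G = rng.normal(size=(16, 64))   # latent_dim x flattened_image_dim
C = rng.normal(size=(16,))      # latent_dim -> scalar value

def generator(x):
    """Map a flattened augmented image to a latent state."""
    return G @ x

def classifier(z):
    """Map a latent state to a scalar latent state value."""
    return float(C @ z)

def difference_cost(x_aug1, x_aug2):
    """Cost based on the difference between the two latent state values:
    pass each latent state through the classifier and penalize the
    squared difference (one plausible reading of the claimed cost)."""
    z1 = generator(x_aug1)   # first latent state
    z2 = generator(x_aug2)   # second latent state
    v1 = classifier(z1)      # first latent state value
    v2 = classifier(z2)      # second latent state value
    return (v1 - v2) ** 2

x = rng.normal(size=64)                  # a training image (flattened)
x1 = x + 0.01 * rng.normal(size=64)      # first augmented view
x2 = x + 0.01 * rng.normal(size=64)      # second augmented view
print(difference_cost(x1, x2))           # non-negative cost
```

In an actual training loop, the generator and discriminator parameters would be updated to minimize this cost; the squared difference here is one choice of difference measure among several (absolute difference, hinge, etc.).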
Regarding claim 3, Babagholami, Gurevich, and Zhong teach all the features with respect to claim 1 as outlined above. Further, Babagholami teaches the self-supervised representation training method of claim 1, wherein the training the generator and the discriminator comprises: inputting the first latent state and the second latent state to a reinforcement training model (See Babagholami: Figs. 1 and 5, and [0033], "A backward path is provided for each supervised loss custom-character.sub.s to the corresponding expert and to the feature extractor. A backward path is provided for the diversity loss custom-character.sub.d to each of the experts, and a backward path is provided for the consensus loss custom-character.sub.c to the feature extractor". The backward loopback to minimize the cost functions may correspond to the reinforcement training model because the model parameters are adjusted based on the backward information); and determining a third latent state corresponding to a next state of the first latent state and a fourth latent state corresponding to a next state of the second latent state using an action of an agent determined according to a control policy of the reinforcement learning model (See Babagholami: Figs. 1 and 5, and [0033], "A backward path is provided for each supervised loss custom-character.sub.s to the corresponding expert and to the feature extractor. A backward path is provided for the diversity loss custom-character.sub.d to each of the experts, and a backward path is provided for the consensus loss custom-character.sub.c to the feature extractor". The first backward process will produce new features that may be the third and fourth latent states), wherein the reinforcement learning model is trained to determine a control policy for maximizing cumulative reward. 
However, Babagholami, modified by Gurevich and Zhong, fails to explicitly disclose wherein the reinforcement learning model is trained to determine a control policy for maximizing cumulative reward. Lang, however, teaches wherein the reinforcement learning model is trained to determine a control policy for maximizing cumulative reward (See Lang: Fig. 20, and [0303], "The secure ML entity (2000) provides external ML function entity/entities (2050), which ML application(s) (2010) can use to directly communicate with ML security provider entity/entities (2040). This allows the secure ML entity to, for example, provide additional security functions that have no interceptable pendant in the ML toolkit (2020). In an example, external ML function entity/entities (2050) provide features such as access control policy decision points, encryption/decryption, logging etc."). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Babagholami such that the reinforcement learning model is trained to determine a control policy for maximizing cumulative reward, as taught by Lang, in order to reduce the cost of vulnerability assessment per device and enable testing a high number of devices in the first place (See Lang: Fig. 13, and [0337], "The assessment/testing may be for example performed by a small, portable easy to use and affordable device (vs. by a human red team). The portable device may also be connected to a backend (e.g. cloud system) which carries out some of the assessment/testing steps. This not only reduces the costs of vulnerabilities assessment per device, but allows testing a high number of devices in the first place. In another aspect of the present invention, the assessment/testing is performed on a single device not connected to other systems, for example on a desktop system"). 
Babagholami teaches a method and system that may train a multi-expert neural network using adversarial regularization for deep supervised learning, with the input image augmented weakly and strongly to minimize various costs; Lang teaches a system and method that may control the data access policy for a machine learning model and determine actions on the environment. Therefore, it would have been obvious to one of ordinary skill in the art to modify Babagholami in view of Lang to implement the neural network with control policy determination based on backward path processing to control data access or model parameter updating rules to minimize the cost functions. The motivation to modify Babagholami in view of Lang is "Use of known technique to improve similar devices (methods, or products) in the same way".

Regarding claim 11, Babagholami, Gurevich, and Zhong teach all the features with respect to claim 9 as outlined above. Further, Babagholami and Lang teach the non-transitory computer readable storage medium of claim 9, wherein the training the generator and the discriminator comprises inputting the first latent state and the second latent state to a reinforcement training model (See Babagholami: Figs. 1 and 5, and [0033], "A backward path is provided for each supervised loss custom-character.sub.s to the corresponding expert and to the feature extractor. A backward path is provided for the diversity loss custom-character.sub.d to each of the experts, and a backward path is provided for the consensus loss custom-character.sub.c to the feature extractor". 
The backward loopback to minimize the cost functions may correspond to the reinforcement training model because the model parameters are adjusted based on the backward information); and determining a third latent state corresponding to a next state of the first latent state and a fourth latent state corresponding to a next state of the second latent state using an action of an agent determined according to a control policy of the reinforcement learning model (See Babagholami: Figs. 1 and 5, and [0033], "A backward path is provided for each supervised loss custom-character.sub.s to the corresponding expert and to the feature extractor. A backward path is provided for the diversity loss custom-character.sub.d to each of the experts, and a backward path is provided for the consensus loss custom-character.sub.c to the feature extractor". The first backward process will produce new features that may be the third and fourth latent states), wherein the reinforcement learning model is trained to determine a control policy for maximizing cumulative reward (See Lang: Fig. 20, and [0303], "The secure ML entity (2000) provides external ML function entity/entities (2050), which ML application(s) (2010) can use to directly communicate with ML security provider entity/entities (2040). This allows the secure ML entity to, for example, provide additional security functions that have no interceptable pendant in the ML toolkit (2020). In an example, external ML function entity/entities (2050) provide features such as access control policy decision points, encryption/decryption, logging etc.").

Allowable Subject Matter

Claims 4-8 and 12-16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. 
The best art searched does not teach the claimed limitation of "The self-supervised representation training method of claim 3, wherein the self-supervised representation training program further includes an inverse dynamics module, wherein the self-supervised representation learning method further comprises: inferring a first action from the first latent state and the fourth latent state using the inverse dynamics module; and inferring a second action from the second latent state and the third latent state using the inverse dynamics module". Claims 18-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The best art searched does not teach the claimed limitation of "the self-supervised representation learning method of claim 1, wherein the at least one cost function comprises a first function and a second function, the first function configured to perform optimization such that the second latent state value is greater than the first latent state value, the second function configured to perform optimization such that the first latent state value is greater than the second latent state value".

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. 
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to GORDON G LIU whose telephone number is (571)270-0382. The examiner can normally be reached Monday - Friday 8:00-5:00.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Devona E Faulk can be reached at 571-272-7515. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/GORDON G LIU/
Primary Examiner, Art Unit 2618
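As an editorial aside on the technology at issue, the claim 3 mapping (latent states input to a reinforcement learning model whose control-policy action determines the next latent states) and the claims 4-8 inverse dynamics module (inferring an action from a pair of latent states) fit together as a latent-state rollout. The sketch below is a hedged illustration only: the linear stand-ins `POLICY`, `DYNAMICS`, and `INV_DYN`, and all dimensions, are hypothetical and not drawn from the application or the cited art.

```python
import numpy as np

rng = np.random.default_rng(1)
LATENT_DIM, ACTION_DIM = 8, 2

# Hypothetical linear stand-ins for the components named in claims 3-8:
POLICY = rng.normal(size=(ACTION_DIM, LATENT_DIM))              # control policy
DYNAMICS = 0.1 * rng.normal(size=(LATENT_DIM, LATENT_DIM + ACTION_DIM))
INV_DYN = 0.1 * rng.normal(size=(ACTION_DIM, 2 * LATENT_DIM))   # inverse dynamics

def policy(z):
    """Control policy of the reinforcement learning model: latent state -> action."""
    return POLICY @ z

def next_latent(z, a):
    """Latent transition: current latent state plus action -> next latent state."""
    return DYNAMICS @ np.concatenate([z, a])

def inverse_dynamics(z_t, z_next):
    """Inverse dynamics module: infer the action from a latent state pair."""
    return INV_DYN @ np.concatenate([z_t, z_next])

z1 = rng.normal(size=LATENT_DIM)   # first latent state (from first augmented view)
z2 = rng.normal(size=LATENT_DIM)   # second latent state (from second augmented view)

a = policy(z1)                     # action of the agent per the control policy
z3 = next_latent(z1, a)            # third latent state: next state of z1
z4 = next_latent(z2, a)            # fourth latent state: next state of z2

a1 = inverse_dynamics(z1, z4)      # first inferred action (claims 4-8 mapping)
a2 = inverse_dynamics(z2, z3)      # second inferred action (claims 4-8 mapping)
```

In the claimed method the policy would additionally be trained to maximize cumulative reward; that optimization loop is omitted here for brevity.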

Prosecution Timeline

Feb 12, 2024
Application Filed
Oct 16, 2025
Non-Final Rejection — §103
Jan 15, 2026
Response Filed
Feb 21, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602846
GENERATING REALISTIC MACHINE LEARNING-BASED PRODUCT IMAGES FOR ONLINE CATALOGS
2y 5m to grant • Granted Apr 14, 2026
Patent 12602840
IMAGE PROCESSING SYSTEM, IMAGE PROCESSING METHOD, AND STORAGE MEDIUM
2y 5m to grant • Granted Apr 14, 2026
Patent 12602871
MESH TOPOLOGY GENERATION USING PARALLEL PROCESSING
2y 5m to grant • Granted Apr 14, 2026
Patent 12592022
INTEGRATION CACHE FOR THREE-DIMENSIONAL (3D) RECONSTRUCTION
2y 5m to grant • Granted Mar 31, 2026
Patent 12586330
DISPLAYING A VIRTUAL OBJECT IN A REAL-LIFE SCENE
2y 5m to grant • Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
83%
Grant Probability
98%
With Interview (+15.1%)
2y 4m
Median Time to Grant
Moderate
PTA Risk
Based on 673 resolved cases by this examiner. Grant probability derived from career allow rate.
