Prosecution Insights
Last updated: April 19, 2026
Application No. 17/154,401

MOMENTUM CONTRASTIVE AUTOENCODER

Final Rejection: §103, §112

Filed: Jan 21, 2021
Examiner: ALABI, OLUWATOSIN O
Art Unit: 2129
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Salesforce.com, Inc.
OA Round: 4 (Final)

Grant Probability: 58% (Moderate)
Estimated OA Rounds: 5-6
Estimated Time to Grant: 3y 8m
Grant Probability with Interview: 85%

Examiner Intelligence

Career Allow Rate: 58% (grants 58% of resolved cases; 116 granted / 199 resolved; +3.3% vs. TC avg)
Interview Lift: +26.3% (strong lift in allowance rate among resolved cases with an interview vs. without)
Typical Timeline: 3y 8m average prosecution; 45 applications currently pending
Career History: 244 total applications across all art units

Statute-Specific Performance

§101: 21.9% (-18.1% vs. TC avg)
§103: 40.0% (+0.0% vs. TC avg)
§102: 9.5% (-30.5% vs. TC avg)
§112: 23.2% (-16.8% vs. TC avg)

TC averages are estimates. Based on career data from 199 resolved cases.
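The headline figures above follow from simple arithmetic on the career counts. The sketch below (Python) reproduces them; note that the per-group with/without-interview allowance rates are hypothetical placeholders, since this page shows only the net +26.3% lift.

```python
# Reproducing the examiner statistics shown above from the raw career counts.
granted, resolved = 116, 199

career_allow_rate = granted / resolved       # 0.583 -> displayed as 58%
tc_avg_estimate = career_allow_rate - 0.033  # "+3.3% vs TC avg" implies a ~55% TC baseline

# Interview lift is the allowance-rate gap between resolved cases with and
# without an examiner interview. The underlying split is NOT shown on this
# page; the two rates below are hypothetical placeholders chosen only to
# match the displayed +26.3% lift.
rate_with_interview = 0.790     # hypothetical
rate_without_interview = 0.527  # hypothetical
interview_lift = rate_with_interview - rate_without_interview

print(f"Career allow rate: {career_allow_rate:.1%}")   # 58.3%
print(f"Estimated TC average: {tc_avg_estimate:.1%}")  # 55.0%
print(f"Interview lift: {interview_lift:+.1%}")        # +26.3%
```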

Office Action

§103 §112
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

Priority

Applicant claims the benefit of prior-filed U.S. Provisional Application No. 63/086,579, filed October 1, 2020, which is acknowledged.

Drawings

The drawings were received on 01/21/2021. These drawings are acceptable.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 09/24/2021 has been considered by the examiner.

Response to Arguments

Applicant's arguments filed 12/05/2025 have been fully considered. Regarding applicant's remarks directed to the claim rejections under 35 U.S.C. § 101 related to abstract ideas, the limitations have been amended, and the rejection made in the previous action has been withdrawn. The rejections under 35 U.S.C. 112(a) and 112(b) made in the previous Office action have likewise been withdrawn in light of the amended limitations; see the office action for the amended limitations. Regarding applicant's remaining remarks directed to the claim rejections under 35 U.S.C. § 103, the remarks are directed to limitations that were not previously examined by the examiner. See the updated rejection below.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claim 21 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. Specifically, claim 21 recites "further comprising, generating a first output image via the trained decoder neural network and the trained first encoder neural network without the trained second neural network, in response to a first input image" (emphasis added), which includes a negative limitation that does not appear to have basis in the original disclosure. The MPEP requires that a negative limitation or exclusionary proviso have basis in the original disclosure; any claim containing a negative limitation which lacks such basis should be rejected under 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph, as failing to comply with the written description requirement. See MPEP § 2163 - § 2163.07(b) for a discussion of the written description requirement of 35 U.S.C. 112(a).

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-17 and 20-22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Regarding claim 1, the limitation "using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters" renders the claim indefinite because it includes conflicting requirements that make it incoherent. Specifically, the limitation requires the second and first encoders to have the same structure (e.g., same parameters, layers, network type, etc.) while also having different structural attributes by requiring "second encoder parameters that are different from the first encoder parameters". How can a set of encoders concurrently be considered the same and different with respect to their respective parameters? This makes it difficult to ascertain what properties qualify the encoders as being the same, such that a person of ordinary skill in the art could ascertain the intended scope of the claimed invention. It allows the applicant to subjectively decide which properties of an encoder may be considered the same while others differ, but does not allow one of ordinary skill in the art to objectively ascertain the intended scope. These requirements appear to conflict with one another; thus one of ordinary skill in the art would not be able to ascertain the intended scope of the claimed limitation, and the limitation is considered indefinite. The examiner interprets any set of encoder networks as being within the scope of the claim limitation.

Claims 11 and 20 include limitations similar to claim 1 and are thus rejected under the same rationale. The claims depending from claims 1 and 11 do not resolve the deficiencies noted above and are thus rejected under the same rationale.

Regarding claim 21, the claim recites "the trained second neural network", and there is insufficient antecedent basis for this limitation in the claim. The examiner interprets any neural network as within the scope of the claim limitation. Claim 21 further recites the limitation "further comprising, generating a first output image via the trained decoder neural network and the trained first encoder neural network without the trained second neural network, in response to a first input image", which renders the claim indefinite. The limitation disavows the use of "the trained second neural network", but claim 1 requires the use of the second encoder neural network, which is assumed to be the claimed "trained second neural network", in computing the contrastive loss on which the decoder's parameters are updated, as required by the claim 1 limitations: "… encoding, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network; determining a contrastive loss using the first latent representation and second latent representation, the contrastive loss based at least on a similarity between the first latent representation and the second latent representation; decoding, using a decoder neural network comprising decoder parameters, the first latent representation into an output data set; determining a reconstruction loss based on the output data set and the input data set; and updating at least one parameter in the first encoder parameters and at least one parameter in the decoder parameters based on the contrastive loss and the reconstruction loss." (emphasis added)

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 8, 11-13, 17 and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (Pub. No. US 2021/0358177, hereinafter 'Park') in view of Jaiswal et al. (NPL: "A Survey on Contrastive Self-Supervised Learning", hereinafter 'Jai').

Regarding independent claim 1, Park teaches a system for training a contrastive momentum autoencoder (0049: As shown in FIG. 1, the server(s) 104 can also include the deep image manipulation system 102 [a system for training a contrastive momentum autoencoder] as part of a digital content editing system 106... In addition, the digital content editing system 106 and/or the deep image manipulation system 102 can learn parameters of a global and spatial autoencoder [a system for training a contrastive momentum autoencoder] 112 by training an encoder neural network and a generator neural network of the global and spatial autoencoder 112 to extract spatial codes corresponding to geometric layout and global codes corresponding to overall appearance. In some embodiments, the digital content editing system 106 and/or the deep image manipulation system 102 can utilize a contrastive loss as part of the training process [a system for training a contrastive momentum autoencoder].
) the system comprising: one or more non-transitory memories; one or more processors coupled to the one or more memories and configured to execute instructions that cause the one or more processors to perform operations, the operations comprising (in [0144]: Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments within the scope of the present disclosure also include physical and other computer readable media for carrying or storing computer-executable instructions and/or data structures. In particular, one or more of the processes described herein may be implemented at least in part as instructions embodied in a non-transitory computer-readable medium [the system comprising: one or more non-transitory memories; one or more processors coupled to the one or more memories and configured to execute instructions that cause the one or more processors to perform operations, the operations comprising] and executable by one or more computing devices (e.g., any of the media content access devices described herein). In general, a processor (e.g., a microprocessor) receives instructions, from a non-transitory computer-readable medium (e.g., a memory, etc.), and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein.)

receive an input data set; (in 0078: As illustrated in FIG. 5, the encoder neural network 206 includes convolutional layers, residual blocks, and layout blocks. In particular, the key in FIG. 5 indicates that the white layers of the encoder neural network 206 are convolutional layers, the diagonally patterned blocks are residual blocks, and the crosshatch patterned blocks are layout blocks. In addition, the input digital image [claimed receive the input data set as image data] (e.g., 202, 204, 302, 402, 404, or 422) is represented by the tall gray block, the global code is represented by the short wide gray block, and the spatial code is represented by the medium height thin gray block. As mentioned above, the encoder neural network 206 includes a spatial encoder neural network and a global encoder neural network which share common layers.; And claimed two encoder networks in 0055: As shown in FIG. 2, the deep image manipulation system 102 can utilize the same encoder neural network 206 to extract the global and spatial codes from each of the first and second digital images 202, 204. In some embodiments, the deep image manipulation system 102 utilizes two separate encoders: a spatial encoder neural network to extract the spatial code 208 (and the spatial code 212) and a global encoder neural network to extract the global code 210 (and the global code 214).)

encode, using a first encoder neural network comprising first encoder parameters, a first latent representation from the input data set; encode, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network; (Claimed two encoder networks in 0055: As shown in FIG. 2, the deep image manipulation system 102 can utilize the same encoder neural network 206 [encode, using a first encoder neural network comprising first encoder parameters, a first latent representation from the input data set; encode, using a second encoder neural network having a same structure as the first encoder neural network] to extract the global and spatial codes from each of the first and second digital images 202, 204 [a same input data set as for the first encoder neural network]. In some embodiments, the deep image manipulation system 102 utilizes two separate encoders [encode, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network]: a spatial encoder neural network to extract the spatial code 208 (and the spatial code 212) [comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network] and a global encoder neural network to extract the global code 210 (and the global code 214) [comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network]. Examiner notes that the same encoder network receives the same inputs 202 and 204 and processes different parameters for learning input data codes/features as claimed.)

determine a contrastive loss using the first latent representation and second latent representation, the contrastive loss based at least on a similarity between the first latent representation and the second latent representation; (As depicted in Fig. 4 and in 0069-0071: … where x^0 represents a latent code representation of the first digital image 402, x^1 represents a latent code representation of the second digital image 404 [using the first latent representation and second latent representation], and the other terms are defined above. In one or more embodiments, utilizing this GAN loss alone may not be enough for the deep image manipulation system 102 to constrain the global and spatial autoencoder 112 to generate a hybrid of the first digital image 402 and the second digital image 404 [the first latent representation and second latent representation associated with generating the hybrid input into the GAN model for further processing], as the GAN loss is related only to the realism of the resultant digital image. Thus, to improve the generation of hybrid digital images [the contrastive loss based at least on a similarity between the first latent representation and the second latent representation], the deep image manipulation system 102 can utilize an additional loss function called a contrastive loss [determine a contrastive loss using the first latent representation and second latent representation]… For the contrastive loss, the deep image manipulation system 102 shrinks the ℓ2 distance ‖E(G(z)) - z‖_2^2 = ‖E(G(E(x))) - E(x)‖_2^2 by utilizing the encoder neural network 206 (E) to scale down the magnitude of its output space. Therefore, the deep image manipulation system 102 ensures that the reconstructed code 424 (e.g., a reconstructed spatial code and a reconstructed global code extracted from the modified digital image 422 utilizing the encoder neural network 206), as given by ẑ = E(G(z)), closely resembles (or matches) the extracted code z (e.g., the combination of the spatial code 406 and the global code 412) itself.)

decode, using a decoder neural network comprising decoder parameters, the first latent representation into an output data set (in 0069: Further, the deep image manipulation system 102 utilizes a GAN loss associated with the discriminator 414 to determine an error or a measure of loss associated with the global and spatial autoencoder 112 [decode, using a decoder neural network comprising decoder parameters, the first latent representation into an output data set] and to encourage realistic hybrid digital images…)

determine a reconstruction loss based on the output data set and the input data set; (in 0066: … For instance, the deep image manipulation system 102 continues training the encoder neural network 206 and the generator neural network 216 over multiple iterations, inputting new input digital images to generate new reconstructed digital images, determining losses, and modifying parameters for each iteration. Thus, upon determining that the GAN loss and/or the reconstruction loss 308 [determine a reconstruction loss based on the output data set and the input data set] each satisfy a threshold loss, the deep image manipulation system 102 determines that the encoder neural network 206 and the generator neural network 216 are accurate. Indeed, by combining the extracted spatial code 310 and the extracted global code 312, the generator neural network 216 generates the reconstructed digital image 304 to accurately represent the input digital image 302 […using the output data set and the input data set]. As shown in FIG. 3, the reconstructed digital image 304 looks very similar, if not identical, to the input digital image 302.)

and update at least one parameter in the first encoder parameters of the first encoder neural network and at least one parameter in the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss (in 0070: … For instance, the deep image manipulation system 102 continues training the encoder neural network 206 and the generator neural network 216 over multiple iterations, inputting new input digital images to generate new reconstructed digital images, determining losses, and modifying parameters for each iteration. Thus, upon determining that the GAN loss and/or the reconstruction loss 308 [based on the contrastive loss and the reconstruction loss] each satisfy a threshold loss, the deep image manipulation system 102 determines that the encoder neural network 206 and the generator neural network 216 are accurate… Thus, to improve the generation of hybrid digital images, the deep image manipulation system 102 can utilize an additional loss function called a contrastive loss [based on the contrastive loss and the reconstruction loss].
In particular, the deep image manipulation system 102 utilizes a code reconstruction loss to learn parameters [and update at least one parameter in the first encoder parameters of the first encoder neural network…] for reconstructing the particular codes (e.g., the spatial code 406 and the global code 412) extracted from the first digital image 402 (x^0) and the second digital image 404 (x^1)…; And in 0069: Further, the deep image manipulation system 102 utilizes a GAN loss associated with the discriminator 414 to determine an error or a measure of loss associated with the global and spatial autoencoder 112 [at least one parameter in the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss] and to encourage realistic hybrid digital images…)

While Park teaches the generation and use of a contrastive loss from latent representations, as claimed, used to form a hybrid data set for further processing, Jai additionally teaches determining a contrastive loss using the first latent representation and second latent representation, the contrastive loss based at least on a similarity between the first latent representation and the second latent representation (as depicted in Fig. 11 and Fig. 14 and in Sec. 3: Contrastive learning methods rely on the number of negative samples for generating good quality representations. Accessing negative samples can be seen as a dictionary lookup task where the dictionary is sometimes the whole training set and the rest of the times some subset of the dataset. An interesting way to categorize these methods would be based on the technique used to collect negative samples against a positive data point during training. Based on the approach taken, we categorized the methods into four major architectures as shown in Figure 11… Using a contrastive loss, it converges to make positive samples closer and negative samples far from the original sample [determining a contrastive loss using the first latent representation and second latent representation, the contrastive loss based at least on a similarity between the first latent representation and the second latent representation]. Here, the query encoder Q is trained on the original samples and the key encoder K is trained on their augmented versions (positive samples) along with the negative samples in the batch. The features q and k generated from these encoders are used to calculate the similarity between the respective inputs using a similarity metric (discussed later in Section 5). Most of the time, the similarity metric used is cosine similarity, which is simply the inner product of two vectors normalized to have length 1 as defined in Equation (2)…), and … a second latent representation from a same input data set as for the first encoder neural network (as depicted in Fig. 11 and Fig. 14 and in Sec. 1: Unlike generative models, contrastive learning (CL) is a discriminative approach that aims at grouping similar samples closer and diverse samples far from each other as shown in Figure 1. To achieve this, a similarity metric is used to measure how close two embeddings are. Especially, for computer vision tasks, a contrastive loss is evaluated based on the feature representations of the images extracted from an encoder network. For instance, one sample from the training dataset is taken and a transformed version of the sample is retrieved by applying appropriate data augmentation techniques. During training, referring to Figure 2, the augmented version of the original sample is considered as a positive sample, and the rest of the samples in the batch/dataset (depends on the method being used) are considered negative samples… And in Sec. 3: Contrastive learning methods rely on the number of negative samples for generating good quality representations… Based on the approach taken, we categorized the methods into four major architectures as shown in Figure 11… Here, the query encoder Q is trained on the original samples and the key encoder K is trained on their augmented versions (positive samples) along with the negative samples in the batch [a second latent representation from a same input data set as for the first encoder neural network, as the same input used for the first encoder is used in augmented form]. The features q and k generated from these encoders are used to calculate the similarity between the respective inputs using a similarity metric (discussed later in Section 5). Most of the time, the similarity metric used is cosine similarity, which is simply the inner product of two vectors normalized to have length 1 as defined in Equation (2)…)

Jai and Park are analogous art because both involve developing information processing and recognition techniques using machine learning systems and algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for implementing contrastive learning as a component of self-supervised learning for computer vision, natural language processing (NLP), and other domains, as disclosed by Jai, with the method of developing machine-learning embedding models for image processing tasks as collectively disclosed by Jai and Park. One of ordinary skill in the art would have been motivated to combine the methods disclosed by Jai and Park as noted above; doing so allows for embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples (Jai, Abstract).

Regarding claim 2, the rejection of claim 1 is incorporated, and Park in combination with Jai teaches the system of claim 1, the operations further comprising: after the updating of the at least one parameter in the first encoder parameters, update at least one parameter in the second encoder parameters of the second encoder neural network based on a moving average of the at least one parameter in the first encoder parameters.
(in 0075-0080: In some embodiments, the deep image manipulation system 102 utilizes a particular training objective function to learn parameters [the operations further comprising: after the updating of the at least one parameter in the first encoder parameters, update at least one parameter in the second encoder parameters of the second encoder neural network based on a moving average of the at least one parameter in the first encoder parameters] of the encoder neural network 206 and the generator neural network 216 to accurately and realistically generate modified digital images in the form of hybrid digital images or reconstructed digital images… As mentioned above, the deep image manipulation system 102 utilizes a global and spatial autoencoder 112 with a novel architecture specifically for digital image manipulation… In addition, the encoder neural network 206 generates the spatial code by passing intermediate (e.g., non-output) activations or latent features into layout blocks. Each layout block upsamples the latent feature vector to a fixed size (e.g., a spatial resolution of 32 or 64, depending on the dataset) and reduces the channel dimension (e.g., to 1 or 2 channels). The encoder neural network 206 further aggregates (e.g., averages) [… based on a moving average of the at least one parameter in the first encoder parameters] the intermediate features to generate the spatial code…)

Additionally, Jai teaches, after the updating of the at least one parameter in the first encoder parameters, update at least one parameter in the second encoder parameters of the second encoder neural network based on a moving average of the at least one parameter in the first encoder parameters (in Sec. 4: Encoders play an integral role in any self-supervised learning pipeline as they are responsible for mapping the input samples to a latent space [after the updating of the at least one parameter in the first encoder parameters]. Figure 15 reflects the role of an encoder in a self-supervised learning pipeline… Most of the works in contrastive learning utilize some variant of the ResNet [41] model… Similarly, in the work proposed by Chen et al. [42], a traditional ResNet is used as an encoder where the features are extracted from the output of the average pooling layer [based on a moving average of the at least one parameter in the first encoder parameters]. Further, a shallow MLP (1 hidden layer) maps representations to a latent space where a contrastive loss is applied [update at least one parameter in the second encoder parameters of the second encoder neural network based on a moving average of the at least one parameter in the first encoder parameters]. For training a model for action recognition, the most common approach to extract features from a sequence of image frames is to use a 3D-ResNet as encoder…; and having the claimed plurality of encoders for computing contrastive loss as depicted in Fig. 11 or 14). It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Park and Jai for the same reasons disclosed above.

Regarding claim 3, the rejection of claim 1 is incorporated, and Park in combination with Jai teaches the system of claim 1, wherein the first encoder is configured to normalize the first latent representation using l2 normalization. (in 0097: … Indeed, in some embodiments, the deep image manipulation system 102 normalizes the latent codes (the global code and the spatial code) on the unit sphere and utilizes spherical [wherein the first encoder is configured to normalize the first latent representation using … normalization]…; And in [0073]: where the dot product "⋅" represents the cosine similarity [wherein the first encoder is configured to normalize the first latent representation using l2 normalization], N represents a size of the digital image code repository (e.g., the number of stored codes 420 in the "Memory bank" of FIG. 4), τ=0.07 is a "temperature" parameter, z represents the latent code for either the spatial or global components, E represents the spatial encoder neural network or the global encoder neural network, … [0074]: By utilizing the contrastive loss above, the deep image manipulation system 102 encourages ẑ to be classified as z (or at least within a threshold similarity of z) amongst N+1 exemplar classes, where each class logit is formed by cosine similarity [wherein the first encoder is configured to normalize the first latent representation using l2 normalization]…)

While Park teaches the use of encoder networks for embedding features and learning latent representations using normalization techniques, Park does not expressly teach that cosine similarity is an l2 normalization. Jai expressly teaches the use of encoder networks for embedding features and learning latent representations using normalization techniques, and teaches that cosine similarity is an l2 normalization. (in Sec. 3.1, pg. 10: … Using a contrastive loss, it converges to make positive samples closer and negative samples far from the original sample… The features q and k generated from these encoders are used to calculate the similarity between the respective inputs using a similarity metric (discussed later in Section 5). Most of the time, the similarity metric used is cosine similarity [wherein the first encoder is configured to normalize the first latent representation using l2 normalization], which is simply the inner product of two vectors normalized to have length 1 as defined in Equation (2). And in pg. 13, Sec. 5: … Contrastive learning focuses on comparing the embeddings with a Noise Contrastive Estimation (NCE) [43] function that is defined as… where q is the original sample, k+ represents a positive sample, and k− represents a negative sample. τ is a hyperparameter used in most of the recent methods and is called the temperature coefficient. The sim() function can be any similarity function, but generally a cosine similarity as defined in Equation (2) is used. The initial idea behind NCE was to perform a nonlinear logistic regression that discriminates between observed data and some artificially generated noise. If the number of negative samples is greater, a variant of NCE called InfoNCE is used as represented in Equation (4). The use of l2 normalization (i.e., cosine similarity) [wherein the first encoder is configured to normalize the first latent representation using l2 normalization] and the temperature coefficient effectively weighs different examples and can help the model learn from hard negatives…) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Park and Jai for the same reasons disclosed above.
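For readers less familiar with the underlying technique, the arrangement recited in claims 1-3 (a first encoder; a second encoder of the same structure holding different parameter values; l2-normalized latents compared by a temperature-scaled contrastive loss; a decoder trained with a reconstruction loss; and a moving-average update of the second encoder) corresponds to a well-known momentum-contrastive training pattern. The sketch below is a minimal, hypothetical PyTorch illustration of that pattern, not the applicant's code or anything of record in this case; the module sizes, loss weight, temperature, and momentum coefficient are all assumptions.

```python
# Hypothetical sketch of a momentum contrastive autoencoder training step.
# Not of record in this case; sizes and hyperparameters are illustrative.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, dim_in=784, dim_latent=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, 512), nn.ReLU(), nn.Linear(512, dim_latent))
    def forward(self, x):
        # l2 normalization: latents lie on the unit sphere, so the dot
        # products below are cosine similarities (cf. claim 3).
        return F.normalize(self.net(x), dim=-1)

class Decoder(nn.Module):
    def __init__(self, dim_latent=128, dim_out=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_latent, 512), nn.ReLU(), nn.Linear(512, dim_out))
    def forward(self, z):
        return self.net(z)

encoder_q = Encoder()                  # first encoder (updated by gradients)
encoder_k = copy.deepcopy(encoder_q)   # second encoder: same structure...
for p in encoder_k.parameters():
    p.requires_grad = False            # ...updated only by the moving average below
decoder = Decoder()
opt = torch.optim.Adam([*encoder_q.parameters(), *decoder.parameters()], lr=1e-3)

def train_step(x, momentum=0.999, temperature=0.07, weight=1.0):
    z_q = encoder_q(x)                 # first latent representation
    with torch.no_grad():
        z_k = encoder_k(x)             # second latent, same input data set

    # InfoNCE-style contrastive loss: matching rows of z_q and z_k are
    # positives; the other samples in the batch serve as negatives.
    logits = z_q @ z_k.t() / temperature
    targets = torch.arange(x.size(0))
    contrastive = F.cross_entropy(logits, targets)

    # Reconstruction loss from decoding the first latent representation.
    reconstruction = F.mse_loss(decoder(z_q), x)

    loss = reconstruction + weight * contrastive
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Momentum update (cf. claim 2): the second encoder's parameters track
    # a moving average of the first encoder's parameters.
    with torch.no_grad():
        for p_k, p_q in zip(encoder_k.parameters(), encoder_q.parameters()):
            p_k.mul_(momentum).add_(p_q, alpha=1.0 - momentum)
    return loss.item()

# Example: train_step(torch.rand(32, 784))
# After training, inference would typically use only encoder_q and decoder;
# encoder_k is a training-time device (cf. the claim 21 limitation).
```

In this pattern (popularized by MoCo-style methods), encoder_k shares encoder_q's architecture while its parameter values diverge from encoder_q's as the moving-average update runs, which is how the "same structure ... different parameters" language is conventionally realized in momentum contrastive learning.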
Regarding claim 8, the rejection of claim 1 is incorporated and Park in combination with Jai teaches the system of claim 1, wherein the first encoder neural network and the second encoder neural network are configured to receive the input data set as multiple mini-batches and the operations further comprising: updating the at least one parameter in the first encoder parameters of the first encoder neural network and at least one parameter in the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss associated with a data sample in each mini-batch in the multiple mini-batches. (in 0033-0034: … In particular, while many conventional systems utilize conventional generative models to generate digital images from random samples (which makes them unfit for accurately generating specific digital images), the deep image manipulation system utilizes a novel model architecture (i.e., a global and spatial autoencoder) [the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss associated with a data sample in each mini-batch in the multiple mini-batches] designed specifically for digital image manipulation. Indeed, the architecture of the global and spatial autoencoder enables the deep image manipulation system to accurately generate specific digital images and manipulate particular attributes of digital images… In addition to its novel architecture, the deep image manipulation system trains the global and spatial autoencoder to accurately generate specific digital images by swapping spatial codes and global codes between pairs of digital images [wherein the first encoder neural network and the second encoder neural network are configured to receive the input data set as multiple mini-batches; claimed mini-batches as image pairs], thus forcing the global and spatial autoencoder to learn compositionality. By learning compositionality in this way, the deep image manipulation system can learn embeddings [the operations further comprising: updating the at least one parameter in the first encoder parameters of the first encoder neural network] that are suitable for digital image manipulation: spatial features naturally correspond to geometric layout of a digital image, and global features naturally capture an overall appearance. Additionally, by utilizing a contrastive loss [at least one parameter in the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss associated with a data sample in each mini-batch in the multiple mini-batches] to force extracted spatial codes and extracted global codes to be more similar to corresponding codes from input digital images than to stored spatial codes and stored global codes, the deep image manipulation system further improves the accuracy and realism of resultant digital images.); And in 0069-0070: … Further, the deep image manipulation system 102 utilizes a GAN loss associated with the discriminator 414 to determine an error or a measure of loss associated with the global and spatial autoencoder 112 [decoder parameters of the decoder neural network] and to encourage realistic hybrid digital images… In one or more embodiments, utilizing this GAN loss alone may not be enough for the deep image manipulation system 102 to constrain the global and spatial autoencoder 112 to generate a hybrid of the first digital image 402 and the second digital image 404, as the GAN loss is related only to the realism of the resultant digital image. Thus, to improve the generation of hybrid digital images, the deep image manipulation system 102 can utilize an additional loss function called a contrastive loss [the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss associated with a data sample in each mini-batch in the multiple mini-batches]. In particular, the deep image manipulation system 102 utilizes a code reconstruction loss to learn parameters [the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss associated with a data sample in each mini-batch in the multiple mini-batches] for reconstructing the particular codes (e.g., the spatial code 406 and the global code 412) extracted from the first digital image 402 (x^0) and the second digital image 404 (x^1)…)

Regarding independent claim 11, Park teaches a method for training a contrastive momentum autoencoder (0049: As shown in FIG. 1, the server(s) 104 can also include the deep image manipulation system 102 [including/executing the claimed method for training a contrastive momentum autoencoder] as part of a digital content editing system 106... In addition, the digital content editing system 106 and/or the deep image manipulation system 102 can learn parameters of a global and spatial autoencoder [method for training a contrastive momentum autoencoder] 112 by training an encoder neural network and a generator neural network of the global and spatial autoencoder 112 to extract spatial codes corresponding to geometric layout and global codes corresponding to overall appearance. In some embodiments, the digital content editing system 106 and/or the deep image manipulation system 102 can utilize a contrastive loss as part of the training process [method for training a contrastive momentum autoencoder].)

the method comprising: receiving a training dataset including images; (in 0078: As illustrated in FIG. 5, the encoder neural network 206 includes convolutional layers, residual blocks, and layout blocks. In particular, the key in FIG. 5 indicates that the white layers of the encoder neural network 206 are convolutional layers, the diagonally patterned blocks are residual blocks, and the crosshatch patterned blocks are layout blocks. In addition, the input digital image [claimed receiving a training dataset including images as digital image data] (e.g., 202, 204, 302, 402, 404, or 422) is represented by the tall gray block, the global code is represented by the short wide gray block, and the spatial code is represented by the medium height thin gray block. As mentioned above, the encoder neural network 206 includes a spatial encoder neural network and a global encoder neural network which share common layers.; And claimed two encoder networks in 0055: As shown in FIG. 2, the deep image manipulation system 102 can utilize the same encoder neural network 206 to extract the global and spatial codes from each of the first and second digital images 202, 204.
In some embodiments, the deep image manipulation system 102 utilizes two separate encoders: a spatial encoder neural network to extract the spatial code 208 (and the spatial code 212) and a global encoder neural network to extract the global code 210 (and the global code 214).)

encoding, using a first encoder neural network comprising first encoder parameters, a first latent representation from an image in the training dataset; encoding, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same image as for the first encoder neural network; (Claimed two encoder networks in 0055: As shown in FIG. 2, the deep image manipulation system 102 can utilize the same encoder neural network 206 [encoding, using a second encoder neural network having a same structure as the first encoder neural network] to extract the global and spatial codes from each of the first and second digital images 202, 204 [a same input data set as for the first encoder neural network]. In some embodiments, the deep image manipulation system 102 utilizes two separate encoders [encoding, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same image as for the first encoder neural network]: a spatial encoder neural network to extract the spatial code 208 (and the spatial code 212) [comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network] and a global encoder neural network to extract the global code 210 (and the global code 214) [comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network]. Examiner notes that the same encoder network receives the same inputs 202 and 204 and processes different parameters as claimed.)

determining a contrastive loss using the first latent representation and second latent representation, the contrastive loss based at least on a similarity between the first latent representation and the second latent representation; (As depicted in Fig. 4 and in 0069-0071: … where x^0 represents a latent code representation of the first digital image 402, x^1 represents a latent code representation of the second digital image 404 [using the first latent representation and second latent representation], and the other terms are defined above. In one or more embodiments, utilizing this GAN loss alone may not be enough for the deep image manipulation system 102 to constrain the global and spatial autoencoder 112 to generate a hybrid of the first digital image 402 and the second digital image 404 [the first latent representation and second latent representation associated with generating the hybrid input into the GAN model for further processing], as the GAN loss is related only to the realism of the resultant digital image. Thus, to improve the generation of hybrid digital images [the contrastive loss based at least on a similarity between the first latent representation and the second latent representation], the deep image manipulation system 102 can utilize an additional loss function called a contrastive loss [determining a contrastive loss using the first latent representation and second latent representation]… For the contrastive loss, the deep image manipulation system 102 shrinks the ℓ2 distance ‖E(G(z)) - z‖_2^2 = ‖E(G(E(x))) - E(x)‖_2^2 by utilizing the encoder neural network 206 (E) to scale down the magnitude of its output space. Therefore, the deep image manipulation system 102 ensures that the reconstructed code 424 (e.g., a reconstructed spatial code and a reconstructed global code extracted from the modified digital image 422 utilizing the encoder neural network 206), as given by ẑ = E(G(z)), closely resembles (or matches) the extracted code z (e.g., the combination of the spatial code 406 and the global code 412) itself.)

decoding, using a decoder neural network comprising decoder parameters, the first latent representation into an output image; (in 0069: Further, the deep image manipulation system 102 utilizes a GAN loss associated with the discriminator 414 to determine an error or a measure of loss associated with the global and spatial autoencoder 112 [decoding, using a decoder neural network comprising decoder parameters, the first latent representation into an output image] and to encourage realistic hybrid digital images […the first latent representation into an output image]…)

determining a reconstruction loss using the output image and the image in the training dataset; (in 0066: … For instance, the deep image manipulation system 102 continues training the encoder neural network 206 and the generator neural network 216 over multiple iterations, inputting new input digital images to generate new reconstructed digital images, determining losses, and modifying parameters for each iteration. Thus, upon determining that the GAN loss and/or the reconstruction loss 308 [determining a reconstruction loss using the output image and the image in the training dataset] each satisfy a threshold loss, the deep image manipulation system 102 determines that the encoder neural network 206 and the generator neural network 216 are accurate. Indeed, by combining the extracted spatial code 310 and the extracted global code 312, the generator neural network 216 generates the reconstructed digital image 304 to accurately represent the input digital image 302 […using the output image and the image in the training dataset]. As shown in FIG. 3, the reconstructed digital image 304 looks very similar, if not identical, to the input digital image 302.)

updating at least one parameter in the first encoder parameters and at least one parameter in the decoder parameters based on the contrastive loss and the reconstruction loss; (in 0070: … For instance, the deep image manipulation system 102 continues training the encoder neural network 206 and the generator neural network 216 over multiple iterations, inputting new input digital images to generate new reconstructed digital images, determining losses, and modifying parameters for each iteration.
Thus, upon determining that the GAN loss and/or the reconstruction loss 308 [based on the contrastive loss and the reconstruction loss] each satisfy a threshold loss, the deep image manipulation system 102 determines that the encoder neural network 206 and the generator neural network 216 are accurate… Thus, to improve the generation of hybrid digital images, the deep image manipulation system 102 can utilize an additional loss function called a contrastive loss [based on the contrastive loss and the reconstruction loss]. In particular, the deep image manipulation system 102 utilizes a code reconstruction loss to learn parameters [updating at least one parameter in the first encoder parameters and at least one parameter in the decoder parameters based on the contrastive loss and the reconstruction loss] for reconstructing the particular codes (e.g., the spatial code 406 and the global code 412) extracted from the first digital image 402 (x^0) and the second digital image 404 (x^1)…; And in 0069: Further, the deep image manipulation system 102 utilizes a GAN loss associated with the discriminator 414 to determine an error or a measure of loss associated with the global and spatial autoencoder 112 [at least one parameter in the decoder parameters based on the contrastive loss and the reconstruction loss] and to encourage realistic hybrid digital images…)

and after the first encoder neural network and the decoder neural network is trained: receiving an input image; and generating, by the trained first encoder neural network and the second trained decoder neural network, a reconstructed image in response to the input image. (in [0096]: In addition to reconstruction and style swapping, the deep image manipulation system 102 can also utilize the global and spatial autoencoder 112 (trained only on the reconstruction and style swapping) for additional applications such as style blending [and after the first encoder neural network and the decoder neural network is trained: receiving an input image]. More specifically, the deep image manipulation system 102 can blend styles represented by global codes of multiple target digital images with the geometric layout of a source digital image [receiving an input image]. Within the "Style blending" section of FIG. 9, the target digital image (and its corresponding latent feature representation x_1) refers to a collection of digital images rather than a single digital image, and ẑ_s^1 and ẑ_g^1 refer to average codes from that collection. Indeed, the deep image manipulation system 102 extracts the spatial codes and the global codes from the collection of digital images and combines them to generate a composite (e.g., average) spatial code and a composite (e.g., average) global code. The deep image manipulation system 102 thus generates a modified digital image [and generating, by the trained first encoder neural network and the second trained decoder neural network, a reconstructed image in response to the input image] by combining the composite global code with a source spatial code.; And in [0116]: As mentioned above, the deep image manipulation system 102 can generate modified digital images [and generating, by the trained first encoder neural network and the second trained decoder neural network, a reconstructed image in response to the input image] utilizing the global and spatial autoencoder 112 to swap styles of input digital images. In particular, the deep image manipulation system 102 can extract spatial codes and global codes from digital images and can swap codes to combine a spatial code from one digital image with a global code from another digital image to generate a modified digital image. FIG. 12 illustrates generating modified digital images from source digital images and target digital images in accordance with one or more embodiments.)

While Park teaches the generation and use of a contrastive loss from latent representations, as claimed, used to form a hybrid data set for further processing, Jai additionally teaches determining a contrastive loss using the first latent representation and second latent representation, the contrastive loss based at least on a similarity between the first latent representation and the second latent representation (as depicted in Fig. 11 and Fig. 14 and in Sec. 3: Contrastive learning methods rely on the number of negative samples for generating good quality representations. Accessing negative samples can be seen as a dictionary lookup task where the dictionary is sometimes the whole training set and the rest of the times some subset of the dataset. An interesting way to categorize these methods would be based on the technique used to collect negative samples against a positive data point during training. Based on the approach taken, we categorized the methods into four major architectures as shown in Figure 11… Using a contrastive loss, it converges to make positive samples closer and negative samples far from the original sample. Here, the query encoder Q is trained on the original samples and the key encoder K is trained on their augmented versions (positive samples) along with the negative samples in the batch. The features q and k generated from these encoders are used to calculate the similarity between the respective inputs using a similarity metric (discussed later in Section 5). Most of the time, the similarity metric used is cosine similarity, which is simply the inner product of two vectors normalized to have length 1 as defined in Equation (2)…)

Jai and Park are analogous art because both involve developing information processing and recognition techniques using machine learning systems and algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for implementing contrastive learning as a component of self-supervised learning for computer vision, natural language processing (NLP), and other domains, as disclosed by Jai, with the method of developing machine-learning embedding models for image processing tasks as collectively disclosed by Jai and Park. One of ordinary skill in the art would have been motivated to combine the methods disclosed by Jai and Park as noted above; doing so allows for embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples (Jai, Abstract).

Regarding claims 12-13, the claim limitations are similar to claims 2-3 and are thus rejected under the same rationale. Regarding claim 17, the claim limitations are similar to claim 8 limitations and are thus rejected under the same rationale. Regarding independent claim 20, the limitations are similar to claim 1 limitations and are rejected under the same rationale.
Additionally, Park teaches a non-transitory computer readable medium having instructions thereon, that when executed by a processor cause the processor to perform operations (in 0151-0156: Embodiments of the present disclosure can also be implemented in cloud computing environments. In this description, "cloud computing" is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources… FIG. 17 illustrates, in block diagram form, an example computing device 1700 ( e.g., the computing device 1500, the client device 108, and/or the server(s) 104) that may be configured to perform one or more of the processes described above. One will appreciate that the deep image manipulation system 102 can comprise implementations of the computing device 1700… In particular embodiments, processor( s) 1702 includes hardware for executing instructions [a non-transitory computer readable medium having instructions thereon, that when executed by a processor cause the processor to perform operations], such as those making up a computer program… The computing device 1700 includes a storage device 1706 includes storage for storing data or instructions. As an example, and not by way of limitation, storage device 1706 can comprise a non-transitory storage medium described above…) Regarding claim 21, the rejection of claim 1 is incorporated and Park in combination with Jai teaches the system of claim 1, wherein the decoder neural network comprising decoder parameters that decode the first latent representation into an output data set without the second latent representation. (in 0069: Further, the deep image manipulation system 102 utilizes a GAN loss associated with the discriminator 414 to determine an error or a measure of loss associated with the global and spatial autoencoder 112 [wherein the decoder neural network comprising decoder parameters that decode the first latent representation into an output data set without the second latent representation] and to encourage realistic hybrid digital images…) Regarding claim 22, the rejection of claim 1 is incorporated and Park in combination with Jai teaches the system of claim 1, wherein the second encoder neural network trains the first encoder neural network such that the first encoder parameters resemble the second encoder parameters after the updating of the at least one parameter in the first encoder parameters and at least one parameter in the decoder parameters. (in 0069: Further, the deep image manipulation system 102 utilizes a GAN loss associated with the discriminator 414 to determine an error or a measure of loss associated with the global and spatial autoencoder 112 [wherein the second encoder neural network trains the first encoder neural network such that the first encoder parameters resemble the second encoder parameters after the updating of the at least one parameter in the first encoder parameters and at least one parameter in the decoder parameters, as the error or loss parameter associated with the models] and to encourage realistic hybrid digital images…; And in [0049] As shown in FIG. 1, the server(s) 104 can also include the deep image manipulation system 102 as part of a digital content editing system 106. The digital content editing system 106 can communicate with the client device 108 to perform various functions associated with the client application 110 such as extracting spatial codes, extracting global codes, and generating a modified digital image. 
In addition, the digital content editing system 106 and/or the deep image manipulation system 102 can learn parameters of a global and spatial autoencoder 112 by training [wherein the second encoder neural network trains the first encoder neural network such that the first encoder parameters resemble the second encoder parameters after the updating of the at least one parameter in the first encoder parameters and at least one parameter in the decoder parameters, as the learned parameters and codes associated with the models] an encoder neural network and a generator neural network of the global and spatial autoencoder 112 to extract spatial codes corresponding to geometric layout and global codes corresponding to overall appearance. In some embodiments, the digital content editing system 106 and/or the deep image manipulation system 102 can utilize a contrastive loss as part of the training process.) Claims 1-3, 8, 11-13, 17 and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (Pub. No.: US 2021/0358177, hereinafter ‘Park’) in view of Li et al. (US 20210312296, hereinafter ‘Li’). Regarding independent claim 1 limitation, Park teaches a system for training a contrastive momentum autoencoder, (0049: As shown in FIG. 1, the server(s) 104 can also include the deep image manipulation system 102 [a system for training a contrastive momentum autoencoder] as part of a digital content editing system 106... In addition, the digital content editing system 106 and/or the deep image manipulation system 102 can learn parameters of a global and spatial autoencoder [a system for training a contrastive momentum autoencoder] 112 by training an encoder neural network and a generator neural network of the global and spatial autoencoder 112 to extract spatial codes corresponding to geometric layout and global codes corresponding to overall appearance. In some embodiments, the digital content editing system 106 and/or the deep image manipulation system 102 can utilize a contrastive loss as part of the training process.) the system comprising: one or more memories; one or more processors coupled to the one or more memories and configured to execute instructions that cause the one or more processors to perform operations, the operations comprising (in [0144] Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments within the scope of the present disclosure also include physical and other computer readable media for carrying or storing computer-executable instructions and/or data structures. In particular, one or more of the processes described herein may be implemented at least in part as instructions embodied in a non-transitory computer-readable medium [the system comprising: one or more memories; one or more processors coupled to the one or more memories and configured to execute instructions that cause the one or more processors to perform operations, the operations comprising] and executable by one or more computing devices (e.g., any of the media content access devices described herein).
In general, a processor (e.g., a microprocessor) receives instructions, from a non-transitory computer-readable medium, (e.g., a memory, etc.), and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein.) receive an input data set; (in 0078: As illustrated in FIG. 5, the encoder neural network 206 includes convolutional layers, residual blocks, and layout blocks. In particular, the key in FIG. 5 indicates that the white layers of the encoder neural network 206 are convolutional layers, the diagonally patterned blocks are residual blocks, and the crosshatch patterned blocks are layout blocks. In addition, the input digital image [claimed receive the input data set as image data] (e.g., 202, 204, 302, 402, 404, or 422) is represented by the tall gray block, the global code is represented by the short wide gray block, and the spatial code is represented by the medium height thin gray block. As mentioned above, the encoder neural network 206 includes a spatial encoder neural network and a global encoder neural network which share common layers.; And Claimed two encoder networks in 0055: As shown in FIG. 2, the deep image manipulation system 102 can utilize the same encoder neural network 206 to extract the global and spatial codes from each of the first and second digital images 202, 204. In some embodiments, the deep image manipulation system 102 utilizes two separate encoders: a spatial encoder neural network to extract the spatial code 208 (and the spatial code 212) and a global encoder neural network to extract the global code 210 (and the global code 214).) encode, using a first encoder neural network comprising first encoder parameters, a first latent representation from the input data set; encode, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network; (Claimed two encoder networks in 0055: As shown in FIG. 2, the deep image manipulation system 102 can utilize the same encoder neural network 206 [encode, using a first encoder neural network comprising first encoder parameters, a first latent representation from the input data set; encode, using a second encoder neural network having a same structure as the first encoder neural network] to extract the global and spatial codes from each of the first and second digital images 202, 204 [a same input data set as for the first encoder neural network].
In some embodiments, the deep image manipulation system 102 utilizes two separate encoders [encode, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network]: a spatial encoder neural network to extract the spatial code 208 (and the spatial code 212) [comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network] and a global encoder neural network to extract the global code 210 (and the global code 214) [comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network]. Examiner notes that the same encoder network receives the same inputs 202 and 204 and processes different parameters for learning input data codes/features as claimed.) determine a contrastive loss using the first latent representation and second latent representation, the contrastive loss based at least on a similarity between the first latent representation and the second latent representation; (As depicted in Fig. 4, in 0069-0071: … where x⁰ represents a latent code representation of the first digital image 402, x¹ represents a latent code representation of the second digital image 404 [using the first latent representation and second latent representation], and the other terms are defined above. In one or more embodiments, utilizing this GAN loss alone may not be enough for the deep image manipulation system 102 to constrain the global and spatial autoencoder 112 to generate a hybrid of the first digital image 402 and the second digital image 404 [the first latent representation and second latent representation associated with generating the hybrid input into the GAN model for further processing], as the GAN loss is related only to the realism of the resultant digital image. Thus, to improve the generation of hybrid digital images [the contrastive loss based at least on a similarity between the first latent representation and the second latent representation], the deep image manipulation system 102 can utilize an additional loss function called a contrastive loss [determine a contrastive loss using the first latent representation and second latent representation]… For the contrastive loss, the deep image manipulation system 102 shrinks the ℓ2 distance ∥E(G(z))−z∥₂² = ∥E(G(E(x)))−E(x)∥₂² by utilizing the encoder neural network 206 (E) to scale down the magnitude of its output space. Therefore, the deep image manipulation system 102 ensures that the reconstructed code 424 (e.g., a reconstructed spatial code and a reconstructed global code extracted from the modified digital image 422 utilizing the encoder neural network 206), as given by ẑ = E(G(z)), closely resembles (or matches) the extracted code z (e.g., the combination of the spatial code 406 and the global code 412) itself.) decode, using a decoder neural network comprising decoder parameters the first latent representation into an output data set.
(in 0069: Further, the deep image manipulation system 102 utilizes a GAN loss associated with the discriminator 414 to determine an error or a measure of loss associated with the global and spatial autoencoder 112 [decode, using a decoder neural network comprising decoder parameters the first latent representation into an output data set] and to encourage realistic hybrid digital images…) determine a reconstruction loss based on the output data set and the input data set; (in 0066: … For instance, the deep image manipulation system 102 continues training the encoder neural network 206 and the generator neural network 216 over multiple iterations, inputting new input digital images to generate new reconstructed digital images, determining losses, and modifying parameters for each iteration. Thus, upon determining that the GAN loss and/or the reconstruction loss 308 [determine a reconstruction loss based on the output data set and the input data set] each satisfy a threshold loss, the deep image manipulation system 102 determines that the encoder neural network 206 and the generator neural network 216 are accurate. Indeed, by combining the extracted spatial code 310 and the extracted global code 312, the generator neural network 216 generates the reconstructed digital image 304 to accurately represent the input digital image 302 […using the output data set and the input data set;]. As shown in FIG. 3, the reconstructed digital image 304 looks very similar, if not identical, to the input digital image 302.) and update at least one parameter in the first encoder parameters of the first encoder neural network and at least one parameter in the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss (in 0070: … For instance, the deep image manipulation system 102 continues training the encoder neural network 206 and the generator neural network 216 over multiple iterations, inputting new input digital images to generate new reconstructed digital images, determining losses, and modifying parameters for each iteration. Thus, upon determining that the GAN loss and/or the reconstruction loss 308 [based on the contrastive loss and the reconstruction loss] each satisfy a threshold loss, the deep image manipulation system 102 determines that the encoder neural network 206 and the generator neural network 216 are accurate …Thus, to improve the generation of hybrid digital images, the deep image manipulation system 102 can utilize an additional loss function called a contrastive loss [based on the contrastive loss and the reconstruction loss]. In particular, the deep image manipulation system 102 utilizes a code reconstruction loss to learn parameters [and update at least one parameter in the first encoder parameters of the first encoder neural network…] for reconstructing the particular codes (e.g., the spatial code 406 and the global code 412) extracted from the first digital image 402 (x⁰) and the second digital image 404 (x¹)...; And in 0069: Further, the deep image manipulation system 102 utilizes a GAN loss associated with the discriminator 414 to determine an error or a measure of loss associated with the global and spatial autoencoder 112 [at least one parameter in the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss] and to encourage realistic hybrid digital images… ) While Park teaches the use of the same model to process the same input, which the examiner considers to teach the claimed same input data set as noted above.
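Taken together, the claim-1 elements mapped above describe one training step of a momentum contrastive autoencoder. The following is a hedged sketch of how such a step could be organized, assuming a toy MLP architecture, an Adam optimizer, and hypothetical hyperparameters (tau=0.07, m=0.999); it illustrates the claim language as mapped, not the networks actually disclosed by Park, Li, or the applicant.

    # Hedged sketch of the claimed training step: a first ("online") encoder and
    # a decoder updated by backpropagation on contrastive + reconstruction loss,
    # plus a second ("momentum") encoder of identical structure whose parameters
    # are updated as a moving average of the first encoder's parameters.
    import copy
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    enc = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 128))
    dec = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 784))
    momentum_enc = copy.deepcopy(enc)            # same structure, separate parameters
    for p in momentum_enc.parameters():
        p.requires_grad_(False)                  # updated by EMA, not by gradients

    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

    def train_step(x, tau=0.07, m=0.999):
        z1 = enc(x)                              # first latent representation
        with torch.no_grad():
            z2 = momentum_enc(x)                 # second latent representation, same input
        # Contrastive loss: similarity between the two latents of the same sample,
        # contrasted against the other samples in the batch.
        q, k = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = q @ k.t() / tau
        contrastive = F.cross_entropy(logits, torch.arange(x.size(0)))
        # Reconstruction loss: decode the first latent only.
        recon = F.mse_loss(dec(z1), x)
        opt.zero_grad()
        (contrastive + recon).backward()         # updates first-encoder and decoder parameters
        opt.step()
        # Moving-average (momentum) update of the second encoder's parameters.
        with torch.no_grad():
            for p2, p1 in zip(momentum_enc.parameters(), enc.parameters()):
                p2.mul_(m).add_(p1, alpha=1 - m)
        return contrastive.item(), recon.item()

    print(train_step(torch.rand(8, 784)))

With this decomposition, the second encoder never receives gradients; its parameters come to resemble the first encoder's parameters only through the moving-average step, which is the reading the examiner gives to the claim 2 and claim 22 limitations.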
Additionally, Li teaches using the same network to process the same input for processing different learned parameters as claimed: encode, using a first encoder neural network comprising first encoder parameters, a first latent representation from the input data set; encode, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network, as depicted in Fig. 3 and in [0037] In one example, the individual-disentanglement encoder (FIG. 3, 132-2) [encode, using a first encoder neural network comprising first encoder parameters, a first latent representation from the input data set; encode, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network] and the emotion-disentanglement encoder (FIG. 3, 132-1) may be variational encoders. A variational encoder may be any device or system that utilizes an encoder, a decoder, and a loss function to approximate inference in a latent Gaussian model where the approximate posterior and model likelihood are parametrized by neural networks. And in [0056] The output of the emotion variations encoder (132-1) and the individual variations encoder (132-2) [encode, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network] is the emotion latent vectors z_emo1 (133-1) and z_emo2 (133-2) and the individual latent vectors z_id1 (134-1) and z_id2 (134-2)... The contrastive loss function (135) causes the emotion variations encoder (132-1) and the individual variations encoder (132-2) to map the physiological data signal pair (131) [encode, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network] as close as possible in latent space if the physiological data signal pair (131) are from the same individual or share the same emotion respectively. The value output by the contrastive loss function (135) is added to the total loss of the unsupervised neural network (130) as part of the reconstruction loss (137). Li and Park are analogous art because both involve developing information processing and recognition techniques using machine learning systems and algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for implementing contrastive learning for processing sub-dependent data factors using an encoder neural network, as disclosed by Li, with the method of developing embedding models for image processing tasks as disclosed by Park.
One of ordinary skill in the art would have been motivated to combine the methods disclosed by Li and Park as noted above; doing so allows for embedding separate latent parameter values from a single input data set and applying a contrastive loss function to constrain the learning process, (Li, 0022 & 0056). Claims 11 and 20 include limitations similar to claim 1 and are thus rejected under the same rationale. Regarding the remaining limitations of claim 11, Park teaches the limitations as noted above; and the limitations are rejected as noted above; wherein the rejection per the Park reference is incorporated. Regarding claims 2-3, 8, 12-13, 17 and 20-22, the limitations are rejected per the Park reference as noted above. Claims 1-3, 8, 11-13, 17 and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (Pub. No.: US 2021/0358177, hereinafter ‘Park’) in view of Schmitt et al. (US 20210110262, hereinafter ‘Seba’). Regarding independent claim 1 limitation, Park teaches a system for training a contrastive momentum autoencoder, (0049: As shown in FIG. 1, the server(s) 104 can also include the deep image manipulation system 102 [a system for training a contrastive momentum autoencoder] as part of a digital content editing system 106... In addition, the digital content editing system 106 and/or the deep image manipulation system 102 can learn parameters of a global and spatial autoencoder [a system for training a contrastive momentum autoencoder] 112 by training an encoder neural network and a generator neural network of the global and spatial autoencoder 112 to extract spatial codes corresponding to geometric layout and global codes corresponding to overall appearance. In some embodiments, the digital content editing system 106 and/or the deep image manipulation system 102 can utilize a contrastive loss as part of the training process.) the system comprising: one or more memories; one or more processors coupled to the one or more memories and configured to execute instructions that cause the one or more processors to perform operations, the operations comprising (in [0144] Embodiments of the present disclosure may comprise or utilize a special purpose or general-purpose computer including computer hardware, such as, for example, one or more processors and system memory, as discussed in greater detail below. Embodiments within the scope of the present disclosure also include physical and other computer readable media for carrying or storing computer-executable instructions and/or data structures. In particular, one or more of the processes described herein may be implemented at least in part as instructions embodied in a non-transitory computer-readable medium [the system comprising: one or more memories; one or more processors coupled to the one or more memories and configured to execute instructions that cause the one or more processors to perform operations, the operations comprising] and executable by one or more computing devices (e.g., any of the media content access devices described herein). In general, a processor (e.g., a microprocessor) receives instructions, from a non-transitory computer-readable medium, (e.g., a memory, etc.), and executes those instructions, thereby performing one or more processes, including one or more of the processes described herein.) receive an input data set; (in 0078: As illustrated in FIG.
5, the encoder neural network 206 includes convolutional layers, residual blocks, and layout blocks. In particular, the key in FIG. 5 indicates that the white layers of the encoder neural network 206 are convolutional layers, the diagonally patterned blocks are residual blocks, and the crosshatch patterned blocks are layout blocks. In addition, the input digital image [claimed receive the input data set as image data] (e.g., 202, 204, 302, 402, 404, or 422) is represented by the tall gray block, the global code is represented by the short wide gray block, and the spatial code is represented by the medium height thin gray block. As mentioned above, the encoder neural network 206 includes a spatial encoder neural network and a global encoder neural network which share common layers.; And Claimed two encoder networks in 0055: As shown in FIG. 2, the deep image manipulation system 102 can utilize the same encoder neural network 206 to extract the global and spatial codes from each of the first and second digital images 202, 204. In some embodiments, the deep image manipulation system 102 utilizes two separate encoders: a spatial encoder neural network to extract the spatial code 208 (and the spatial code 212) and a global encoder neural network to extract the global code 210 (and the global code 214).) encode, using a first encoder neural network comprising first encoder parameters, a first latent representation from the input data set; encode, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network; (Claimed two encoder networks in 0055: As shown in FIG. 2, the deep image manipulation system 102 can utilize the same encoder neural network 206 [encode, using a first encoder neural network comprising first encoder parameters, a first latent representation from the input data set; encode, using a second encoder neural network having a same structure as the first encoder neural network] to extract the global and spatial codes from each of the first and second digital images 202, 204 [a same input data set as for the first encoder neural network]. In some embodiments, the deep image manipulation system 102 utilizes two separate encoders [encode, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network]: a spatial encoder neural network to extract the spatial code 208 (and the spatial code 212) [comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network] and a global encoder neural network to extract the global code 210 (and the global code 214) [comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network]. Examiner notes that the same encoder network receives the same inputs 202 and 204 and processes different parameters for learning input data codes/features as claimed.)
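Park's spatial/global code extraction, quoted above, is the mechanism the examiner repeatedly maps to the two-encoder limitations, and it can be illustrated briefly. Below is a hedged toy sketch in which the modules are placeholders, not Park's actual networks: one image yields a spatial (layout) code and a global (appearance) code, and a hybrid is formed by pairing the spatial code of one image with the global code of another.

    # Toy sketch of Park's code-swapping idea (all modules are hypothetical
    # placeholders chosen for brevity, not the networks of the reference).
    import torch
    import torch.nn as nn

    class ToyEncoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.spatial = nn.Conv2d(3, 1, kernel_size=3, padding=1)      # layout-like code
            self.global_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                             nn.Linear(3, 16))            # appearance-like code
        def forward(self, x):
            return self.spatial(x), self.global_head(x)

    encoder = ToyEncoder()
    x0, x1 = torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32)
    s0, g0 = encoder(x0)    # spatial + global codes of the first image
    s1, g1 = encoder(x1)    # spatial + global codes of the second image
    hybrid_code = (s0, g1)  # swap: geometry of x0, overall appearance of x1
    # A generator G(spatial, global) would then synthesize the hybrid image.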
determine a contrastive loss using the first latent representation and second latent representation, the contrastive loss based at least on a similarity between the first latent representation and the second latent representation; (As depicted in Fig. 4, in 0069-0071: … where x⁰ represents a latent code representation of the first digital image 402, x¹ represents a latent code representation of the second digital image 404 [using the first latent representation and second latent representation], and the other terms are defined above. In one or more embodiments, utilizing this GAN loss alone may not be enough for the deep image manipulation system 102 to constrain the global and spatial autoencoder 112 to generate a hybrid of the first digital image 402 and the second digital image 404 [the first latent representation and second latent representation associated with generating the hybrid input into the GAN model for further processing], as the GAN loss is related only to the realism of the resultant digital image. Thus, to improve the generation of hybrid digital images [the contrastive loss based at least on a similarity between the first latent representation and the second latent representation], the deep image manipulation system 102 can utilize an additional loss function called a contrastive loss [determine a contrastive loss using the first latent representation and second latent representation]… For the contrastive loss, the deep image manipulation system 102 shrinks the ℓ2 distance ∥E(G(z))−z∥₂² = ∥E(G(E(x)))−E(x)∥₂² by utilizing the encoder neural network 206 (E) to scale down the magnitude of its output space. Therefore, the deep image manipulation system 102 ensures that the reconstructed code 424 (e.g., a reconstructed spatial code and a reconstructed global code extracted from the modified digital image 422 utilizing the encoder neural network 206), as given by ẑ = E(G(z)), closely resembles (or matches) the extracted code z (e.g., the combination of the spatial code 406 and the global code 412) itself.) decode, using a decoder neural network comprising decoder parameters the first latent representation into an output data set. (in 0069: Further, the deep image manipulation system 102 utilizes a GAN loss associated with the discriminator 414 to determine an error or a measure of loss associated with the global and spatial autoencoder 112 [decode, using a decoder neural network comprising decoder parameters the first latent representation into an output data set] and to encourage realistic hybrid digital images…) determine a reconstruction loss based on the output data set and the input data set; (in 0066: … For instance, the deep image manipulation system 102 continues training the encoder neural network 206 and the generator neural network 216 over multiple iterations, inputting new input digital images to generate new reconstructed digital images, determining losses, and modifying parameters for each iteration. Thus, upon determining that the GAN loss and/or the reconstruction loss 308 [determine a reconstruction loss based on the output data set and the input data set] each satisfy a threshold loss, the deep image manipulation system 102 determines that the encoder neural network 206 and the generator neural network 216 are accurate.
Indeed, by combining the extracted spatial code 310 and the extracted global code 312, the generator neural network 216 generates the reconstructed digital image 304 to accurately represent the input digital image 302 […using the output data set and the input data set;]. As shown in FIG. 3, the reconstructed digital image 304 looks very similar, if not identical, to the input digital image 302.) and update at least one parameter in the first encoder parameters of the first encoder neural network and at least one parameter in the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss (in 0070: … For instance, the deep image manipulation system 102 continues training the encoder neural network 206 and the generator neural network 216 over multiple iterations, inputting new input digital images to generate new reconstructed digital images, determining losses, and modifying parameters for each iteration. Thus, upon determining that the GAN loss and/or the reconstruction loss 308 [based on the contrastive loss and the reconstruction loss] each satisfy a threshold loss, the deep image manipulation system 102 determines that the encoder neural network 206 and the generator neural network 216 are accurate …Thus, to improve the generation of hybrid digital images, the deep image manipulation system 102 can utilize an additional loss function called a contrastive loss [based on the contrastive loss and the reconstruction loss]. In particular, the deep image manipulation system 102 utilizes a code reconstruction loss to learn parameters [and update at least one parameter in the first encoder parameters of the first encoder neural network…] for reconstructing the particular codes (e.g., the spatial code 406 and the global code 412) extracted from the first digital image 402 (x⁰) and the second digital image 404 (x¹)…; And in 0069: Further, the deep image manipulation system 102 utilizes a GAN loss associated with the discriminator 414 to determine an error or a measure of loss associated with the global and spatial autoencoder 112 [at least one parameter in the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss] and to encourage realistic hybrid digital images… ) While Park teaches the use of the same model to process the same input, which the examiner considers to teach the claimed same input data set as noted above. Additionally, Seba teaches encode, using a first encoder neural network comprising first encoder parameters, a first latent representation from the input data set; encode, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network; (As depicted in Fig. 1, in [0081] The Siamese auto-encoder comprises two auto-encoder networks of the same (identical) architecture, a first auto-encoder network AE1 and a second auto-encoder AE2.
The first auto-encoder network AE1 [encode, using a first encoder neural network comprising first encoder parameters, a first latent representation from the input data set] and the second auto-encoder network AE2 [encode, using a second encoder neural network having a same structure as the first encoder neural network and comprising second encoder parameters that are different from the first encoder parameters, a second latent representation from a same input data set as for the first encoder neural network;] share the same weights and other parameters. [0082] The first auto-encoder network AE1 and the second auto-encoder network AE2 process the training data set [from a same input data set as for the first encoder neural network] 𝒟 above during the training phase.) Seba and Park are analogous art because both involve developing information processing and recognition techniques using machine learning systems and algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for implementing deep anomaly detection techniques and systems for monitoring anomalies in large-scale industrial monitoring systems, as disclosed by Seba, with the method of developing embedding models for image processing tasks as disclosed by Park. One of ordinary skill in the art would have been motivated to combine the methods disclosed by Seba and Park to process data for similarity and anomaly detection using machine learning models, (Seba, Abstract); doing so allows for reliable detection of anomalies in the complex technical system, which enables scheduling maintenance and repair operations of the technical system, (Seba, 0004). Claims 11 and 20 include limitations similar to claim 1 and are thus rejected under the same rationale. Regarding the remaining limitations of claim 11, Park teaches the limitations as noted above; and the limitations are rejected as noted above; wherein the rejection per the Park reference is incorporated. Regarding claim 2, the rejection of claim 1 is incorporated and Park in combination with Seba teaches the system of claim 1, the operations further comprising: after the updating of the at least one parameter in the first encoder parameters, update at least one parameter in the second encoder parameters of the second encoder neural network based on a moving average of the at least one parameter in the first encoder parameters.
(in 0075-0080: In some embodiments, the deep image manipulation system 102 utilizes a particular training objective function to learn parameters [the operations further comprising: after the updating of the at least one parameter in the first encoder parameters, update at least one parameter in the second encoder parameters of the second encoder neural network based on a moving average of the at least one parameter in the first encoder parameters] of the encoder neural network 206 and the generator neural network 216 to accurately and realistically generate modified digital images in the form of hybrid digital images or reconstructed digital images… As mentioned above, the deep image manipulation system 102 utilizes a global and spatial autoencoder 112 with a novel architecture specifically for digital image manipulation… In addition, the encoder neural network 206 generates the spatial code by passing intermediate (e.g., non-output) activations or latent features into layout blocks. Each layout block upsamples the latent feature vector to a fixed size (e.g., a spatial resolution of 32 or 64, depending on the dataset) and reduces the channel dimension (e.g., to 1 or 2 channels). The encoder neural network 206 further aggregates (e.g., averages) [… based on a moving average of the at least one parameter in the first encoder parameters] the intermediate features to generate the spatial code…) Additionally, Seba teaches, after the updating of the at least one parameter in the first encoder parameters, update at least one parameter in the second encoder parameters of the second encoder neural network based on a moving average of the at least one parameter in the first encoder parameters (As depicted in Fig. 5 and in 0082-0083: The first auto-encoder network AE1 and the second auto-encoder network AE2 process the training data set 𝒟 above during the training phase [after the updating of the at least one parameter in the first encoder parameters, update at least one parameter in the second encoder parameters of the second encoder neural network…]. A common approach in the training of deep neural architectures may be applied for the Siamese auto-encoder. The training data set 𝒟 is partitioned into several batches and a training of the weights and the parameters of the first auto-encoder network AE1 and the second auto-encoder network AE2 can be performed with a stochastic gradient descent or an Adam optimizer. …A first term L_REC is a reconstruction error for the digital twin simulation data. The reconstruction error for the digital twin simulation data targets the perfect reconstruction of normal operation data samples as specified by the digital twin simulation data. The reconstruction error can be calculated as a mean squared error (MSE) [update at least one parameter in the second encoder parameters of the second encoder neural network based on a moving average of the at least one parameter in the first encoder parameters] between the input signals and the output signal of the first auto-encoder network AE1,…) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Park and Seba for the same reasons disclosed above. Regarding claim 3, the rejection of claim 1 is incorporated and Park in combination with Seba teaches the system of claim 1, wherein the first encoder is configured to normalize the first latent representation using l2 normalization.
(in 0097: … Indeed, in some embodiments, the deep image manipulation system 102 normalizes the latent codes (the global code and the spatial code) on the unit sphere and utilizes spherical [wherein the first encoder is configured to normalize the first latent representation using … normalization]…; And in [0073] where the dot product “⋅” represents the cosine similarity [wherein the first encoder is configured to normalize the first latent representation using l2 normalization], N represents a size of the digital image code repository (e.g., the number of stored codes 420 in the “Memory bank” of FIG. 4), τ=0.07 is a “temperature” parameter, z represents the latent code for either the spatial or global components, E represents the spatial encoder neural network or the global encoder neural network, … [0074] By utilizing the contrastive loss above, the deep image manipulation system 102 encourages ẑ to be classified as z (or at least within a threshold similarity of z) amongst N+1 exemplar classes, where each class logit is formed by cosine similarity [wherein the first encoder is configured to normalize the first latent representation using l2 normalization]…) Regarding claim 8, the rejection of claim 1 is incorporated and Park in combination with Seba teaches the system of claim 1, wherein the first encoder neural network and the second encoder neural network are configured to receive the input data set as multiple mini-batches and the operations further comprising: updating the at least one parameter in the first encoder parameters of the first encoder neural network and at least one parameter in the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss associated with a data sample in each mini-batch in the multiple mini-batches. (in 0033-0034: … In particular, while many conventional systems utilize conventional generative models to generate digital images from random samples (which makes them unfit for accurately generating specific digital images), the deep image manipulation system utilizes a novel model architecture (i.e., a global and spatial autoencoder) [the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss associated with a data sample in each mini-batch in the multiple mini-batches] designed specifically for digital image manipulation. Indeed, the architecture of the global and spatial autoencoder enable the deep image manipulation system to accurately generate specific digital images and manipulate particular attributes of digital images… In addition to its novel architecture, the deep image manipulation system trains the global and spatial autoencoder to accurately generate specific digital images by swapping spatial codes and global codes between pairs of digital images [wherein the first encoder neural network and the second encoder neural network are configured to receive the input data set as multiple mini-batches; claimed mini-batches as image pairs], thus forcing the global and spatial autoencoder to learn compositionality.
By learning compositionality in this way, the deep image manipulation system can learn embeddings [the operations further comprising: updating the at least one parameter in the first encoder parameters of the first encoder neural network] that are suitable for digital image manipulation: spatial features naturally correspond to geometric layout of a digital image, and global features naturally capture an overall appearance. Additionally, by utilizing a contrastive loss [at least one parameter in the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss associated with a data sample in each mini-batch in the multiple mini-batches] to force extracted spatial codes and extracted global codes to be more similar to corresponding codes from input digital images than to stored spatial codes and stored global codes, the deep image manipulation system further improves the accuracy and realism of resultant digital images.); And in 0069-0070: … Further, the deep image manipulation system 102 utilizes a GAN loss associated with the discriminator 414 to determine an error or a measure of loss associated with the global and spatial autoencoder 112 [decoder parameters of the decoder neural network] and to encourage realistic hybrid digital images... In one or more embodiments, utilizing this GAN loss alone may not be enough for the deep image manipulation system 102 to constrain the global and spatial autoencoder 112 to generate a hybrid of the first digital image 402 and the second digital image 404, as the GAN loss is related only to the realism of the resultant digital image. Thus, to improve the generation of hybrid digital images, the deep image manipulation system 102 can utilize an additional loss function called a contrastive loss [the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss associated with a data sample in each mini-batch in the multiple mini-batches]. In particular, the deep image manipulation system 102 utilizes a code reconstruction loss to learn parameters [the decoder parameters of the decoder neural network based on the contrastive loss and the reconstruction loss associated with a data sample in each mini-batch in the multiple mini-batches] for reconstructing the particular codes (e.g., the spatial code 406 and the global code 412) extracted from the first digital image 402 (x⁰) and the second digital image 404 (x¹)…) Regarding claim 11, additionally, Seba teaches determine a contrastive loss using the first latent representation and second latent representation, the contrastive loss based at least on a similarity between the first latent representation and the second latent representation; (As depicted in Fig. 5 and in [0035] An embodiment of the method uses as the machine learning model a Siamese twin neural network comprising two auto-encoder neural networks AE1 and AE2.
The two auto-encoder neural networks AE1, AE2 share a same set and values of weights and parameters [the contrastive loss based at least on a similarity between the first latent representation and the second latent representation], which encode sensory input data x ∈ ℝ^D into a low-dimensional latent representation vector l = Encode(x) ∈ ℝ^L, and also decode the low-dimensional latent representation vector l back into an output signal y = Decode(l) ∈ ℝ^D of the original form of the sensory input data. The weights and parameters of the auto-encoder neural networks AE1, AE2 are trained by minimizing a loss function, the loss function comprising three parts, L = a·L_REC + b·L_PCL + c·L_CL, (1) [determine a contrastive loss using the first latent representation and second latent representation] … A second part L_PCL is a partial contrastive loss from anomalous data samples calculated from a second auto-encoder neural network AE2 of the two auto-encoder neural networks AE1, AE2: … and a third part L_CL is a contrastive loss of latent representations [determine a contrastive loss using the first latent representation and second latent representation, the contrastive loss based at least on a similarity between the first latent representation and the second latent representation] calculated from the two auto-encoders AE1, AE2,…) Seba and Park are analogous art because both involve developing information processing and recognition techniques using machine learning systems and algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for implementing deep anomaly detection techniques and systems for monitoring anomalies in large-scale industrial monitoring systems, as disclosed by Seba, with the method of developing embedding models for image processing tasks as collectively disclosed by Seba and Park. One of ordinary skill in the art would have been motivated to combine the methods disclosed by Seba and Park to process data for similarity and anomaly detection using machine learning models, (Seba, Abstract); doing so allows for reliable detection of anomalies in the complex technical system, which enables scheduling maintenance and repair operations of the technical system, (Seba, 0004). The remaining limitations are rejected above, and the rejection is incorporated here. Regarding claims 12-13, the claim limitations are similar to claims 2-3 and thus rejected under the same rationale. Regarding claim 17, the claim limitations are similar to claim 8 limitations and thus rejected under the same rationale. Regarding independent claim 20, the limitations are similar to claim 1 limitations and are rejected under the same rationale. Additionally, Park teaches a non-transitory computer readable medium having instructions thereon, that when executed by a processor cause the processor to perform operations (in 0151-0156: Embodiments of the present disclosure can also be implemented in cloud computing environments. In this description, "cloud computing" is defined as a model for enabling on-demand network access to a shared pool of configurable computing resources… FIG.
17 illustrates, in block diagram form, an example computing device 1700 (e.g., the computing device 1500, the client device 108, and/or the server(s) 104) that may be configured to perform one or more of the processes described above. One will appreciate that the deep image manipulation system 102 can comprise implementations of the computing device 1700… In particular embodiments, processor(s) 1702 includes hardware for executing instructions [a non-transitory computer readable medium having instructions thereon, that when executed by a processor cause the processor to perform operations], such as those making up a computer program… The computing device 1700 includes a storage device 1706 including storage for storing data or instructions. As an example, and not by way of limitation, storage device 1706 can comprise a non-transitory storage medium described above…) Regarding claims 21-22, the limitations are rejected per the Park reference as noted above; wherein the rejection per the Park reference is incorporated. Claims 3 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (Pub. No.: US 2021/0358177, hereinafter ‘Park’) in view of Jaiswal et al. (NPL: “A Survey on Contrastive Self-Supervised Learning”, hereinafter ‘Jai’) in further view of Neil (NPL: “Siamese Capsule Networks”). Regarding claim 3, the rejection of claim 1 is incorporated and Park in combination with Jai teaches the system of claim 1, wherein the first encoder is configured to normalize the first latent representation using l2 normalization. (in 0097: … Indeed, in some embodiments, the deep image manipulation system 102 normalizes the latent codes (the global code and the spatial code) on the unit sphere and utilizes spherical [wherein the first encoder is configured to normalize the first latent representation using … normalization]…; And in [0073] where the dot product “⋅” represents the cosine similarity [wherein the first encoder is configured to normalize the first latent representation using l2 normalization], N represents a size of the digital image code repository (e.g., the number of stored codes 420 in the “Memory bank” of FIG. 4), τ=0.07 is a “temperature” parameter, z represents the latent code for either the spatial or global components, E represents the spatial encoder neural network or the global encoder neural network, … [0074] By utilizing the contrastive loss above, the deep image manipulation system 102 encourages ẑ to be classified as z (or at least within a threshold similarity of z) amongst N+1 exemplar classes, where each class logit is formed by cosine similarity [wherein the first encoder is configured to normalize the first latent representation using l2 normalization]…) While Park teaches the use of encoder networks for embedding features and learning latent representations using normalization techniques, Park does not expressly teach that the cosine similarity is an l2 normalization. Jai expressly teaches the normalization as an l2 normalization. (in Sec. 3.1 pg.
10: … Using a contrastive loss, it converges to make positive samples closer and negative samples far from the original sample… The features q and k generated from these encoders are used to calculate the similarity between the respective inputs using a similarity metric (discussed later in Section 5). Most of the time, the similarity metric used is cosine similarity [wherein the first encoder is configured to normalize the first latent representation using l2 normalization], which is simply the inner product of two vectors normalized to have length 1 as defined in Equation (2). And in pg. 13 Sec. 5: … Contrastive learning focuses on comparing the embeddings with a Noise Contrastive Estimation (NCE) [43] function that is defined as… where q is the original sample, k+ represents a positive sample, and k− represents a negative sample. τ is a hyperparameter used in most of the recent methods and is called temperature coefficient. The sim() function can be any similarity function, but generally a cosine similarity as defined in Equation (2) is used. The initial idea behind NCE was to perform a nonlinear logistic regression that discriminates between observed data and some artificially generated noise. If the number of negative samples is greater, a variant of NCE called InfoNCE is used as represented in Equation (4). The use of l2 normalization (i.e., cosine similarity) [wherein the first encoder is configured to normalize the first latent representation using l2 normalization] and the temperature coefficient effectively weighs different examples and can help the model learn from hard negatives…) It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Park and Jai for the same reasons disclosed above. Additionally, Neil expressly teaches the use of encoder networks for embedding features and learning latent representations normalized using an l2 normalization. (in Sec. Intro & Sec. 2.: … This paper extends Capsule Networks to the pairwise learning setting to learn relationships between whole entity encodings, while also demonstrating their ability to learn from little data that can perform zero-shot learning where instances from new classes arise during testing. The Siamese Capsule Network is trained using a contrastive loss with l2-normalized encoded features [wherein the …encoder is configured to normalize the first latent representation using l2 normalization] and demonstrated on two face verification tasks… Although, the motivation for using the vector normalization of the instantiation parameters is to force the network to preserve orientation. Lastly, a reconstruction loss on the images was used for regularization which constrains the capsules to learn properties that can better encode the entities…) Neil, Jai, and Park are analogous art because all three involve developing image processing and recognition techniques using machine learning systems and algorithms.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for implementing embedding techniques that enable the encoding network to learn using a contrastive loss with l2-normalized encoded features for image processing tasks, as disclosed by Neil, with the method of developing embedding models for image processing tasks as collectively disclosed by Jai and Park. One of ordinary skill in the art would have been motivated to combine the methods disclosed by Neil, Jai and Park to process image data by deploying deep neural networks using vector normalization of the instantiation parameters to force the network to preserve orientation, (Neil, Sec. 2); doing so allows for implementing a model trained using contrastive loss with l2-normalized capsule encoded pose features to help improve results in the few-shot learning setting where image pairs in the test set contain unseen subjects, (Neil, Abstract). Regarding claim 13, the claim limitations are similar to claim 3 limitations and thus rejected under the same rationale. Claims 4-7 and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (Pub. No.: US 2021/0358177, hereinafter ‘Park’) in view of Jaiswal et al. (NPL: “A Survey on Contrastive Self-Supervised Learning”, hereinafter ‘Jai’) in further view of Liu et al. (NPL: “SphereFace: Deep Hypersphere Embedding for Face Recognition”, hereinafter ‘Liu’). Regarding claim 4, the rejection of claim 3 is incorporated and Park in combination with Jai and Neil teaches the system of claim 3, wherein a distribution of the normalized first latent representation is mapped to the uniform distribution over the unit hyper-sphere. (in 0097: … Indeed, in some embodiments, the deep image manipulation system 102 normalizes the latent codes [the normalized first latent representation] (the global code and the spatial code) on the unit sphere and utilizes spherical [wherein a distribution of the normalized first latent representation is mapped to the uniform distribution over the unit hyper-sphere]…) learning a distribution of a latent space by mapping a distribution of the first latent representation to a prior distribution, wherein the prior distribution is a uniform distribution over a unit hyper-sphere (in 0097: ... In particular, the deep image manipulation system 102 utilizes the slider variable a to modify the weight or the effect of the composite global code z in generating a modified digital image.
Indeed, in some embodiments, the deep image manipulation system 102 normalizes the latent codes (the global code and the spatial code) on the unit sphere [learning a distribution of a latent space by mapping a distribution of the first latent representation to a prior distribution, wherein the prior distribution is a uniform distribution over a unit hyper-sphere] and utilizes spherical linear interpolation (e.g., a "slerp" function) given by … where z0 is the latent code (the spatial code and the global code) for the source digital image, z1 is the latent code for the target digital image, and a is a slider variable) and wherein the mapping minimizes a contrastive loss based on the first latent representation, the second latent representation, and the prior distribution; (in 0073-0074: …In some embodiments, the deep image manipulation system 102 applies this contrastive loss [wherein the mapping minimizes a contrastive loss based on the first latent representation, the second latent representation, and the prior distribution; considered part of the deep image manipulation system] to reconstructed digital images (e.g., the reconstructed digital image 304), swapped digital images (e.g., the modified digital image 422), and to each of the codes zs and zg. By utilizing the contrastive loss [and wherein the mapping minimizes a contrastive loss based on the first latent representation, the second latent representation, and the prior distribution] above, the deep image manipulation system 102 encourages ẑ to be classified as z (or at least within a threshold similarity of z) amongst N+1 exemplar classes, where each class logit is formed by cosine similarity. In addition, minimizing this loss [and wherein the mapping minimizes a contrastive loss based on the first latent representation, the second latent representation, and the prior distribution] also serves as maximizing a lower bound of mutual information between z and ẑ...) Additionally Liu teaches the hypersphere embedding as claimed wherein a distribution of the normalized first latent representation is mapped to the uniform distribution over the unit hyper-sphere, in Sec. 3.3: A-Softmax loss not only imposes discriminative power to the learned features via angular margin, but also renders nice and novel hypersphere interpretation. As shown in Fig. 3, A-Softmax loss is equivalent to learning features that are discriminative on a hypersphere manifold, while Euclidean margin losses learn features in Euclidean space. To simplify, we take the binary case to analyze the hyper-sphere interpretation… The decision boundary is equivalent to mθ1 = θ2, and the constrained region for correctly classifying x to class 1 is mθ1 < θ2. Geometrically speaking, this is a hypercircle-like region lying on a hypersphere manifold. For example, it is a circle-like region on the unit sphere [wherein a distribution of the normalized first latent representation is mapped to the uniform distribution over the unit hyper-sphere] in 3D case, as illustrated in Fig. 3. Note that larger m leads to smaller hypercircle-like region for each class, which is an explicit discriminative constraint on a manifold… One can see that A-Softmax loss imposes arc length constraint on a unit circle in 2D case and circle-like region constraint on a unit sphere in 3D case…; And as depicted in Fig.
Additionally, Liu teaches the hypersphere embedding as claimed, wherein a distribution of the normalized first latent representation is mapped to the uniform distribution over the unit hyper-sphere, in Sec. 3.3: A-Softmax loss not only imposes discriminative power to the learned features via angular margin, but also renders a nice and novel hypersphere interpretation. As shown in Fig. 3, A-Softmax loss is equivalent to learning features that are discriminative on a hypersphere manifold, while Euclidean margin losses learn features in Euclidean space. To simplify, we take the binary case to analyze the hypersphere interpretation… The decision boundary is equivalent to mθ1 = θ2, and the constrained region for correctly classifying x to class 1 is mθ1 < θ2. Geometrically speaking, this is a hypercircle-like region lying on a hypersphere manifold. For example, it is a circle-like region on the unit sphere [wherein a distribution of the normalized first latent representation is mapped to the uniform distribution over the unit hyper-sphere] in the 3D case, as illustrated in Fig. 3. Note that a larger m leads to a smaller hypercircle-like region for each class, which is an explicit discriminative constraint on a manifold… One can see that A-Softmax loss imposes an arc-length constraint on a unit circle in the 2D case and a circle-like region constraint on a unit sphere in the 3D case… And as depicted in Fig. 3 [figure reproduced as an image in the action]: Figure 3: Geometry Interpretation of Euclidean margin loss (e.g. contrastive loss, triplet loss, center loss, etc.), modified softmax loss and A-Softmax loss. The first row is the 2D feature constraint [wherein a distribution of the normalized first latent representation is mapped to the uniform distribution over the unit hyper-sphere], and the second row is the 3D feature constraint [wherein a distribution of the normalized first latent representation is mapped to the uniform distribution over the unit hyper-sphere]. The orange region indicates the discriminative constraint for class 1, while the green region is for class 2.

Liu, Jai, and Park are analogous art because all involve developing image processing and recognition techniques using machine learning systems and algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for implementing deep hypersphere embedding techniques that enable the network to learn angularly distributed features for facial recognition tasks, as disclosed by Liu, with the method of developing machine learning embedding models for image processing tasks as collectively disclosed by Jai and Park. One of ordinary skill in the art would have been motivated to combine the methods disclosed by Liu, Jai, and Park to process image data by deploying deep neural networks that learn angularly distributed features based on a hypersphere manifold (Liu, Abstract & Sec. 1, Intro). Doing so allows for implementing the A-Softmax loss and hypersphere manifolds that make the learned features more effective for face recognition (Liu, Pg. 6739: Right Col., 1st full para.).
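To make the angular-margin idea concrete, here is a deliberately simplified sketch, not Liu's actual formulation: it keeps cos(m*theta) for the target class but omits the paper's monotonic extension psi(theta) and the feature-norm scaling, so it is only valid while m*theta stays within [0, pi]. All names are illustrative:

    import torch
    import torch.nn.functional as F

    def a_softmax_logits(features, class_weights, labels, m=4):
        # Cosine of the angle between each normalized feature and each
        # class direction on the unit hypersphere.
        w = F.normalize(class_weights, dim=1)
        cos_theta = F.normalize(features, dim=1) @ w.t()
        theta = torch.acos(cos_theta.clamp(-1 + 1e-7, 1 - 1e-7))
        logits = cos_theta.clone()
        rows = torch.arange(features.size(0))
        # Angular margin: the target class must win even with its angle
        # multiplied by m, i.e. the m*theta_1 < theta_2 constraint quoted above.
        logits[rows, labels] = torch.cos(m * theta[rows, labels])
        return logits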
Regarding claim 5, the rejection of claim 4 is incorporated and Park in combination with Jai and Liu teaches the system of claim 1, wherein the contrastive loss includes a positive component associated with a loss based at least on a similarity between the first latent representation and the second latent representation and a negative component associated with a loss that corresponds to mapping the first latent representation to the prior distribution that is the uniform distribution over the unit hyper-sphere. (in 0072: In other words, the deep image manipulation system 102 utilizes the positive contrastive loss component [wherein the contrastive loss includes a positive component associated with a loss based at least on a similarity between the first latent representation and the second latent representation] 416 to compare the reconstructed code 424 with the extracted code z (the spatial code 406 and the global code 412) and utilizes the negative contrastive loss component [and a negative component associated with a loss that corresponds to mapping the first latent representation to the prior distribution that is the uniform distribution over the unit hyper-sphere] 418 to compare the reconstructed code 424 with stored codes 420 such as stored spatial codes and stored global codes within a digital image code repository…; And in 0073-0074: …By utilizing the contrastive loss [wherein the contrastive loss includes a positive component associated with a loss based at least on a similarity between the first latent representation and the second latent representation] above, the deep image manipulation system 102 encourages z-hat to be classified as z (or at least within a threshold similarity of z) amongst N+1 exemplar classes, where each class logit is formed by cosine similarity. In addition, minimizing this loss also serves as maximizing a lower bound on the mutual information between z and z-hat...; And in 0097: ... In particular, the deep image manipulation system 102 utilizes the slider variable a to modify the weight or the effect of the composite global code z in generating a modified digital image. Indeed, in some embodiments, the deep image manipulation system 102 normalizes the latent codes (the global code and the spatial code) on the unit sphere [and a negative component associated with a loss that corresponds to mapping the first latent representation to the prior distribution that is the uniform distribution over the unit hyper-sphere] and utilizes spherical linear interpolation (e.g., a "slerp" function) given by … where z0 is the latent code (the spatial code and the global code) for the source digital image, z1 is the latent code for the target digital image, and a is a slider variable) Additionally, Liu teaches the hypersphere embedding as claimed, as noted in the incorporated claim 4 rejection.
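The positive/negative split that Park's paragraphs 0072-0074 describe, one positive logit against the sample's own code plus N negative logits against stored codes, has a compact generic form. A minimal sketch, with illustrative names and temperature (not Park's or the applicant's implementation):

    import torch
    import torch.nn.functional as F

    def code_contrastive_loss(z_hat, z, stored_codes, temperature=0.07):
        z_hat = F.normalize(z_hat, dim=1)        # reconstructed codes, shape (B, D)
        z = F.normalize(z, dim=1)                # extracted codes, shape (B, D)
        bank = F.normalize(stored_codes, dim=1)  # N stored codes, shape (N, D)
        pos = (z_hat * z).sum(dim=1, keepdim=True)  # positive component: z_hat vs. its own z
        neg = z_hat @ bank.t()                      # negative component: z_hat vs. stored codes
        logits = torch.cat([pos, neg], dim=1) / temperature  # N+1 cosine-similarity logits
        targets = torch.zeros(z_hat.size(0), dtype=torch.long)  # class 0 is the positive
        return F.cross_entropy(logits, targets)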
Regarding claim 6, the rejection of claim 5 is incorporated and Park in combination with Jai and Liu teaches the system of claim 5, wherein the prior distribution is based on prior latent representations encoded by the second encoder neural network. (Claimed two networks trained based on the claimed distribution of latent codes, in 0097: … Indeed, in some embodiments, the deep image manipulation system 102 normalizes the latent codes (the global code and the spatial code) [based on prior latent representations encoded by the second encoder neural network] on the unit sphere and utilizes spherical [wherein the prior distribution is based on prior latent representations encoded by the second encoder neural network]…; And claimed two networks in training, in 0055: As shown in FIG. 2, the deep image manipulation system 102 can utilize the same encoder neural network 206 to extract the global and spatial codes from each of the first and second digital images 202, 204. In some embodiments, the deep image manipulation system 102 utilizes two separate encoders [wherein the prior distribution is based on prior latent representations encoded by the second encoder neural network]: a spatial encoder neural network to extract the spatial code 208 (and the spatial code 212) and a global encoder neural network to extract the global code 210 (and the global code 214).)

Regarding claim 7, the rejection of claim 6 is incorporated and Park in combination with Jai and Liu teaches the system of claim 6, the operations further comprising: updating the distribution of a latent space using the second latent representation encoded by the second encoder neural network. (in 0071: For the contrastive loss, the deep image manipulation system 102 shrinks the … distance ∥E(G(z))−z∥₂² = ∥E(G(E(x)))−E(x)∥₂² by utilizing the encoder neural network 206 (E) [the operations further comprising: updating the distribution of a latent space using the second latent representation encoded by the second encoder neural network] to scale down the magnitude of its output space. Therefore, the deep image manipulation system 102 ensures that the reconstructed code 424 (e.g., a reconstructed spatial code and a reconstructed global code extracted from the modified digital image 422 utilizing the encoder neural network 206) [the operations further comprising: updating the distribution of a latent space using the second latent representation encoded by the second encoder neural network], as given by ẑ = E(G(z)), closely resembles (or matches) the extracted code z (e.g., the combination of the spatial code 406 and the global code 412) itself. More specifically, the deep image manipulation system 102 utilizes the contrastive loss to determine that the reconstructed code 424 closely resembles the extracted code z in proportion to other stored codes 420 within a memory bank (e.g., a digital image code repository within the database 114); And distribution of encoder codes and parameters learned through training, in 0034: In addition to its novel architecture, the deep image manipulation system trains the global and spatial autoencoder to accurately generate specific digital images by swapping spatial codes and global codes between pairs of digital images [the operations further comprising: updating the distribution of a latent space using the second latent representation encoded by the second encoder neural network], thus forcing the global and spatial autoencoder to learn compositionality. By learning compositionality in this way, the deep image manipulation system can learn embeddings [the operations further comprising: updating the distribution of a latent space using the second latent representation encoded by the second encoder neural network] that are suitable for digital image manipulation: spatial features naturally correspond to the geometric layout of a digital image, and global features naturally capture an overall appearance. Additionally, by utilizing a contrastive loss to force extracted spatial codes and extracted global codes to be more similar to corresponding codes from input digital images than to stored spatial codes and stored global codes, the deep image manipulation system further improves the accuracy and realism of resultant digital images.)

Regarding claim 14, the claim limitations are similar to the claim 4 limitations and thus rejected under the same rationale.
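One common way to realize "updating the distribution of a latent space using the second latent representation", in the momentum-contrast style the application's title suggests, is a momentum-updated second encoder whose outputs refresh a fixed-size code bank. This is an illustrative sketch under that assumption, not the applicant's claimed implementation or Park's:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def momentum_update_and_enqueue(encoder, second_encoder, queue,
                                    new_codes, m=0.999, queue_size=4096):
        # Slowly drag the second encoder's weights toward the online encoder.
        for p, p2 in zip(encoder.parameters(), second_encoder.parameters()):
            p2.data.mul_(m).add_(p.data, alpha=1 - m)
        # Refresh the code bank with the second encoder's latest latent codes.
        queue = torch.cat([F.normalize(new_codes, dim=1), queue], dim=0)
        return queue[:queue_size]   # keep only the most recent codes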
Regarding claim 15, the claim limitations are similar to the claim 5 limitations and thus rejected under the same rationale. Regarding claim 16, the claim limitations are similar to the claim 7 limitations and thus rejected under the same rationale.

Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Park et al. (Pub. No.: US 2021/0358177, hereinafter ‘Park’) in view of Jaiswal et al. (NPL: “A Survey on Contrastive Self-Supervised Learning”, hereinafter ‘Jai’) in further view of Kim et al. (NPL: “Deep generative-contrastive networks for facial expression recognition”, hereinafter ‘Kim’).

Regarding claim 9, the rejection of claim 4 is incorporated and Park in combination with Jai teaches the system of claim 1, the operations further comprising: retrieving samples from the learned distribution over the latent space; and decoding, using the decoder neural network, the samples into a new output data set. (0065-0066: In some embodiments, the deep image manipulation system 102 utilizes the GAN loss to further help the reconstructed digital image 304 look realistic… By utilizing the GAN loss and the reconstruction loss 308, the deep image manipulation system 102 can determine how accurate the global and spatial autoencoder 112 [the operations further comprising: retrieving samples from the learned distribution over the latent space; and decoding, using the decoder neural network, the samples into a new output data set] is when generating reconstructed digital images and can improve the accuracy over subsequent iterations. For instance, the deep image manipulation system 102 continues training the encoder neural network 206 and the generator neural network 216 [retrieving samples from the learned distribution over the latent space] over multiple iterations, inputting new input digital images to generate new reconstructed digital images [decoding, using the decoder neural network, the samples into a new output data set], determining losses, and modifying parameters for each iteration…) Additionally, Kim expressly teaches the decoder network of the claimed autoencoder and decoder network architecture, as recited in the limitation "decoding, using the decoder neural network, …", as depicted in Fig. 2 and in Pgs. 2-3, Sec. III.
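For claim 9's "retrieving samples from the learned distribution over the latent space" with a uniform-on-the-hypersphere prior, a handy fact is that an isotropic Gaussian draw, once normalized, is uniformly distributed on the unit sphere. A minimal sketch (the decoder interface and names are illustrative assumptions):

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def sample_and_decode(decoder, num_samples, latent_dim):
        # Normalized isotropic Gaussian draws are uniform on the unit
        # hypersphere, matching the prior described in the rejection.
        z = F.normalize(torch.randn(num_samples, latent_dim), dim=1)
        return decoder(z)   # decode the samples into a new output data set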
Kim, Jai, and Park are analogous art because all involve developing image processing and recognition techniques using machine learning systems and algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of the prior art for implementing different arrangements of embedding networks using encoder networks, decoder networks, and generative-contrastive networks for facial expression recognition tasks, as disclosed by Kim, with the method of developing machine learning techniques using autoencoders and deep generative models for image processing tasks as collectively disclosed by Jai and Park. One of ordinary skill in the art would have been motivated to combine the methods disclosed by Kim, Jai, and Park to process image data by deploying deep neural networks that embed a combination of a generative model, a contrastive model, and a discriminative model in an end-to-end training manner (Kim, Abstract). Doing so allows for implementing machine learning models for image processing based on a contrastive representation that embeds a distinctive expressive factor for a discriminative purpose, which helps improve recognition accuracy (Kim, Abstract).

Regarding claim 10, the rejection of claim 9 is incorporated and Park in combination with Jai and Kim teaches the system of claim 9, wherein the new output data set is a new data set that is different from the input data set and the output data set, wherein the input data set comprises a first image, the output data set comprises a second image, and the new output data set comprises a third image. (0065-0066: In some embodiments, the deep image manipulation system 102 utilizes the GAN loss to further help the reconstructed digital image 304 look realistic… By utilizing the GAN loss and the reconstruction loss 308, the deep image manipulation system 102 can determine how accurate the global and spatial autoencoder 112 is when generating reconstructed digital images [wherein the new output data set is a new data set that is different from the input data set and the output data set] and can improve the accuracy over subsequent iterations. For instance, the deep image manipulation system 102 continues training the encoder neural network 206 and the generator neural network 216 over multiple iterations, inputting new input digital images [wherein the input data set comprises a first image] to generate new reconstructed digital images [wherein the output data set comprises a second image and the new output data set comprises a third image; the claimed new data corresponds to the subsequent iteration and the claimed old data to the previous iteration], determining losses, and modifying parameters for each iteration…)

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Zheng et al. (NPL: “Contrastive auto-encoder for phoneme recognition”): teaches that a contrastive autoencoder is a single autoencoder consisting of sub-encoders that process subsets of the same input set to learn their respective parameters. As in section 2.1: A CsAE consists of two deep AEs. However, unlike AE and other variants of AEs, the training of a CsAE must be accomplished in a purely supervised fashion. Each pair of inputs for a CsAE consists of two samples that belong to the same class and may or may not be the same sample. Each sub-autoencoder (sub-AE) has K layers for encoding and K layers for decoding, thus making 2K+1 layers altogether. The outputs of the Kth layer of both sub-AEs are contrasted, which means the difference between them contributes to the loss function. We would like such difference to be as small as possible, yet maintain the ability of both sub-AEs to reconstruct the original input signal.
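Reading the Zheng description as pseudocode, the CsAE objective combines two reconstruction terms with a penalty on the gap between the two sub-AEs' K-th layer outputs for a same-class pair. A minimal sketch, assuming each sub-AE returns its K-th layer code along with its reconstruction (that interface and the weighting are illustrative, not Zheng's exact formulation):

    import torch.nn.functional as F

    def csae_loss(sub_ae_a, sub_ae_b, x_a, x_b, weight=1.0):
        # Each sub-AE is assumed to return (K-th layer code, reconstruction).
        code_a, recon_a = sub_ae_a(x_a)
        code_b, recon_b = sub_ae_b(x_b)
        reconstruction = F.mse_loss(recon_a, x_a) + F.mse_loss(recon_b, x_b)
        contrast = F.mse_loss(code_a, code_b)  # contrasted K-th layer outputs
        return reconstruction + weight * contrast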
Phuc et al. (NPL: “Contrastive Representation Learning: A Framework and Review”): teaches that representation learning refers to the process of learning a parametric mapping from the raw input data domain to a feature vector or tensor, in the hope of capturing and extracting more abstract and useful concepts that can improve performance on a range of downstream tasks. Contrastive representation learning can be considered learning by comparing. Unlike a discriminative model that learns a mapping to some (pseudo-)labels and a generative model that reconstructs input samples, in contrastive learning a representation is learned by comparing among the input samples. Instead of learning a signal from individual data samples one at a time, contrastive learning learns by comparing among different samples. The comparison can be performed between positive pairs of "similar" inputs and negative pairs of "dissimilar" inputs.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLUWATOSIN ALABI, whose telephone number is (571) 272-0516. The examiner can normally be reached Monday-Friday, 8:00am-5:00pm EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Michael Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/OLUWATOSIN ALABI/
Primary Examiner, Art Unit 2129

Prosecution Timeline

Jan 21, 2021
Application Filed
Sep 04, 2024
Non-Final Rejection — §103, §112
Feb 05, 2025
Response Filed
May 06, 2025
Final Rejection — §103, §112
Jul 09, 2025
Response after Non-Final Action
Aug 07, 2025
Request for Continued Examination
Aug 14, 2025
Response after Non-Final Action
Oct 06, 2025
Non-Final Rejection — §103, §112
Dec 02, 2025
Applicant Interview (Telephonic)
Dec 04, 2025
Examiner Interview Summary
Dec 05, 2025
Response Filed
Feb 24, 2026
Final Rejection — §103, §112
Apr 08, 2026
Examiner Interview Summary
Apr 08, 2026
Applicant Interview (Telephonic)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579409
IDENTIFYING SENSOR DRIFTS AND DIVERSE VARYING OPERATIONAL CONDITIONS USING VARIATIONAL AUTOENCODERS FOR CONTINUAL TRAINING
2y 5m to grant Granted Mar 17, 2026
Patent 12572814
ARTIFICIAL NEURAL NETWORK BASED SEARCH ENGINE CIRCUITRY
2y 5m to grant Granted Mar 10, 2026
Patent 12561570
METHODS AND ARRANGEMENTS TO IDENTIFY FEATURE CONTRIBUTIONS TO ERRONEOUS PREDICTIONS
2y 5m to grant Granted Feb 24, 2026
Patent 12547890
AUTOREGRESSIVELY GENERATING SEQUENCES OF DATA ELEMENTS DEFINING ACTIONS TO BE PERFORMED BY AN AGENT
2y 5m to grant Granted Feb 10, 2026
Patent 12536478
TRAINING DISTILLED MACHINE LEARNING MODELS
2y 5m to grant Granted Jan 27, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

5-6
Expected OA Rounds
58%
Grant Probability
85%
With Interview (+26.3%)
3y 8m
Median Time to Grant
High
PTA Risk
Based on 199 resolved cases by this examiner. Grant probability derived from career allow rate.
