Last updated: May 29, 2026
Application No. 17/697,716
TRAINING METHOD, STORAGE MEDIUM, AND TRAINING DEVICE

Final Rejection: §103, §112
Filed: Mar 17, 2022
Priority: Sep 24, 2019 (continuation of PCT/JP2019/037371)
Examiner: MEYER, JACQUELINE CHRISTINE
Art Unit: 2144
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Fujitsu Limited
OA Round: 4 (Final)
Interview: Optional
Examiner has a relatively high allowance rate (67%) and a +70.0% interview lift. A written response may suffice.
Based on 15 resolved cases, 2023–2026
Examiner Intelligence

Examiner: MEYER, JACQUELINE CHRISTINE
Career allowance rate: 67% (10 granted / 15 resolved), above average; +11.7% vs TC avg
Interview lift: +70.0% (strong)
Typical timeline: 3y 10m average prosecution
Currently pending: 14

Statute-Specific Performance
§101: 5.3% (-34.7% vs TC avg)
§103: 85.5% (+45.5% vs TC avg)
§102: 5.3% (-34.7% vs TC avg)
§112: 1.3% (-38.7% vs TC avg)
Based on career data from 15 resolved cases
Office Action

§103 §112
DETAILED ACTION
	This final rejection is responsive to the amendment filed on February 25, 2026. Claims 1, 3, 5, and 7-10 are pending. Claims 1, 9, and 10 are independent.
	A new ground of rejection under 35 USC §103 has been made in light of applicant’s amendment. See the sections “Claim Rejections - 35 USC § 103” and “Response to Arguments” below.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-10 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
The amended limitation of “the information entropy of the probability distribution of the feature data is calculated as an average of self-information of individual feature data points from the probability distribution and indicates a complexity or a spread of the probability distribution of the feature data” does not appear to be described in the disclosure. The specification generally discusses information entropy, such as in paragraphs 0039-0041, 0102, and 0111-0113; however, there does not appear to be any discussion of the average of self-information of individual feature data points from the probability distribution or of an indication of a complexity or a spread of the probability distribution of the feature data. The application claims priority to PCT/JP2019/037371; however, a copy of the PCT filing or WIPO publication does not appear to have been provided and, therefore, the examiner is presently unable to review the document for support of the amendment.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

	Claims 1, 9, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Kohl et al. (US20200372654), hereinafter Kohl, in view of Stewart (Comprehensive Introduction to Autoencoders), hereinafter Stewart, in view of Vu et al. (ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation), hereinafter Vu, in view of Audhkhasi (Noise-enhanced convolutional neural networks), hereinafter Audhkhasi.

Regarding claim 1, Kohl teaches:
encoding input data by the encoder of the autoencoder to obtain feature data from the encoder; (Kohl, paragraph 0080: “Each encoder block is configured to process an encoder block input to generate an encoder block output having a lower spatial resolution than the encoder block input, i.e., such that the resolution of the encoder block output is less than the resolution of the encoder block input along at least one spatial dimension.”  - The segmentation unit of paragraph 0079 teaches the encoder block and the decoder block and therefore, the segmentation unit is analogous to the autoencoder which is encoding input data.)
obtaining a probability distribution of the feature data … (Kohl, paragraph 0073: “The posterior neural network 252 is configured to process: (i) a training image 254, and (ii) a target segmentation 256 of the training image 254, to generate an output that defines the parameters of a “posterior” probability distribution 258 over the latent space.” – The posterior neural network teaches encoder blocks and decoder blocks in paragraph 0092. The latent space is analogous to the feature data obtained by encoding the input data by the encoder of the autoencoder.)
training the autoencoder to train the probability distribution of the feature data so that an information entropy of the probability distribution of the feature data and an error between the decoded data and the input data are decreased, wherein (Kohl, paragraph 0076: “The training system 250 then jointly optimizes the parameters of the prior neural network 206, the segmentation neural network 212, and the posterior neural network 252 to optimize an objective function, e.g., using gradient descent. The objective function $L$ (evaluated for a given training example) may be given by:
$$L = \mathrm{Err}(Y, \hat{Y}) - \beta \cdot D(P, Q) \qquad (1)$$
where Err(Y,Ŷ) denotes an error (e.g., a cross entropy error) between the target segmentation Y 256 and the predicted segmentation Ŷ 262 (i.e., that is generated by processing a latent variable sampled in accordance with the posterior probability distribution), β is a constant factor, and D(P,Q) represents an error (e.g., a Kullback-Leibler divergence) between the prior probability distribution P 264 and the posterior probability distribution Q 258 for the training example. Optimizing the objective function in equation (1) encourages the prior probability distribution 264 to match the posterior probability distribution 258, and encourages the predicted segmentation 262 corresponding to a latent variable sampled in accordance with the posterior distribution 258 to match the target segmentation 256.” – cross entropy error is analogous to the error between the decoded data and the input data, and D(P,Q) is analogous to the information entropy of the probability distribution. Prior probability distribution to match the posterior probability distribution indicates that the training is also training the probability distribution.)

Kohl does not explicitly teach:
	That the obtaining a probability distribution of the feature data is done by inputting the feature data into a model, the model being configured to estimate the probability distribution from the feature data input to the model;
	adding a noise to the feature data;
generating decoded data by decoding, by the decoder of the autoencoder, the feature data to which the noise is added; and
the information entropy of the probability distribution of the feature data is calculated as an average of self-information of individual feature data points from the probability distribution and indicates a complexity or a spread of the probability distribution of the feature data,
the noise is a uniform random number, based on a distribution of which an average is zero, that has dimensions as many as the feature data output from the encoder of the autoencoder and is uncorrelated between dimensions, and
the training includes training the autoencoder and the model such that the probability distribution estimated by the model reflects a probability density of the input data.

However, Stewart teaches:
	adding a noise to the feature data; (Stewart, page 16, last paragraph: “Looking at the below image, we can consider that our approximation to the data generating procedure decides that we want to generate the number ‘2’, so it generates the value 2 from the latent variable centroid. However, we may not want to generate the same looking ‘2’ every time, as in our videogame example with plants, so we add some random noise to this item in the latent space, which is based on a random number and the ‘learned’ spread of the distribution for the value ‘2’.” – See figure below, where the random numbers in the latent space are analogous to noise added to the feature data.)
generating decoded data by decoding, by the decoder of the autoencoder, the feature data to which the noise is added; and (Stewart, page 17, paragraph 1: “We pass this through our decoder network and we get a 2 which looks different to the original.” – See figure below, the noise is added to the feature data before decoding the data and generating an output.)

    [Figure: media_image1.png (greyscale, 423 × 764), the Stewart figure referenced above]

	Stewart is considered analogous to the claimed invention as it is in the same field of endeavor, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have modified Kohl, which already teaches training an autoencoder but does not explicitly teach adding noise to the feature data then generating decoded data using that noisy feature data, to include the teachings of Stewart which does teach adding noise to the feature data then generating decoded data using that noisy feature data so that “the network does not arbitrarily place characters in the latent space, making the transitions between values less spurious.” (Stewart, page 18, paragraph 1)
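For illustration only, the operation described in the Stewart passage (noise "based on a random number and the 'learned' spread" added to a point in the latent space before decoding) can be sketched as below. The function name `add_latent_noise`, the variables `mu` and `sigma`, and the choice of a standard-normal random number are assumptions made for this sketch; none of them comes from the cited reference or from the claims.

```python
import numpy as np

def add_latent_noise(mu, sigma, rng):
    """Perturb a latent point: centroid plus (random number times learned spread)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Assumption: the "random number" is drawn from a standard normal distribution.
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0])     # hypothetical latent centroid
sigma = np.array([0.1, 0.3])   # hypothetical learned spread
z_noisy = add_latent_noise(mu, sigma, rng)  # the value a decoder would receive
```

Note that a draw scaled by a learned spread, as sketched here, is not the same thing as the zero-mean uniform noise recited in claim 1, which is why the rejection turns to a further reference for that limitation.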

Kohl and Stewart do not explicitly teach:
	That the obtaining a probability distribution of the feature data is done by inputting the feature data into a model, the model being configured to estimate the probability distribution from the feature data input to the model;
the information entropy of the probability distribution of the feature data is calculated as an average of self-information of individual feature data points from the probability distribution and indicates a complexity or a spread of the probability distribution of the feature data,
the noise is a uniform random number, based on a distribution of which an average is zero, that has dimensions as many as the feature data output from the encoder of the autoencoder and is uncorrelated between dimensions, and
the training includes training the autoencoder and the model such that the probability distribution estimated by the model reflects a probability density of the input data.

However, Vu teaches:
	That the obtaining a probability distribution of the feature data is done by inputting the feature data into a model, the model being configured to estimate the probability distribution from the feature data input to the model; (Vu, page 2519, last paragraph of column 1: “Let F be a semantic segmentation network which takes an image x and predicts a C-dimensional “soft-segmentation map” $F(x) = P_x = [P_x^{(h,w,c)}]_{h,w,c}$. By virtue of final softmax layer, each C-dimensional pixel-wise vector $[P_x^{(h,w,c)}]_c$ behaves as a discrete distribution over classes.” – the semantic segmentation network presenting a prediction as a discrete distribution over classes is analogous to the model that the feature data is being input into to estimate the probability distribution.)
the information entropy of the probability distribution of the feature data is calculated as an average of self-information of individual feature data points from the probability distribution and indicates a complexity or a spread of the probability distribution of the feature data, (Vu, page 2520, column 2, paragraphs 1 and 2: “To this end, we formulate the UDA task as minimizing distribution distance between source and target on the weighted self-information space….By aligning weighted self-information distributions of target and source domains, we indirectly minimize the entropy of target predictions. Moreover, as the adaptation is done on the weighted self-information space, our model leverages structural information from source to target. In detail, given a pixel-wise predicted class score $P_x^{(h,w,c)}$, the self-information or “surprisal” [40] is defined as $-\log P_x^{(h,w,c)}$. Effectively, the entropy $E_x^{(h,w)}$ in (2) is the expected value of the self-information $\mathbb{E}_c[-\log P_x^{(h,w,c)}]$. We here perform adversarial adaptation on weighted self-information maps $I_x$ composed of pixel level vectors $I_x^{(h,w)} = -P_x^{(h,w)} \cdot \log P_x^{(h,w)}$. Such vectors can be seen as the disentanglement of the Shannon Entropy.” – The self-information distributions of the source and target domains is analogous to the self-information of individual feature data points from the probability distribution and the Shannon entropy ensures that this is an average of self-information and indicates a complexity or spread of the probability distribution of the feature data.)
the training includes training the autoencoder and the model such that the probability distribution estimated by the model reflects a probability density of the input data. (Vu, fig. 2: “First, direct entropy minimization minimizes the entropy of the target $P_{x_t}$, which is equivalent to minimizing the sum of weighted self-information maps $I_{x_t}$. In the second, complementary approach, we use adversarial training to enforce the consistency in $I_x$ across domains.” – Figure 2 shows that the training includes the self-information maps which is found using the semantic segmentation network, i.e. the model, and the adversarial training is a convolutional network (page 2521, column 2, paragraph 3) and is therefore analogous to the training of the autoencoder which is already taught by Kohl.)
	Vu is considered analogous to the claimed invention as it is in the same field of endeavor, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have modified Kohl and Stewart, which already teach training an autoencoder and obtaining a probability distribution of the feature data but do not explicitly teach that the probability distribution of the feature data is obtained by using a model configured to estimate the probability from the feature data or that the information entropy is calculated as an average of self-information of individual feature data points from the probability distribution, to include the teachings of Vu which does teach that the probability distribution of the feature data is obtained by using a model configured to estimate the probability from the feature data or that the information entropy is calculated as an average of self-information of individual feature data points from the probability distribution as the ensemble of the two models and the use of the self-information for entropy loss shows an improvement over the current baselines. (Vu, page 2523, column 1, paragraph 1 and page 2524, column 2, paragraph 2)
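The relationship relied on in the Vu citation, namely that Shannon entropy is the expected value of the self-information −log p, can be checked numerically. The following is a minimal sketch for a discrete distribution; the function names, the clipping constant `eps`, and the example distributions are assumptions introduced here for illustration and are taken from neither Vu nor the application.

```python
import numpy as np

def self_information(p, eps=1e-12):
    """Self-information (surprisal) -log p of each outcome; eps avoids log(0)."""
    return -np.log(np.clip(p, eps, 1.0))

def entropy(p):
    """Shannon entropy: the probability-weighted average of self-information."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * self_information(p)))

uniform = np.full(4, 0.25)                    # widely spread distribution
peaked = np.array([0.97, 0.01, 0.01, 0.01])   # concentrated distribution
h_uniform = entropy(uniform)  # equals log(4) for a uniform distribution over 4 outcomes
h_peaked = entropy(peaked)    # smaller, because the distribution is less spread out
```

The comparison of `h_uniform` with `h_peaked` shows the sense in which entropy indicates the spread of a distribution: the flatter the distribution, the larger the average surprisal.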

Kohl, Stewart, and Vu do not explicitly teach:
the noise is a uniform random number, based on a distribution of which an average is zero, that has dimensions as many as the feature data output from the encoder of the autoencoder and is uncorrelated between dimensions, and (Audhkhasi, Fig. 6 description: “We added zero mean uniform $\left(-0.5\sqrt{c/t^{d}},\ 0.5\sqrt{c/t^{d}}\right)$ noise where $c = 0, 0.2, \ldots, 2.8, 3$, $t$ was the training epoch, and $d = 5$ was the noise annealing factor” and page 17, column 1, paragraph 5: “The hyperplane structure implies that the NCNN hyperplane imposes only a simple linear condition on the noise. The three dimensions of the noise space in this example correspond to the three output neurons in Fig. 2. Adding uniform noise to the output neurons defines the uniform noise cube.” – Here uniform noise is added to the output neurons but it can also be added to the hidden layers – see page 17, column 1, paragraph 4. The hyperplane contains independent and identically distributed uniform noise which is analogous to it being uncorrelated between dimensions – see page 17, Fig. 3 description.)
	Audhkhasi is considered analogous to the claimed invention as it is in the same field of endeavor, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have modified Kohl, Stewart, and Vu, which already teaches training an autoencoder by adding noise to the feature data but does not explicitly teach that the noise is a uniform random number, to include the teachings of Audhkhasi which does teach that the noise is a uniform random number since “injecting carefully chosen noise can speed convergence in the backpropagation training of a convolutional neural network (CNN).” (Audhkhasi, abstract)
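To make the disputed noise limitation concrete, the sketch below draws zero-mean uniform noise with the same dimensions as a feature array, each entry sampled independently so that the dimensions are uncorrelated. The function name `uniform_zero_mean_noise`, the `half_width` parameter, and the array sizes are hypothetical choices for this illustration; they are not taken from Audhkhasi or from the application.

```python
import numpy as np

def uniform_zero_mean_noise(features, half_width, rng):
    """Independent uniform noise on [-half_width, half_width), one draw per feature entry."""
    features = np.asarray(features, dtype=float)
    return rng.uniform(-half_width, half_width, size=features.shape)

rng = np.random.default_rng(0)
z = np.zeros((10000, 3))                     # hypothetical batch of 3-dimensional feature data
noise = uniform_zero_mean_noise(z, 0.5, rng)
z_noisy = z + noise                          # what a decoder would receive
corr = np.corrcoef(noise, rowvar=False)      # off-diagonal entries should be close to zero
```

Because every entry is drawn independently from an interval symmetric about zero, the sample mean of each dimension and the off-diagonal correlations both approach zero as the batch grows.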

Regarding claim 9, Claim 9 has all the same limitations of claim 1 which are taught by Kohl, Stewart, Vu, and Audhkhasi – see claim 1 above.
Kohl further teaches:
	A non-transitory computer-readable storage medium storing a training program of training an autoencoder that includes an encoder performing encoding and a decoder performing decoding, the training program being a program that causes at least one computer to execute a process, the process comprising: (Kohl, paragraph 0031: “In an embodiment there is provided one or more computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform the operations of one or more of the respective methods described hereinbefore.”)

Regarding claim 10, Claim 10 has all the same limitations of claim 1 which are taught by Kohl, Stewart, Vu, and Audhkhasi – see claim 1 above.
Kohl further teaches:
A training device of training an autoencoder that includes an encoder performing encoding and a decoder performing decoding, the training device comprising: (Kohl, paragraph 0109: “Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit.”)
one or more memories; and one or more processors coupled to the one or more memories and the one or more processors configured to: (Kohl, paragraph 0109: “The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.”)

	Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Kohl in view of Stewart in view of Vu in view of Audhkhasi in view of Zong et al. (Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection), hereinafter Zong.
	Zong was cited in applicant’s IDS filed on 3/17/2022.

Regarding claim 3, Kohl, Stewart, Vu, and Audhkhasi teach the method of claim 1, as cited above.
Kohl, Stewart, Vu, and Audhkhasi do not explicitly teach:
the model is a Gaussian mixture model, wherein
the training includes training the autoencoder to train an encoding parameter of the autoencoder, a decoding parameter of the autoencoder, and a parameter of the Gaussian mixture model.

However, Zong teaches:
the model is a Gaussian mixture model, wherein (Zong, abstract: “Our model utilizes a deep autoencoder to generate a low-dimensional representation and reconstruction error for each input data point, which is further fed into a Gaussian Mixture Model (GMM).” – The Gaussian mixture model is used to generate the probabilities of the feature data (see Zong, section 3.3, paragraph 2) and, therefore, is analogous to the model taught by Vu above.)
the training includes training the autoencoder to train an encoding parameter of the autoencoder, a decoding parameter of the autoencoder, and a parameter of the Gaussian mixture model.  (Zong, abstract: “Instead of using decoupled two-stage training and the standard Expectation-Maximization (EM) algorithm, DAGMM jointly optimizes the parameters of the deep autoencoder and the mixture model simultaneously in an end-to-end fashion, leveraging a separate estimation network to facilitate the parameter learning of the mixture model.” – The parameters of the deep autoencoder include both the encoder and decoder, thus optimizing the parameters of the deep autoencoder during training is training an encoding parameter of the autoencoder and decoding parameters of the autoencoder. The mixture model is the Gaussian mixture model and optimizing the parameters of the mixture model is analogous to training the parameters of the Gaussian mixture model.)
	Zong is considered analogous to the claimed invention as it is in the same field of endeavor, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have modified Kohl, Stewart, Vu, and Audhkhasi, which already teaches training a model and an autoencoder but does not explicitly teach that the model is a Gaussian mixture model or that the training includes training the encoding and decoding parameters of the autoencoder and a parameter of the Gaussian mixture model, to include the teachings of Zong which does teach that the model is a Gaussian mixture model or that the training includes training the encoding and decoding parameters of the autoencoder and a parameter of the Gaussian mixture model in order to help “the autoencoder escape from less attractive local optima and further reduce reconstruction errors, avoiding the need of pre-training.” (Zong, abstract)

	Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Kohl in view of Stewart in view of Vu in view of Audhkhasi in view of Balle et al. (Variational Image Compression with a Scale Hyperprior), hereinafter Balle.

Regarding claim 5, Kohl, Stewart, Vu, and Audhkhasi teach the method of claim 1, as cited above.
Kohl, Stewart, Vu, and Audhkhasi do not explicitly teach:
wherein the obtaining includes obtaining the probability distribution parametrically.  

However, Balle teaches:
wherein the obtaining includes obtaining the probability distribution parametrically.  (Balle, Page 4, paragraph 1: “In variational inference, the goal is to approximate the true posterior $p_{\tilde{y} \mid x}(\tilde{y}, x)$, which is assumed intractable, with a parametric variational density $q(\tilde{y} \mid x)$ by minimizing the expectation of their Kullback–Leibler (KL) divergence over the data distribution $p_x$” – using the parametric variational density to minimize the divergence over the data distribution is analogous to obtaining the probability distribution parametrically.)
	Balle is considered analogous to the claimed invention as it is in the same field of endeavor, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have modified Kohl, Stewart, Vu, and Audhkhasi, which already teach training an autoencoder and obtaining a probability distribution but do not explicitly teach that obtaining the probability distribution is done parametrically, to include the teachings of Balle which does teach that obtaining the probability distribution is done parametrically since “the minimization of the KL divergence is equivalent to optimizing the compression model for rate-distortion performance.” (Balle, page 4, paragraph 2)

	Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Kohl in view of Stewart in view of Vu in view of Audhkhasi in view of Liano (Robust Error Measure for Supervised Neural Networks with Outliers), hereinafter Liano.

Regarding claim 7, Kohl, Stewart, Vu, and Audhkhasi teach the method of claim 1, as cited above.
Kohl, Stewart, Vu, and Audhkhasi do not explicitly teach:
wherein the first error is a squared error between the decoded data and the input data.  

However, Liano teaches:
wherein the first error is a squared error between the decoded data and the input data.  (Liano, Section II, paragraph 2: “where the error function $\rho(\cdot)$ is symmetric and continuous, $r_i = t_i - y_i$ is the residual of pattern $i$ with target $t$, and $N$ is the number of training patterns.” And Section II(A) paragraph 1: “Using the notation defined above, the LMS method is realized by setting the error function $\rho(r) = \frac{1}{2} r^{2}$” – here, the target t being the input data and y being the output data and the LMS method is the least mean square which is thus a squared error between the decoded data and the input data.)
	Liano is considered analogous to the claimed invention as it is in the same field of endeavor, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have modified Kohl, Stewart, Vu, and Audhkhasi, which already teach training an autoencoder using the error between decoded data and input data but do not explicitly teach that the error is a squared error, to include the teachings of Liano, which does teach that the error is a squared error, since it is a well-known method in training neural networks (Liano, see abstract and introduction paragraph 2).
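
For illustration only, the squared-error mapping discussed above can be sketched in a few lines of Python. This is a minimal sketch and not code from Liano or from the application: the function names `rho` and `lms_error` and the sample values are hypothetical, and the sketch assumes the per-pattern errors are summed (the quoted excerpt does not show whether the total is summed or averaged over N).

```python
def rho(r):
    """LMS error function from the quoted passage: rho(r) = r**2 / 2."""
    return 0.5 * r * r


def lms_error(targets, outputs):
    """Sum rho over the residuals r_i = t_i - y_i (summation is an assumption)."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    return sum(rho(t - y) for t, y in zip(targets, outputs))


# Hypothetical autoencoder case: the target t is the input data itself
# and the output y is the decoded data.
input_data = [1.0, 2.0, 3.0]
decoded_data = [1.0, 2.5, 2.0]
error = lms_error(input_data, decoded_data)
print(error)
```

Under this reading, the error is zero only when the decoded data reproduces the input exactly, and it grows quadratically with each residual.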

	Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Kohl in view of Stewart in view of Vu in view of Audhkhasi in view of Xu et al. (Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications), hereinafter Xu.

Regarding claim 8, Kohl, Stewart, Vu, and Audhkhasi teach the method of claim 1, as cited above.
Kohl, Stewart, Vu, and Audhkhasi do not explicitly teach:
wherein the process further comprising performing anomaly detection on input new data based on the trained autoencoder and the probability distribution.

However, Xu teaches:
wherein the process further comprising performing anomaly detection on input new data based on the trained autoencoder and the probability distribution. (Xu, Section 3.3, paragraph 1: “In the scope of anomaly detection, the likelihood of observation window x, i.e., $p_\theta(x)$ in VAE, is an important output.”)
	Xu is considered analogous to the claimed invention as it is in the same field of endeavor, machine learning. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have modified Kohl, Stewart, Vu, and Audhkhasi, which already teach an autoencoder but do not explicitly teach that the trained autoencoder is used to perform anomaly detection, to include the teachings of Xu, which does teach that the trained autoencoder is used to perform anomaly detection “since we want to see how well a given x follows the normal patterns.” (Xu, section 3.3, paragraph 1).
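
For illustration only, likelihood-based anomaly detection of the kind quoted from Xu can be sketched as a threshold on a log-likelihood score. This is a minimal sketch under stated assumptions, not Xu's method or the claimed process: a one-dimensional Gaussian stands in for the learned distribution, and the names `gaussian_log_likelihood`, `is_anomaly`, `MEAN`, `STD`, and `THRESHOLD` are hypothetical.

```python
import math


def gaussian_log_likelihood(x, mean, std):
    """Log density of x under a 1-D Gaussian (stand-in for a learned p(x))."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def is_anomaly(x, mean, std, threshold):
    """Flag x when its log-likelihood falls below the threshold."""
    return gaussian_log_likelihood(x, mean, std) < threshold


# Hypothetical parameters: in practice the distribution would come from the
# trained model; here it is fixed by hand for the example.
MEAN, STD = 0.0, 1.0
# Hypothetical threshold: the log-likelihood of a point three standard
# deviations from the mean.
THRESHOLD = gaussian_log_likelihood(3.0, MEAN, STD)

new_data = [0.1, -0.5, 4.2]
flags = [is_anomaly(x, MEAN, STD, THRESHOLD) for x in new_data]
print(flags)
```

The point of the sketch is only that a low likelihood under the distribution learned from normal data is what marks a new observation as anomalous; the choice of threshold is a separate design decision.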

Response to Arguments
Applicant’s arguments, filed on February 25, 2026, with respect to the rejection of claims 1, 3, 5, and 7-10 under 35 USC §103 have been fully considered and are persuasive. In particular, none of the previously cited references teach the amended limitation. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of Kohl, Stewart, Vu, and Audhkhasi, as Vu does teach the amended limitation of “the information entropy of the probability distribution of the feature data is calculated as an average of self-information of individual feature data points from the probability distribution and indicates a complexity or a spread of the probability distribution of the feature data.”
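
For illustration only, the quoted limitation (entropy as an average of self-information that indicates the spread of a distribution) can be sketched for a discrete distribution. This is a minimal sketch and not the calculation from Vu or from the application: the function names and the two sample distributions are hypothetical, and base-2 logarithms are an assumption.

```python
import math


def self_information(p):
    """Self-information, in bits, of an outcome with probability p."""
    return -math.log2(p)


def entropy(probs):
    """Entropy as the probability-weighted average of self-information."""
    return sum(p * self_information(p) for p in probs if p > 0)


# Two hypothetical distributions over four outcomes.
peaked = [0.97, 0.01, 0.01, 0.01]    # mass concentrated on one outcome
uniform = [0.25, 0.25, 0.25, 0.25]   # mass spread evenly

print(entropy(peaked), entropy(uniform))
```

The evenly spread distribution has the larger entropy, which is the sense in which entropy indicates the spread of a probability distribution.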

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Yu and Principe (Understanding autoencoders with information theoretic concepts)
Preiswerk (Shannon entropy in the context of machine learning and AI)
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JACQUELINE MEYER whose telephone number is (703)756-5676. The examiner can normally be reached M-F 8:00 am - 4:30 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle, can be reached at 571-272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/J.C.M./ Examiner, Art Unit 2144
/TAMARA T KYLE/ Supervisory Patent Examiner, Art Unit 2144
Prosecution Timeline

Feb 13, 2025: Non-Final Rejection mailed — §103, §112
May 13, 2025: Response Filed
Jul 21, 2025: Final Rejection mailed — §103, §112
Oct 15, 2025: Request for Continued Examination
Oct 19, 2025: Response after Non-Final Action
Nov 26, 2025: Non-Final Rejection mailed — §103, §112
Feb 25, 2026: Response Filed
Apr 24, 2026: Final Rejection mailed — §103, §112 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/368,168, Patent 12639619: ARTIFICIAL INTELLIGENCE-BASED MULTI-GOAL-AWARE DEVICE SAMPLING (4y 10m to grant; granted May 26, 2026)
17/539,971, Patent 12608611: MULTISCALE DIMENSIONAL REDUCTION OF DATA (4y 4m to grant; granted Apr 21, 2026)
17/381,240, Patent 12585981: MANAGING AN INSTALLED BASE OF ARTIFICIAL INTELLIGENCE MODULES (4y 8m to grant; granted Mar 24, 2026)
17/570,468, Patent 12468941: SYSTEMS AND METHODS FOR DYNAMICS-AWARE COMPARISON OF REWARD FUNCTIONS (3y 10m to grant; granted Nov 11, 2025)
Study what changed to get past this examiner. Based on 4 most recent grants.
Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 67%
With Interview (+70.0%): 99%
Median Time to Grant: 3y 10m (~0m remaining)
PTA Risk: High
Based on 15 resolved cases by this examiner. Grant probability derived from career allowance rate.