Prosecution Insights
Last updated: April 19, 2026
Application No. 17/158,466

INTERPRETING CONVOLUTIONAL SEQUENCE MODEL BY LEARNING LOCAL AND RESOLUTION-CONTROLLABLE PROTOTYPES

Non-Final OA (§101, §103)
Filed
Jan 26, 2021
Examiner
YI, HYUNGJUN B
Art Unit
2146
Tech Center
2100 — Computer Architecture & Software
Assignee
NEC Laboratories America Inc.
OA Round
5 (Non-Final)
Grant Probability: 18% (At Risk)
OA Rounds: 5-6
To Grant: 4y 7m
With Interview: 49%

Examiner Intelligence

Career Allow Rate: 18% (3 granted / 17 resolved; -37.4% vs TC avg)
Interview Lift: +31.7% for resolved cases with interview
Avg Prosecution: 4y 7m (typical timeline)
Total Applications: 56 across all art units (39 currently pending)

Statute-Specific Performance

§101: 26.3% (-13.7% vs TC avg)
§103: 53.9% (+13.9% vs TC avg)
§102: 12.9% (-27.1% vs TC avg)
§112: 4.7% (-35.3% vs TC avg)
Tech Center average estimates shown for comparison • Based on career data from 17 resolved cases

Office Action

§101 §103
DETAILED ACTION

This action is responsive to the claims filed on 10/14/2025. Claims 1-12 and 14-21 are pending for examination.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 10/14/2025 has been entered.

Response to Arguments

Applicant's arguments with respect to the claims have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-12 and 14-21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Statutory Categories: Claims 1-12, 14, and 21 are directed to a method. Claims 15-19 are directed to a computer program product. Claim 20 is directed to a system.

Independent Claims 1, 15, and 20

Step 2A Prong 1: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes.
Independent claims 1, 15, and 20 recite limitations that are abstract ideas in the form of mental processes. Claim 1 recites:

A computer-implemented method for interpreting a convolutional sequence model, the method comprising: (this limitation recites converting input data using a convolutional process, stated at a high level with no further indication as to how the converting should be performed, which can reasonably be performed as a mental process or with the aid of pen and paper)

clustering…, the plurality of input segments into clusters using respective resolution-controllable class prototypes allocated to each of a plurality of classes, each of the respective resolution-controllable class prototypes including a respective subset of the output features that characterizes a respective associated one of the plurality of classes; (this limitation recites clustering of some representation of class prototypes, which can reasonably be performed as a mental process or with the aid of pen and paper)

calculating, using the clusters, similarity scores that indicate a similarity of a given one of the output features to a given one of the respective resolution-controllable class prototypes responsive to distances, in a latent space, between the output feature and the respective resolution-controllable class prototypes; (this limitation recites calculating similarity scores, stated at a high level with no further indication as to how the calculation should be performed, which can reasonably be performed as a mental process or with the aid of pen and paper)

concatenating the similarity scores to obtain a similarity vector; (this limitation recites concatenation of the score values, which can reasonably be performed as a mental process or with the aid of pen and paper)

performing, … a prediction and prediction support operation that provides a value of prediction and an interpretation for the value of prediction responsive to the input segments and the similarity vector by solving an
optimization problem having a cross-entropy loss (le), a gradient loss (la), a closeness loss (lc) and a diversity regularization term (ld), wherein the interpretation for the value of prediction is provided using only non-negative weights and lacking a weight bias in the fully connected layer and wherein the closeness loss is expressed as:

lc(θ) = Σ_{({x_t}_{t=1}^T, y) ∈ D} min_{p* ∈ P_y} min_{t ∈ [1, T−w+1]} ‖z_t − p*‖²

where θ represents a set of trainable parameters, {x_t}_{t=1}^T is sequence data of length T, y is a label, P_y represents the set of prototypes that are associated with class y, D represents the training dataset, w is a filter size, z_t is an encoded sequence, and p* is a prototype (this limitation merely comprises a mathematical analysis of data and is being considered as directed to a mathematical concept, see MPEP 2106.04(a); paragraph 51 of this application’s specification outlines the mathematical procedure for this step)

and selecting from a training set a limited number of prototypical segments that are deterministic in classifying new sequences and learning an internal notion of similarity for comparing segments of new sequences with learned prototypes.
(this limitation recites selection of prototypes at a high level of generality, which can reasonably be performed as a mental process or with the aid of pen and paper)

This claim further recites the following additional elements for the purposes of Step 2A Prong Two analysis:

converting, by a convolutional layer having one or more filters and a sliding window, an input data sequence having a plurality of input segments into a set of output features; (this limitation invokes convolutional layers and a sliding window merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f))

by a fully connected layer, (this limitation invokes fully-connected layers merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f))

in multiple prototype storage elements (this limitation invokes prototype storage elements merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f))

The additional limitations fail Step 2A Prong 2 of the 101 analysis because they do not transform the claim into a practical application. These limitations are too abstract or lack a technical improvement that would make the concept practically useful. Without clear utility or integration into a specific field, the claim does not relate to any particular application. It does not meet the requirements of Step 2A Prong 2, as it fails to make the concept meaningfully applicable in practice. Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
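For concreteness, the claimed flow that the rejection walks through step by step (convolution over a sliding window, distance-based similarity to class prototypes, a concatenated similarity vector, and a fully connected layer with only non-negative weights and no bias followed by a softmax) can be sketched as follows. This is an illustrative reconstruction, not the applicant's implementation; all shapes, names, and the Gaussian similarity kernel are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T, d, w = 12, 4, 3                          # sequence length, feature dims, filter size (assumed)
n_filters, n_prototypes, n_classes = 5, 6, 2

x = rng.normal(size=(T, d))                 # input data sequence
filters = rng.normal(size=(n_filters, w, d))

# Convolutional layer with a sliding window: one output feature per position.
features = np.array([[np.sum(f * x[t:t + w]) for f in filters]
                     for t in range(T - w + 1)])          # shape (T-w+1, n_filters)

prototypes = rng.normal(size=(n_prototypes, n_filters))   # class prototypes in latent space

# Similarity scores responsive to latent-space distances: near 1 when close, near 0 when far.
dists = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)
scores = np.exp(-dists ** 2)

# Concatenate the similarity scores to obtain a similarity vector.
similarity_vector = scores.ravel()

# Fully connected layer with only non-negative weights and no bias, then softmax.
W = np.abs(rng.normal(size=(n_classes, similarity_vector.size)))
logits = W @ similarity_vector
prediction = np.exp(logits - logits.max())  # numerically stable softmax
prediction /= prediction.sum()
print(prediction)
```

Because every weight in W is non-negative and there is no bias, each class score is a sum of prototype similarities, which is what makes the prediction interpretable in terms of matched prototypes.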
This claim recites the following additional elements for the purposes of Step 2B analysis:

by a convolutional layer having one or more filters and a sliding window (this limitation invokes convolutional layers merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f))

by a fully connected layer, (this limitation invokes fully-connected layers merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f))

in multiple prototype storage elements (this limitation invokes prototype storage elements merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f))

The claim also fails Step 2B of the analysis because the additional limitations do not amount to significantly more than the abstract idea itself. The additional limitations do not enhance the claim in a way that would move it beyond its abstract ideas, as they minimally elaborate on the core concept without adding any inventive or technical substance. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible. Claims 15 and 20 recite limitations substantially similar to claim 1; as such, a similar analysis applies.
Claim 15 also recites the following additional limitations for consideration:

A computer program product for interpreting a convolutional sequence model, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising (Under Step 2A Prong II and Step 2B, this limitation merely invokes computers or machinery as a tool to perform an existing process and is considered mere instructions to apply an exception, see MPEP 2106.05(f)(2))

Claim 20 also recites the following additional limitations for consideration:

A computer processing system for interpreting a convolutional sequence model, the system comprising: a memory device for storing program code therein; and a processor device operatively coupled to the memory device for running the program code to (Under Step 2A Prong II and Step 2B, this limitation merely invokes computers or machinery as a tool to perform an existing process and is considered mere instructions to apply an exception, see MPEP 2106.05(f)(2))

Dependents of Claims 1, 15, and 20

The remaining dependent claims corresponding to independent claims 1, 15, and 20 do not recite additional elements, whether considered individually or in combination, that are sufficient to integrate the judicial exception into a practical application or amount to significantly more than the judicial exception. The analysis is shown below.

The claims below recite additional limitations which fail Step 2A Prong 2 of the 101 analysis because they do not transform the claim into a practical application. These limitations are too abstract or lack a technical improvement that would make the concept practically useful. Without clear utility or integration into a specific field, the claim does not relate to any particular application.
It does not meet the requirements of Step 2A Prong 2, as it fails to make the concept meaningfully applicable in practice. The claims also fail Step 2B of the analysis because the additional limitations do not amount to significantly more than the abstract idea itself. The additional limitations do not enhance the claims in a way that would move them beyond their abstract ideas, as they minimally elaborate on the core concept without adding any inventive or technical substance. The claims are unpatentable.

Claim 2 recites the further limitation of: The computer-implemented method of claim 1, wherein the set of output features is represented by a non-linear function plus a bias term. (this limitation merely comprises a mathematical analysis of data and is being considered as directed to a mathematical concept, see MPEP 2106.04(a); paragraph 32 of this application’s specification outlines the mathematical procedure for this step) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 3 recites the further limitation of: The computer-implemented method of claim 1, wherein each of the class prototypes collectively form a class prototype vector that is a latent representation of a prototypical segment learned through gradient descent. (this limitation merely comprises a mathematical analysis of data and is being considered as directed to a mathematical concept, see MPEP 2106.04(a); paragraph 40 of this application’s specification outlines the mathematical procedure for this step) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
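Claim 3's notion of a prototype vector that is a latent representation learned through gradient descent can be illustrated with a minimal sketch. The quadratic objective, learning rate, and synthetic data below are assumptions for illustration, not the application's actual training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed stand-in for encoded segments (latent representations) of one class.
encoded_segments = rng.normal(loc=2.0, size=(50, 4))

prototype = np.zeros(4)   # trainable prototype vector, initialized at the origin
lr = 0.1
for _ in range(200):
    # Gradient of the mean squared distance (1/N) * sum ||z - p||^2 w.r.t. p.
    grad = 2.0 * (prototype - encoded_segments.mean(axis=0))
    prototype -= lr * grad

# Under this objective the prototype converges to the centroid of the
# encoded segments, i.e., a learned latent representative of the class.
print(np.allclose(prototype, encoded_segments.mean(axis=0), atol=1e-3))
```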
Claim 4 recites the further limitation of: The computer-implemented method of claim 1, wherein each of the respective resolution-controllable class prototypes has a selectable resolution corresponding to an associated one of the one or more filters. (a selected resolution for a prototype is being considered a mental process of evaluation which can reasonably be performed in the human mind or with the aid of pen and paper) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 5 recites the further limitation of: The computer-implemented method of claim 1, wherein a dimensionality of each of the respective resolution-controllable class prototypes is equal to a dimensionality of each of the output features allocated thereto. (equal dimensionality between prototypes and output features is being considered as mere instructions to apply an exception using a generic computer) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 6 recites the further limitation of: The computer-implemented method of claim 1, wherein a number of the multiple prototype storage elements is equal to a number of the one or more filters in the convolutional layer. (an equal number of prototype storage elements and filters is being considered as mere instructions to apply an exception using a generic computer) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 7 recites the further limitation of: The computer-implemented method of claim 1, wherein the one or more filters comprise multiple filters having different size lengths configured to selectively address different resolutions of the plurality of input segments corresponding to different levels of granularity. (filters further defining different size lengths corresponding to the resolution/granularity are being considered as mere instructions to apply an exception using a generic computer) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 8 recites the further limitation of: The computer-implemented method of claim 1, wherein the similarity scores range from 0 to 1, wherein a 0 indicates that the given one of the output features is different from the given one of the respective resolution-controllable class prototypes, and a 1 indicates that the given one of the output features is identical to the given one of the respective resolution-controllable class prototypes. (this limitation merely comprises a mathematical analysis of data and is being considered as directed to a mathematical concept, see MPEP 2106.04(a); paragraph 41 of this application’s specification outlines the mathematical procedure for this step) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 9 recites the further limitation of: The computer-implemented method of claim 1, further comprising performing a max pooling operation on the similarity scores to obtain a similarity vector.
(this limitation merely comprises a mathematical analysis of data and is being considered as directed to a mathematical concept, see MPEP 2106.04(a); paragraph 35 of this application’s specification outlines the mathematical procedure for this step) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 10 recites the further limitation of: The computer-implemented method of claim 9, wherein the closeness loss evaluates a similarity vector to determine closeness of the given one of the output features to the respective resolution-controllable class prototypes. (this limitation merely comprises a mathematical analysis of data and is being considered as directed to a mathematical concept, see MPEP 2106.04(a); paragraph 56 of this application’s specification outlines the mathematical procedure for this step) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 11 recites the further limitation of: The computer-implemented method of claim 1, further comprising applying a softmax operation to an output of the fully connected layer to obtain the value of prediction and the interpretation for the value of prediction. (this limitation merely comprises a mathematical analysis of data and is being considered as directed to a mathematical concept, see MPEP 2106.04(a); paragraph 46 of this application’s specification outlines the mathematical procedure for this step) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
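The operations recited in claims 9 and 11 (max pooling per-position similarity scores into a similarity vector, then a softmax over the fully connected layer's output) can be sketched in a few lines; the numbers and weight values below are made up for illustration.

```python
import numpy as np

# Assumed per-position similarity scores: rows are sliding-window positions,
# columns are prototypes.
scores = np.array([[0.2, 0.9, 0.1],
                   [0.7, 0.3, 0.4]])

# Claim 9: max pooling over positions yields one score per prototype,
# i.e., "how strongly did this prototype match anywhere in the sequence?"
similarity_vector = scores.max(axis=0)     # -> [0.7, 0.9, 0.4]

# Non-negative fully connected weights, no bias (illustrative values).
W = np.array([[1.0, 0.0, 0.5],
              [0.0, 2.0, 0.5]])
logits = W @ similarity_vector

# Claim 11: softmax over the fully connected layer's output gives the
# value of prediction; the non-negative weights keep each prototype's
# contribution to that value interpretable.
probs = np.exp(logits - logits.max())      # numerically stable softmax
probs /= probs.sum()
print(probs)
```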
Claim 12 recites the further limitation of: The computer-implemented method of claim 1, further comprising pushing together the given one of the plurality of segments to corresponding ones of the respective resolution-controllable class prototypes under a constraint that pushing is only to occur toward the corresponding ones of the respective resolution-controllable class prototypes having a same class and further under at least one distance-based loss function. (this limitation merely comprises a mathematical analysis of data and is being considered as directed to a mathematical concept, see MPEP 2106.04(a); paragraph 51 of this application’s specification outlines the mathematical procedure for this step) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 14 recites the further limitation of: The computer-implemented method of claim 1, wherein the diversity regularization term penalizes small distances between the respective resolution-controllable class prototypes below a diversity regularization threshold distance. (this limitation merely comprises a mathematical analysis of data and is being considered as directed to a mathematical concept, see MPEP 2106.04(a); paragraph 56 of this application’s specification outlines the mathematical procedure for this step) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claims 16-19 have limitations substantially identical to claims 2-5; as such, a similar analysis applies.

Claim 21 recites the further limitation of: The computer-implemented method of claim 1, wherein the convolutional layer has a plurality of filters of different sizes.
(filters further defining different size lengths corresponding to the resolution/granularity are being considered as mere instructions to apply an exception using a generic computer) Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

This application currently names joint inventors. In considering patentability of the claims, the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.
Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention. Claims 1, 3, 6, 15, 17 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Saralajew et al. (Saralajew, S., Holdijk, L., Rees, M., & Villmann, T. (2018). Prototype-based Neural Network Layers: Incorporating Vector Quantization. ArXiv, abs/1812.01214), hereafter referred to as Saralajew, in view of Jianmin et al. (CN 106355442 A), hereafter referred to as Jianmin, and in further view of Hirate et al. (US 11755624 B2), hereafter referred to as Hirate, and Ming et al. (Ming, Y., Xu, P., Qu, H., & Ren, L. (2019, July). Interpretable and steerable sequence learning via prototypes. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 903-913).), hereafter referred to as Ming, and Chen et al., (Chen, C., Li, O., Tao, D., Barnett, A., Rudin, C., & Su, J. K. (2019). This looks like that: deep learning for interpretable image recognition. Advances in neural information processing systems, 32.), hereafter referred to as Chen. Regarding claim 1, Saralajew teaches the following limitations: A computer-implemented method for interpreting a convolutional sequence model, the method comprising: converting, by a convolutional layer having one or more filters and a sliding window, an input data sequence having a plurality of input segments into a set of output features (Saralajew, page 6, paragraph 3, “Like usual in CNNs, we filter the images with a stack of filters. Assume there are Nf filters of shape k collected in a tensor of the shape wf × hf × c × Nf . Then, the convolution with all these filters can be seen as a matrix multiplication. 
More precisely, it is a linear transformation of the vector collected over a sliding window”, support showing that the model is a convolutional sequence model, where convolutional layers process training data samples and create a feature map using a sliding window.);

clustering, in multiple prototype storage elements, the plurality of input segments into clusters… allocated to each of a plurality of classes, each of the respective resolution-controllable class prototypes including a respective subset of the output features that characterizes a respective associated one of the plurality of classes (Saralajew, page 2, section 2, paragraph 2, “These methods are based on a set W = {w1, w2, ..., wNW} of prototypes wk,”; Saralajew, page 3, paragraph 2, “For unsupervised prototype-based vector quantization (VQ), each prototype is assumed to serve as a cluster center and hence, κ(x) of the WTA-rule (3) delivers the cluster index.”; Saralajew, page 3, paragraph 3, “In LVQ, each prototype wk is additionally equipped with a class label ck ∈ C = {1, 2, ..., NC}”. Saralajew teaches storing multiple prototypes as a set W, which corresponds to the claimed “multiple prototype storage elements,” and further teaches prototype-based clustering in which each prototype acts as a cluster center and a cluster index is produced via κ(x), which corresponds to clustering input segments into clusters using prototypes. Saralajew additionally teaches “class prototypes” by equipping each prototype w_k with a class label c_k in LVQ, which corresponds to prototypes being allocated/associated with classes.
(The “resolution-controllable” aspect of the class prototypes is not relied upon from Saralajew and is addressed by Jianmin below.); and

performing, by a fully connected layer, a prediction and prediction support operation that provides a value of prediction and an interpretation for the value of prediction responsive to the input segments and the similarity vector (Saralajew, page 6, last paragraph, “During the training of the NN the prototypes are trained in parallel with the feature extraction layers, using the output of the feature extraction layer as input to the prototype-based model… In general, the output vector o(x) of (7) in the last FCL of a NN is element-wise normalized by the softmax activation… The training of the network is usually realized applying the cross entropy loss for the network class probability p̂(x) and the true class probability p(x).”, Saralajew describes using the last fully connected layer to produce an output vector and applying a softmax to obtain class probabilities (a prediction). Furthermore, Saralajew trains the model using cross-entropy loss on the class probability output, which corresponds to the associated prediction support operation being optimized during training.);

a gradient loss (la) (Saralajew, page 12, paragraph 5, “For some of the regularization and loss terms described in this report it is needed to estimate the data distribution over the training dataset. Since the network is optimized by stochastic gradient descent learning, the statistics over the whole dataset cannot be estimated during run-time. This can be compensated via moving averages/moving variances. Additionally, we frequently prefer to perform a zero de-biasing according to [94] to avoid biased gradients.”, Saralajew discusses using additional loss and regularization terms that interact with gradients during training. Since Ming et al.
design their loss function to be modular and open to additional regularization terms, it would have been obvious to one of ordinary skill in the art to add a gradient loss to their unified loss function.)

Saralajew teaches a machine learning model that converts input sequences into a set of output features using a sliding window. Jianmin, in the same field of prototype clustering, teaches the following limitations which Saralajew fails to teach:

using respective resolution-controllable class prototypes (Jianmin, page 6, paragraph 5, lines 1-3, “using the k-means algorithm to cluster the user, adjusting the size and resolution of clustering according to application needs, for the K-means algorithm is a hard clustering algorithm and is based on the prototype of the target function clustering method, is the target function points to the prototype distance as optimization”, Jianmin explicitly teaches adjusting the “size and resolution” of clustering according to application needs while performing prototype-based k-means clustering, which corresponds to the claim requirement that the prototypes/clustering are “resolution-controllable” (i.e., the granularity of the clustering can be adjusted).);

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Saralajew with the teachings disclosed by Jianmin (i.e., clustering using prototypes). A motivation for the combination is to provide better accuracy of estimated features. (Jianmin, abstract, “and the user behavior data is analyzed to better realize the accuracy of advertising”)

Saralajew and Jianmin teach a machine learning model that converts input sequences into a set of output features using a sliding window.
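The "resolution-controllable" aspect relied on from Jianmin, i.e., adjusting the size and resolution of prototype-based k-means clustering, can be sketched minimally: the number of cluster centers (prototypes) sets the granularity. The plain-NumPy k-means, data, and k values below are illustrative assumptions, not Jianmin's implementation.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means: each center is a prototype; k controls the resolution."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest prototype (cluster center).
        labels = np.argmin(
            np.linalg.norm(points[:, None] - centers[None], axis=-1), axis=1)
        # Move each prototype to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
points = np.concatenate(
    [rng.normal(loc=m, size=(30, 2)) for m in (0.0, 5.0, 10.0)])

coarse, _ = kmeans(points, k=2)   # low resolution: few, broad prototypes
fine, _ = kmeans(points, k=6)     # high resolution: many, specific prototypes
print(coarse.shape, fine.shape)
```

Raising k trades broad, coarse-grained prototypes for many fine-grained ones, which is the sense in which the clustering resolution is "controllable according to application needs."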
Hirate, in the same field of clustering for machine learning classification, teaches the following limitations which Saralajew and Jianmin fail to teach:

calculating, using the clusters, similarity scores that indicate a similarity of a given one of the output features to a given one of the respective resolution-controllable class prototypes responsive to distances, in a latent space, between the output feature and the respective resolution-controllable class prototypes (Hirate, col. 6, lines 7-16, “Alternatively, for example, the degree of similarity may be the function that decreases as the distance D from the centroid vector c(P[j]) of the cluster P[j] to the characteristics information u_i increases, such as exp(−D²), exp(−|D|), or 1/(1+D), where the distance D is calculated using D = d(c(P[j]), u_i).”, a similarity is calculated using the distance between cluster centroids of input and the characteristic information (the class prototypes));

concatenating the similarity scores to obtain a similarity vector (Hirate, claim 6, “The processing system according to claim 1, wherein the degree of similarity is a function that decreases as a distance from the centroid vector of the first cluster, to which the first user has been allocated, to the first characteristics information obtained for the first user increases, and in the result of concatenating or combining the degree of similarity, indicated in the representation information for the first user, and the second characteristics information of the first user, a higher weight is given to a higher degree of similarity.”; Hirate, col.
10, lines 19-24, “In the processing system according to this embodiment, the second processing device may divide the plurality of users into second clusters according to integration vectors obtained by concatenating, for each user, the obtained second characteristics information and vectors relating to the transmitted representation information.”, the similarity scores are concatenated.); and

selecting from a training set a limited number of prototypical segments that are deterministic in classifying new sequences and learning an internal notion of similarity for comparing segments of new sequences with learned prototypes (Hirate, col. 4, lines 24-27, “When the first processing device 111 starts the first process, a first obtainer 112 obtains first characteristics information u1, u2, ..., uN respectively indicating characteristics of users 1, 2, ..., N in the first environment (Step S201).”, an obtainer selects characteristic information. Hirate, col. 4, lines 34-38, “The first characteristics information u1, u2, ..., uN and the second characteristics information v1, v2, ..., vN are respectively vectors indicating characteristics of the users 1, 2, ..., N. These vectors are sometimes called feature vectors in the field of clustering technology.”, first characteristics information is used as a learned representation (learned prototype) of users. Hirate, col. 5, lines 49-51, “The belonging information is information indicating which of the first clusters the users 1, 2, ..., N belong to by the first characteristics information u1, u2, ..., uN.”, the learned selected prototypes are used to classify new sequences based on their clustered similarity.)

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Saralajew and Jianmin with the teachings disclosed by Hirate (i.e., calculating similarity scores and concatenating the scores).
A motivation for the combination is to identify class types of clusters based on their similarity score. (Hirate, col. 7, lines 40-46, “As described above, when the user i belongs to the cluster P[j], the vector w.sub.i in the belonging information mode is a vector in which the j-th element is nonzero (1 or the degree of similarity s.sub.i) and the other elements are zero. Alternatively, the vector w.sub.i in the representation information mode is the centroid vector c(P[j]) or vector s.sub.ic(P[j]) obtained by multiplying this and the degree of similarity.”) Saralajew, Jianmin, and Hirate teach a machine learning model that converts input sequences into a set of output features using a sliding window. Ming, in the same field of neural network classification, teaches the following limitations which Saralajew, Jianmin, and Hirate fail to teach: by solving an optimization problem having a cross-entropy loss (le) (Ming, page 905, section 3.2, “For accuracy, we minimize the cross-entropy loss on training set: [equation image]” Page 906, col. 1, paragraph 3, “Full objective. To summarize, the loss that we are minimizing is: [equation image]”, Ming includes a cross-entropy loss (CE) as the primary accuracy term in their overall loss function, forming part of a unified objective optimized during training. The loss function being minimized contains a cross-entropy term along with others.) a closeness loss (lc), (Ming, page 906, col. 1, paragraph 1, “Clustering and evidence regularization. To improve interpretability, Li et al. [19] also proposed two regularization terms to be jointly minimized, the clustering regularization Rc and the evidence regularization Re.
Rc encourages a clustering structure in the latent space by minimizing the squared distance between an encoded instance and its closest prototype: [equation image]”, Ming discloses a “clustering regularization” (Rc) term that acts as a closeness loss by minimizing the distance between encoded examples and their closest prototypes. This teaches the claimed closeness loss, since it enforces that each example is close to at least one prototype in latent space. Paragraph 2, “The evidence regularization Re encourages each prototype vector to be as close to an encoded instance as possible:”, the evidence regularization (Re), which prioritizes prototype closeness, can also be interpreted as a closeness loss.) and a diversity regularization term (ld). (Ming, page 905, section 3.2, “We prevent such phenomenon through a diversity regularization term that penalizes on prototypes that are close to each other: [equation image] … Rd is a soft regularization that exerts a larger penalty on smaller pairwise distances. By keeping prototypes distributed in the latent space, it also helps produce a sparser similarity vector a.”, Ming discloses a “diversity regularization” term (Rd) that penalizes prototypes that are too close to each other in the latent space, thereby explicitly encouraging diversity among prototypes.) wherein the interpretation for the value of prediction is provided using only non-negative weights (Ming, page 905, col. 2, section 3.2, paragraph 3, “Sparsity and non-negativity. In addition, to further enhance interpretability, we add L1 penalty on the fully connected layer f, and constrain the weight matrix W to be non-negative. The L1 sparsity penalty and non-negative constraints on f help to learn sequence prototypes that have more unitary and additive semantics for classification.”, Ming requires the fully connected layer’s weights to be non-negative.)
and lacking a weight bias in the fully connected layer (Ming, page 905, col. 2, paragraph 3, “With the computed similarity vector a = p(e), the fully connected layer computes z = Wa, where W is a C × k weight matrix and C is the output size (i.e., the number of classes in classification tasks). To enhance interpretability, we constrain W to be nonnegative. For multi-class classification tasks, a softmax layer is used to compute the predicted probability: ŷ_i = exp(z_i)/Σ_{j=1}^{C} exp(z_j).”, the fully connected layer computes z = Wa with no bias term, i.e., it explicitly lacks a weight bias.). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Saralajew, Jianmin, and Hirate with the teachings disclosed by Ming (i.e., prediction interpretation with non-negative weights and no bias). It should be noted that a proper obviousness rejection under 35 U.S.C. 103 does not require that all claimed features be expressly or bodily incorporated in a single reference or disclosed in precisely the same way as claimed. See MPEP § 2145 (III): “The test for obviousness is not whether the features of a secondary reference may be bodily incorporated into the structure of the primary reference.... Rather, the test is what the combined teachings of those references would have suggested to those of ordinary skill in the art.” Here, the combination of Saralajew, Jianmin, and Hirate with Ming demonstrates the routine and well-known practice of unifying multiple loss terms in a single optimization objective for neural network training, and the addition of further regularization or gradient-based loss terms would have been well within the ordinary skill in the art. A motivation for the combination is to provide increased interpretability of machine learning parameters. (Ming, page 906, col.
1, paragraph 1, “To improve interpretability, Li et al. [19] also proposed two regularization terms to be jointly minimized, the clustering regularization Rc and the evidence regularization Re. Rc encourages a clustering structure in the latent space by minimizing the squared distance between an encoded instance and its closest prototype”) Chen, in the same field of neural network implementation, teaches the following limitations which Saralajew, Jianmin, Hirate, and Ming fail to teach: wherein the closeness loss is expressed as: [equation image] where θ represents a set of trainable parameters, {x_t}, t = 1, …, T, is sequence data of length T, y is a label, P_y represents the set of prototypes that are associated with class y, D represents the training dataset, w is a filter size, z_t is an encoded sequence, and p* is a prototype (Chen, page 5, section 2.2, paragraph 2, “[equation image]”, Chen teaches the closeness formula as amended because Chen expressly defines a loss term (its “cluster cost”) as a sum over training examples together with a nested minimization that (i) restricts the prototype search to prototypes associated with the correct class via min_{j: p_j ∈ P_{y_i}}, and (ii) selects, within each example, the best-matching latent element via min_{z ∈ patches(f(x_i))}, and then measures closeness using a squared Euclidean distance ‖z − p_j‖₂².
The applicant’s amended expression likewise uses a nested minimization structure to choose (a) a prototype from a class-associated prototype set and (b) a best-matching element within the example (written as a min over an index t), followed by a squared Euclidean distance between the selected latent element and the selected prototype; thus, the main notational differences (Chen writing the inner minimization over “patches” z rather than over an index t, and Chen writing prototypes as p_j rather than using a starred prototype symbol) do not change the fact that both formulas define closeness by taking a minimum over candidate within-example latent elements and a minimum over class-associated prototypes, using the squared distance between the chosen latent element and the chosen prototype.) It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to incorporate Chen’s cluster cost term into the combined system of Saralajew, Jianmin, Hirate, and Ming as an additional/alternative prototype-closeness loss used in the overall training objective. Here, the combination of Saralajew, Jianmin, Hirate, and Ming demonstrates the routine and well-known practice of unifying multiple loss terms in a single optimization objective for neural network training. Chen defines an explicit closeness loss optimization objective that includes a prototype-closeness (“cluster cost”) term and defines that cluster cost using nested minima over (i) prototypes of the correct class and (ii) latent patches (segments) of the embedding, i.e., [equation image]. Chen further explains the intended effect of that term; e.g., “cluster cost encourages each training image to have some latent patch close to at least one prototype of its own class”.
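For illustration only, the training-objective structure discussed above — a cross-entropy accuracy term combined with a prototype-closeness term of the nested-minimization form attributed to Chen (minimum over class-associated prototypes and over within-example latent patches) and a diversity regularizer of the form attributed to Ming — can be sketched as follows. This is a hedged, minimal sketch: the function names, the λ weights, and the d_min threshold are illustrative assumptions, not the claimed implementation or any reference's code.

```python
import math

def sqdist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster_cost(examples, prototypes_by_class):
    # Nested minimization: for each example, the minimum squared distance
    # between any latent patch and any prototype of the example's own class,
    # averaged over the training examples.
    return sum(min(sqdist(z, p)
                   for z in patches
                   for p in prototypes_by_class[label])
               for label, patches in examples) / len(examples)

def diversity_penalty(prototypes, d_min=1.0):
    # Soft hinge: a larger penalty for prototype pairs closer than d_min.
    return sum(max(0.0, d_min - math.sqrt(sqdist(p, q))) ** 2
               for i, p in enumerate(prototypes)
               for q in prototypes[i + 1:])

def total_loss(ce, examples, prototypes_by_class, lc=0.1, ld=0.01):
    # Unified objective: cross-entropy + closeness + diversity terms.
    all_protos = [p for ps in prototypes_by_class.values() for p in ps]
    return (ce
            + lc * cluster_cost(examples, prototypes_by_class)
            + ld * diversity_penalty(all_protos))
```

When every example already has a latent patch sitting exactly on a same-class prototype and the prototypes are farther apart than d_min, both regularizers vanish and the objective reduces to the cross-entropy term alone.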
A person of ordinary skill in the art would have been motivated to use Chen’s cluster-cost formulation in the combined prototype-based interpretability architecture to explicitly enforce closeness between learned latent segments and class prototypes during training (i.e., to directly support the amended “closeness loss” formula). Regarding claim 3, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 1. Chen further teaches: wherein each of the class prototypes collectively form a class prototype vector that is a latent representation of a prototypical segment learned through gradient descent (Chen, page 5, section 2.2, paragraph 2, “Stochastic gradient descent (SGD) of layers before last layer: In the first training stage, we aim to learn a meaningful latent space, where the most important patches for classifying images are clustered (in L²-distance) around semantically similar prototypes of the images’ true classes, and the clusters that are centered at prototypes from different classes are well-separated. To achieve this goal, we jointly optimize the convolutional layers’ parameters w_conv and the prototypes P = {p_j}, j = 1, …, m, in the prototype layer g_p using SGD, while keeping the last layer weight matrix w_h fixed.”, Chen teaches that prototypes are class-associated and learned via gradient descent, and that each prototype is a latent representation of a prototypical segment: Chen allocates prototypes by class as a subset P_k, defines that “every prototype is the latent representation of some training image patch”, and teaches that the prototypes P = {p_j} are “jointly optimize[d] … using SGD”. Under a broad reading, the collection of prototypes allocated to a class (P_k) collectively forms a class-level prototype representation (“class prototype vector”)). Regarding claim 6, Saralajew, Jianmin, Hirate, and Ming teach the limitations of claim 1.
Saralajew further teaches: wherein a number of the multiple prototype storage elements is equal to a number of the one or more filters in the convolutional layer (Saralajew, page 7, section 6.2, paragraph 3, “For a kernel-prototype convolution the number of filters Nf equals the number of kernel-prototypes NW.”). Regarding claim 15, the limitations disclosed are substantially identical to the limitations of claim 1. Therefore, the rejection of claim 1 applies to claim 15 similarly. Claim 15 also recites the following additional limitations for consideration which Hirate further teaches: A computer program product for interpreting a convolutional sequence model, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising (Hirate, col. 3, lines 45-56, “In a case where a first computer executes a first program to provide the first processing device 111 and a second computer executes a second program to provide the second processing device 121, each program to be executed by the corresponding computer can be stored in a computer-readable non-transitory information storage medium, such as a compact disk, a flexible disk, a hard disk, a magneto-optical disk, a digital video disk, a magnetic tape, a read only memory (ROM), an electrically erasable programmable ROM (EEPROM), a flash memory, or a semiconductor memory. This information storage medium can be distributed and sold separately from each computer.”) Regarding claim 17, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 15. The limitations of claim 17 are substantially identical to the limitations of claim 3. Therefore, the rejection of claim 3 applies to claim 17 similarly. Regarding claim 20, the limitations disclosed are substantially identical to the limitations of claim 1.
Therefore, the rejection of claim 1 applies to claim 20 similarly. Claim 20 also recites the following additional limitations for consideration which Hirate further teaches: A computer processing system for interpreting a convolutional sequence model, the system comprising: a memory device for storing program code therein; and a processor device operatively coupled to the memory device for running the program code to (Hirate, col. 3, lines 45-56, as quoted above with respect to claim 15). Claims 2 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin, Hirate, Ming, Chen, and in further view of Song et al. (US 11927609 B2), hereinafter referred to as Song. Regarding claim 2, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 1. Song, in the same field of classification using neural networks, teaches the following limitation which Saralajew, Jianmin, Hirate, Ming, and Chen fail to teach: wherein the set of output features is represented by a non-linear function plus a bias term (Song, col. 19, lines 45-51, “an RNN…outputs a sequence of activations a.sup.l+1=(a.sub.1.sup.l+1, . . .
a.sub.T.sup.l+1) by iterating the following recursive equation… [equation image] where σ is the non-linear activation function, b.sub.T.sup.l is the hidden bias vector”, as disclosed in col. 19 of Song, an RNN takes a sequence of input values, maps it to hidden values, then outputs the activations as shown above with a non-linear function and a bias term.) It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Saralajew, Jianmin, Hirate, Ming, and Chen with the teachings disclosed by Song (i.e., output features represented by a non-linear function and bias term). A motivation for the combination is to define an LSTM (Long Short-Term Memory) to ease the learning of temporal relationships on long time scales. (Song, col. 19, lines 60-65, “LSTMs extend RNN with memory cells, instead of recurrent units, to store and output information, easing the learning of temporal relationships on long time scales. LSTMs make use of the concept of gating: a mechanism based on component-wise multiplication of the input, which defines the behavior of each individual memory cell.”) Regarding claim 16, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 15. The limitations of claim 16 are substantially identical to the limitations of claim 2. Therefore, the rejection of claim 2 applies to claim 16 similarly. Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin, Hirate, Ming, Chen, and in further view of Xu et al. (US 20210181931 A1), hereinafter referred to as Xu. Regarding claim 11, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 1.
Xu teaches the following limitation which Saralajew, Jianmin, Hirate, Ming, and Chen fail to teach: applying a softmax operation to an output of the fully connected layer to obtain the value of prediction and the interpretation for the value of prediction (Xu, paragraph [0053], “The fully connected layer f with softmax output computes the eventual classification results using the similarity score vector a. The entries in the weight matrix in f may be constrained to be non-negative for better interpretability.”). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Saralajew, Jianmin, Hirate, Ming, and Chen with the teachings disclosed by Xu (i.e., applying a softmax operation to obtain the prediction and its interpretation). A motivation for the combination is to provide a way to obtain similarity scores to indicate input sequences with similar prototype embedding (identifying a classification). (Xu, paragraph [0053], “Through appropriate transformations, a vector of similarity scores may be obtained as a=p(e),a.sub.i∈ [0,1], where a.sub.i is the similarity score between the input sequence and the prototype p.sub.i and a.sub.i=1 indicates that the input sequence has identical embedding with prototype p.sub.i.”) Claims 4 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin, Hirate, Ming, Chen, and in further view of Boyd et al. (US 11531737 B1), hereinafter referred to as Boyd. Regarding claim 4, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 1. Boyd, in the same field of image classification, teaches the following limitation which Saralajew, Jianmin, Hirate, Ming, and Chen fail to teach: wherein each of the respective resolution-controllable class prototypes has a selectable resolution corresponding to an associated one of the one or more filters (Boyd, col.
13, lines 58-62, “Those of skill in the art will recognize that parameters may be placed on the image, size (pixel), resolution, angle, contrast, color parameters, and so on to ensure that the image is of sufficient relevant quality to serve as a reference.”, in this case a reference image (prototype) has a selectable resolution corresponding to a filter (parameter).) It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Saralajew, Jianmin, Hirate, Ming, and Chen with the teachings disclosed by Boyd (i.e., filters to control prototype resolution). A motivation for the combination is to provide better quality reference photos for the device or input using that classification system. (Boyd, col. 14, lines 5-9, “For example, a record containing the reference 124 includes a high[er] quality biometric image, 1200×1200 dots per inch (dpi), than that communicated to a frontend device for use, e.g., 600×600 dpi image for use or reference.”) Regarding claim 18, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 15. The limitations of claim 18 are substantially identical to the limitations of claim 4. Therefore, the rejection of claim 4 applies to claim 18 similarly. Claims 5, 8, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin, Hirate, Ming, Chen, and in further view of Pai et al. (US 11580420), hereinafter referred to as Pai. Regarding claim 5, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 1. Pai, in the same field of classification using neural networks, teaches the following limitation which Saralajew, Jianmin, Hirate, Ming, and Chen fail to teach: wherein a dimensionality of each of the respective resolution-controllable class prototypes is equal to a dimensionality of each of the output features allocated thereto (Pai, col.
6, lines 4-13, “As used herein, the term “feature space” refers to a space reflecting features of data points. Specifically, a feature space can reflect one or more dimensions corresponding to one or more features. For instance, the model analysis system can map features of data points to locations within different dimensions of a feature space. For instance, for a dataset include a set of ten features for each of the data points, a feature space can include ten dimensions to which each of the data points is mapped based on the values of the corresponding features.”, output features have dimensions based on their corresponding features (prototypes)). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Saralajew, Jianmin, Hirate, Ming, and Chen with the teachings disclosed by Pai (i.e., dimensionality of prototypes is equal to that of the output features). A motivation for the combination is to map similar features closer together within a feature space than features that are not similar. (Pai, col. 6, lines 13-15, “Accordingly, similar features can be mapped closer together within the feature space than features that are not similar.”) Regarding claim 8, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 1. Pai teaches the following limitation which Saralajew, Jianmin, Hirate, Ming, and Chen fail to teach: wherein the similarity scores range from 0 to 1, wherein a 0 indicates that the given one of the output features is different from the given one of the respective resolution-controllable class prototypes, and a 1 indicates that the given one of the output features is identical to the given one of the respective resolution-controllable class prototypes (Pai, col.
11, lines 27-44, “The optimization algorithm (objective function) is as follows: [equation image] where α.sub.i is 1 if x.sub.i is a prototype, otherwise 0.”). Saralajew, Jianmin, Hirate, Ming, and Chen are combined with Pai based on the same rationale as set forth previously for claim 5. Regarding claim 19, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 15. The limitations of claim 19 are substantially identical to the limitations of claim 5. Therefore, the rejection of claim 5 applies to claim 19 similarly. Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin, Hirate, Ming, Chen, and in further view of Guo et al. (US 20200151250 A1), hereinafter referred to as Guo. Regarding claim 7, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 1. Guo, in the same field of neural network sequence modeling, teaches the following limitation which Saralajew, Jianmin, Hirate, Ming, and Chen fail to teach: wherein the one or more filters comprise multiple filters having different size lengths configured to selectively address different resolutions of the plurality of input segments corresponding to different levels of granularity (Guo, paragraph [0013], “Embodiments of the present invention provide sequence modeling for natural language processing applications using a convolution of kernels of multiple sizes to capture sentence structure at different levels of granularities. This generates a set of feature maps for each position in the sentence that are added together with multi-resolution attention weights to produce the input of a recurrent neural network that generates a context vector.”, in this case filters (kernels) of multiple sizes are used to capture different granularities of input segments (sentence structure). This produces weights that are specifically attuned to multiple resolutions depending on the input.).
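For illustration only, the multi-granularity convolution idea attributed to Guo — kernels of several sizes sliding over the same input sequence, with short kernels capturing fine-grained patterns and long kernels capturing coarser structure — can be sketched as follows. This is a minimal sketch; the kernel values and names are illustrative assumptions, not Guo's disclosed implementation.

```python
def conv1d(seq, kernel):
    # Valid 1-D convolution: dot product of the kernel with each window.
    k = len(kernel)
    return [sum(kernel[j] * seq[i + j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def multi_resolution_features(seq, kernels):
    # One feature map per kernel size, i.e., one per level of granularity.
    return {len(k): conv1d(seq, k) for k in kernels}

seq = [1.0, 2.0, 3.0, 4.0, 5.0]
# Averaging kernels of sizes 1, 2, and 3 (arbitrary illustrative values).
maps = multi_resolution_features(seq, [[1.0], [0.5, 0.5], [1/3, 1/3, 1/3]])
```

Each feature map has a different length because larger kernels admit fewer valid positions; a full implementation would pad or attend over the maps before combining them.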
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Saralajew, Jianmin, Hirate, Ming, and Chen with the teachings disclosed by Guo (i.e., having multi-resolution attention weights corresponding to levels of granularity). A motivation for the combination is to enable identifying features from input with varying levels of resolution. (Guo, paragraph [0047], “This improves natural language processing tasks such as, e.g., sentiment classification, machine translation, and language modeling. These represent substantive technical fields, and improvements to their ability to consider contextual information at multiple resolutions provide substantial benefits across a wide variety of disciplines.”) Claims 9 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin, Hirate, Ming, Chen, and in further view of Bocklet et al. (US 10650807 B2), hereinafter referred to as Bocklet. Regarding claim 9, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 1. Bocklet, in the same field of neural network sequence modeling, teaches the following limitation which Saralajew, Jianmin, Hirate, Ming, and Chen fail to teach: performing a max pooling operation on the similarity scores to obtain a similarity vector (Bocklet, paragraph [0031], “The propagation from state to state along the multiple element state score vector is accomplished by performing a maximum pooling or other down-sampling operation between adjacent scores along the vector of intermediate scores to establish a current multiple element state score vector.”).
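For illustration only, the max-pooling/down-sampling step Bocklet describes — reducing adjacent scores by taking the maximum within each window to produce a shorter score vector — can be sketched as follows. The window size of 2 is an illustrative assumption, not a value from the record.

```python
def max_pool(scores, window=2):
    # Down-sample by taking the maximum of each consecutive window of scores;
    # a trailing partial window is pooled as-is.
    return [max(scores[i:i + window])
            for i in range(0, len(scores), window)]

pooled = max_pool([0.1, 0.9, 0.4, 0.2, 0.7])
# -> [0.9, 0.4, 0.7]
```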
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Saralajew, Jianmin, Hirate, Ming, and Chen with the teachings disclosed by Bocklet (i.e., max pooling similarity scores to create a vector). A motivation for the combination is to create bias values for an input to create the backward operation of a recurrence. (Bocklet, paragraph [0031], “The scores forming the previous multiple element state score vector may be treated as bias values for input through a bias pathway of a neural network accelerator to create the backward operation of the recurrence.”) Regarding claim 10, Saralajew, Jianmin, Hirate, Ming, Chen, and Bocklet teach the limitations of claim 9. Bocklet further teaches the following limitation which Saralajew, Jianmin, Hirate, Ming, and Chen fail to teach: wherein the closeness loss evaluates a similarity vector to determine closeness of the given one of the output features to the respective resolution-controllable class prototypes (Bocklet, col. 5, lines 3-7, “Each time (or some interval) a current multiple element state score vector is generated, one or more scores on the resulting current multiple element state score vector may be used to determine if a keyphrase has been detected or not. By one form, such a determination is established by using the scores, such as a rejection score and a score for the keyphrase model, of the current multiple element state score vector as inputs to an affine layer with certain weights on the scores to output a final score”, Bocklet discloses generating a vector of scores (‘multiple element state score vector’) and then using/evaluating that vector’s scores to make the determination (i.e., whether the input corresponds to the modeled keyphrase), including using selected vector scores as inputs to a weighted affine layer to produce a final score.
Under broadest reasonable interpretation, this is evaluating a vector of similarity/closeness score values to determine whether the current features correspond (‘are sufficiently close’) to the modeled target). Saralajew, Jianmin, Hirate, Ming, and Chen are combined with Bocklet based on the same rationale as set forth previously above for claim 9. Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin, Hirate, Ming, Chen, and in further view of Zhang et al. (Zhang, B., Abbas, A., & Romagnoli, J. A. (2011). Multi-resolution fuzzy clustering approach for image-based particle characterization for particle systems. Chemometrics and Intelligent Laboratory Systems, 107(1), 155-164.), hereinafter referred to as Zhang. Regarding claim 12, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 1. Zhang, in the same field of neural network classification using clustering, teaches the following limitation which Saralajew, Jianmin, Hirate, Ming, and Chen fail to teach: pushing together the given one of the plurality of segments to corresponding ones of the respective resolution-controllable class prototypes under a constraint that pushing is only to occur toward the corresponding ones of the respective resolution-controllable class prototypes having a same class and further under at least one distance-based loss function (Zhang, page 158, col. 2, section 3, paragraph 1, “Clustering analysis is a statistical method of partitioning a set of observations into several subsets which can be called clusters [19]. The components in the same subset have a similar property to some extent [20]. This feature in one subset is different from those in the other clusters so that clusters can be distinguished. The way to determine the similarity among components is based on a distance measure. A commonly used distance function for clustering analysis is the Euclidean distance.
Clustering methods can be classified into three categories: hierarchical clustering, partitional clustering and spectral clustering. Fuzzy C-means Clustering belongs to the category of partitional clustering.”, the clustering method used in Zhang pushes together sets of observations to known clusters of data using a distance-based loss function. The examiner interprets this as being synonymous with “pushing” together the given one of the segments (observations) to a resolution-controllable class prototype under a constraint that pushing is to occur toward one of the class prototypes (one of the clusters) having a same class (a minimum threshold), and under a distance-based loss function as disclosed in the claim language. Furthermore, Zhang has support for resolution-controllable prototypes through multi-resolution analysis and adjustment of images, outlined in sections 2.2-2.4.). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Saralajew, Jianmin, Hirate, Ming, and Chen with the teachings disclosed by Zhang (i.e., having a distance-based clustering method with resolution-controllable prototypes). A motivation for the combination is to provide clustering and classification of observations over multiple resolutions. (Zhang, page 157, section 2.4, paragraph 2, “The Multi-resolution approach deals with denoising problems through decomposing images with wavelet transform and examining the detail coefficients at different decomposition levels. Upgraded images are achieved by subtracting the unwanted part of the noise from detail coefficients. As mentioned before, a given image can be modeled by Eq. (8).”) Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin, Hirate, Ming, Chen, and in further view of Ren et al. (US 20200364504 A1), hereinafter referred to as Ren.
Regarding claim 14, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 1. Ren, in the same field of neural network sequence modeling, teaches the following limitation which Saralajew, Jianmin, Hirate, Ming, and Chen fail to teach: wherein the diversity regularization term penalizes small distances between the respective resolution-controllable class prototypes below a diversity regularization threshold distance (Ren, paragraph [0046], “The diversity regularization term may be expressed as: [equation image] where d.sub.min is a threshold that classifies whether two prototypes are close or not. In some examples, the value of d.sub.min may be set to 1.0 or 2.0. R.sub.d is a soft regularization that exerts a larger penalty on smaller pairwise distances.”). It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Saralajew, Jianmin, Hirate, Ming, and Chen with the teachings disclosed by Ren (i.e., adding a diversity regularizer). A motivation for the combination is to prevent having multiple similar prototypes. (Ren, paragraph [0046], “Having multiple similar prototypes in the explanations can result in confusion and inefficiency in utilizing model parameters. To prevent this, a diversity regularization term may be incorporated that penalizes prototypes that are close to each other.”) Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin, Hirate, Ming, Chen, and in further view of Tan et al. (Tan, M., & Le, Q. V. (2019). MixConv: Mixed depthwise convolutional kernels. arXiv preprint arXiv:1907.09595.), hereinafter referred to as Tan. Regarding claim 21, Saralajew, Jianmin, Hirate, Ming, and Chen teach the limitations of claim 1. Tan teaches what the above prior art fails to teach: The computer-implemented method of claim 1, wherein the convolutional layer has a plurality of filters of different sizes.
(Tan, page 2, paragraph 2, "Based on this observation, we propose a mixed depthwise convolution (MixConv), which mixes up different kernel sizes in a single convolution op, such that it can easily capture different patterns with various resolutions. Figure 2 shows the structure of MixConv, which partitions channels into multiple groups and apply different kernel sizes to each group of channels.", MixConv enables a single convolutional layer to utilize filters (kernels) of different sizes at the same time.)

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Saralajew, Jianmin, Hirate, Ming, and Chen with the teachings disclosed by Tan (i.e., a mixed-filter-size convolutional layer). A motivation for the combination is to improve model accuracy and efficiency. (Tan, page 2, paragraph 1, "This study suggests the limitations of single kernel size: we need both large kernels to capture high-resolution patterns and small kernels to capture low-resolution patterns for better model accuracy and efficiency. Based on this observation, we propose a mixed depthwise convolution (MixConv), which mixes up different kernel sizes in a single convolution op")

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

Posada Aguilar, J. D. (2018). Semantics Enhanced Deep Learning Medical Text Classifier (Doctoral dissertation, University of Pittsburgh).
Garg, V. K., Xiao, L., & Dekel, O. (2018). Sparse multi-prototype classification. In UAI (pp. 704-714).
Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. Advances in Neural Information Processing Systems, 30.
Vinyals, O., Blundell, C., Lillicrap, T., & Wierstra, D. (2016). Matching networks for one shot learning. Advances in Neural Information Processing Systems, 29.
Aiolli, F., & Sperduti, A. (2005). Multiclass classification with multi-prototype support vector machines. Journal of Machine Learning Research, 6, 817-850.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HYUNGJUN B YI, whose telephone number is (703) 756-4799. The examiner can normally be reached M-F 9-5. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Usmaan Saeed, can be reached at (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/H.B.Y./ Examiner, Art Unit 2146
/USMAAN SAEED/ Supervisory Patent Examiner, Art Unit 2146
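The distance-based "pushing" that the rejection reads onto Zhang's clustering can be sketched in a few lines. This is a minimal illustration of a same-class prototype pull loss, not the applicant's or Zhang's actual formulation; every function and variable name here is hypothetical:

```python
import numpy as np

def prototype_push_loss(segment, prototypes, prototype_labels, segment_label):
    """Minimal sketch: push a segment embedding toward the nearest class
    prototype that shares its label, under a Euclidean (distance-based)
    loss. Purely illustrative of the claim reading, not any reference's
    actual implementation."""
    same_class = prototypes[prototype_labels == segment_label]  # same-class prototypes only
    dists = np.linalg.norm(same_class - segment, axis=1)        # Euclidean distances
    return float(dists.min() ** 2)                              # pull toward the closest one
```

Minimizing this loss moves the segment toward its nearest same-class prototype while ignoring prototypes of other classes, which is the "constraint that pushing is to occur toward one of the class prototypes having a same class" as the examiner reads it.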
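Ren's diversity regularizer, as quoted in paragraph [0046], can be sketched as follows. The quoted passage gives only the threshold behavior (pairwise prototype distances below d_min incur a penalty that grows as the prototypes get closer); the squared-hinge form used below is an assumption, and the function name is hypothetical:

```python
import numpy as np

def diversity_regularizer(prototypes, d_min=1.0):
    """Soft diversity penalty in the spirit of Ren [0046]: only prototype
    pairs closer than the threshold d_min are penalized, and smaller
    distances draw larger penalties. The squared-hinge form is one common
    choice; the quoted text does not give the exact formula."""
    n = len(prototypes)
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(prototypes[i] - prototypes[j])
            penalty += max(0.0, d_min - d) ** 2  # zero for well-separated pairs
    return penalty
```

Added to a training objective, this term discourages the "multiple similar prototypes" that Ren identifies as a source of confusion and wasted model capacity.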
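Tan's MixConv idea, partitioning channels into groups and convolving each group with a different kernel size within a single layer, can be sketched like this. Fixed averaging kernels stand in for learned depthwise filters, and the function name is an illustrative assumption:

```python
import numpy as np

def mixconv1d(x, kernel_sizes=(3, 5, 7)):
    """MixConv-style sketch (after Tan & Le 2019): split the channel axis
    of x (shape: channels x time) into groups, convolve each group
    depthwise with a different odd kernel size under 'same' zero padding,
    and restack. Averaging kernels are placeholders for learned weights."""
    groups = np.array_split(x, len(kernel_sizes), axis=0)
    outs = []
    for group, k in zip(groups, kernel_sizes):
        kernel = np.ones(k) / k          # placeholder depthwise filter
        pad = k // 2                     # 'same' padding for odd k
        for row in group:
            padded = np.pad(row, pad)
            outs.append(np.convolve(padded, kernel, mode="valid"))
    return np.stack(outs)                # same (channels x time) shape
```

Each channel group sees a different receptive field, which is how MixConv captures "different patterns with various resolutions" in one convolution op rather than stacking separate fixed-size layers.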

Prosecution Timeline

Jan 26, 2021
Application Filed
Apr 11, 2024
Non-Final Rejection — §101, §103
Jul 03, 2024
Interview Requested
Jul 11, 2024
Examiner Interview Summary
Jul 11, 2024
Response Filed
Jul 11, 2024
Applicant Interview (Telephonic)
Sep 23, 2024
Final Rejection — §101, §103
Oct 08, 2024
Applicant Interview (Telephonic)
Jan 27, 2025
Examiner Interview Summary
Jan 27, 2025
Request for Continued Examination
Jan 27, 2025
Applicant Interview (Telephonic)
Feb 03, 2025
Response after Non-Final Action
Feb 04, 2025
Examiner Interview Summary
Feb 21, 2025
Non-Final Rejection — §101, §103
Apr 18, 2025
Interview Requested
Apr 28, 2025
Applicant Interview (Telephonic)
Apr 29, 2025
Response Filed
Apr 29, 2025
Examiner Interview Summary
Jul 08, 2025
Final Rejection — §101, §103
Sep 22, 2025
Interview Requested
Sep 29, 2025
Applicant Interview (Telephonic)
Sep 30, 2025
Examiner Interview Summary
Oct 14, 2025
Request for Continued Examination
Oct 19, 2025
Response after Non-Final Action
Jan 05, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12536429
INTELLIGENTLY MODIFYING DIGITAL CALENDARS UTILIZING A GRAPH NEURAL NETWORK AND REINFORCEMENT LEARNING
2y 5m to grant Granted Jan 27, 2026
Study what changed to get past this examiner. Based on the 1 most recent grant.


Prosecution Projections

5-6
Expected OA Rounds
18%
Grant Probability
49%
With Interview (+31.7%)
4y 7m
Median Time to Grant
High
PTA Risk
Based on 17 resolved cases by this examiner. Grant probability derived from career allow rate.
