Prosecution Insights
Last updated: April 19, 2026
Application No. 18/479,372

INTERPRETING CONVOLUTIONAL SEQUENCE MODEL BY LEARNING LOCAL AND RESOLUTION-CONTROLLABLE PROTOTYPES

Status: Non-Final OA (§101, §103, §DP)
Filed: Oct 02, 2023
Examiner: YI, HYUNGJUN B
Art Unit: 2146
Tech Center: 2100 — Computer Architecture & Software
Assignee: NEC Laboratories America Inc.
OA Round: 5 (Non-Final)

Grant Probability: 18% (At Risk)
OA Rounds (projected): 5-6
To Grant (projected): 4y 7m
With Interview: 49%

Examiner Intelligence

Career Allow Rate: 18% (3 granted / 17 resolved; -37.4% vs TC avg)
Interview Lift: +31.7% across resolved cases with interview
Avg Prosecution: 4y 7m (typical timeline); 39 applications currently pending
Total Applications: 56 across all art units (career history)

Statute-Specific Performance

§101: 26.3% (-13.7% vs TC avg)
§103: 53.9% (+13.9% vs TC avg)
§102: 12.9% (-27.1% vs TC avg)
§112: 4.7% (-35.3% vs TC avg)
Tech Center averages are estimates; based on career data from 17 resolved cases.

Office Action

Rejections: §101, §103, §DP
DETAILED ACTION

This action is responsive to the claims filed on 12/11/2025. Claims 1-10, 12, and 14-16 are pending for examination.

Response to Arguments

Applicant argues: "Ming… fails to read on the recited claim language in that it fails to evaluate a similarity vector to determine closeness of an output feature to the class prototypes. The other cited art cannot cure this deficiency." (Remarks, page 9)

Applicant's arguments have been fully considered and the examiner respectfully disagrees. Ming expressly computes similarity scores from distances to prototypes and forms a similarity vector. Specifically, Ming states: "To improve interpretability, we compute the similarity using: a_i = exp(-d_i^2), which converts the distance to a score between 0 and 1," (Ming, page 905, col. 2, paragraph 4) and further explains that this produces the "computed similarity vector a." (Ming, page 905, col. 2, paragraph 6). Ming additionally uses that similarity vector directly in the model's predictive operation ("With the computed similarity vector a = p(e), the fully connected layer computes z = Wa…" (Ming, page 905, col. 2, paragraph 5)).

Further, Ming teaches prototype-closeness regularization terms within its training objective that explicitly determine and enforce closeness between encoded instances (i.e., output features/embeddings) and prototypes. Ming describes a clustering regularization R_c that "minimiz[es] the squared distance between an encoded instance and its closest prototype," and an evidence regularization R_e that "encourages each prototype vector to be as close to an encoded instance as possible." (Ming, page 906, col. 1, paragraph 1). Ming further confirms these regularization terms are jointly minimized as part of the overall optimized loss function.
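For context, the similarity computation the examiner quotes from Ming can be sketched in a few lines. This is a minimal NumPy illustration of the quoted relations a_i = exp(-d_i^2) and z = Wa; the variable names, shapes, and test values are assumptions, not Ming's code:

```python
import numpy as np

def similarity_vector(e, prototypes):
    """Ming-style similarity: a_i = exp(-d_i^2), where d_i is the L2
    distance between encoded instance e and prototype p_i. Scores fall
    in (0, 1]; 1 means e coincides with that prototype."""
    d = np.linalg.norm(prototypes - e, axis=1)   # d_i = ||e - p_i||_2
    return np.exp(-d ** 2)                       # a_i = exp(-d_i^2)

# Illustrative data: the first prototype is identical to e.
e = np.array([0.5, -0.2, 1.0])
P = np.array([[0.5, -0.2, 1.0],
              [2.0,  2.0, 2.0]])
a = similarity_vector(e, P)

# The fully connected layer then operates on the similarity vector,
# z = W a, as in the quoted "z = Wa" step.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
z = W @ a
```

The exponential mapping is what makes the score interpretable as a bounded similarity rather than an unbounded distance.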
Accordingly, Ming teaches that prototype-closeness is determined during training using regularization losses operating on the relationships between encoded instances and prototypes, and Ming also expressly represents those relationships as a "similarity vector a" computed from the corresponding prototype distances. Thus, Ming's optimized objective evaluates the similarity relationships (expressed as the similarity vector and/or its underlying distances) to determine and enforce closeness of the encoded output feature to the prototypes, as required by the amended limitation.

Applicant's remarks regarding the provisional double patenting rejections have been fully considered. The terminal disclaimers filed to obviate the double patenting with respect to applications 18/479,385, 17/158,466, and 18/479,326 have been entered. Accordingly, the double patenting rejections are withdrawn.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-10, 12, and 14-16 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Statutory Categories: Claims 1-10, 12, and 14-16 are directed to a method.

Independent Claims – Claim 1

Step 2A Prong 1: Does the claim recite an abstract idea, law of nature, or natural phenomenon? Yes.
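The two regularizers quoted above (R_c pulling encodings toward their closest prototype, R_e pulling each prototype toward a real encoded instance) admit a compact sketch. This is an illustrative NumPy rendering of the quoted definitions; the function names and array shapes are assumptions:

```python
import numpy as np

def clustering_reg(E, P):
    """R_c: mean squared distance from each encoded instance (rows of E)
    to its closest prototype (rows of P) -- pulls encodings toward
    the prototypes."""
    d2 = ((E[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean()

def evidence_reg(E, P):
    """R_e: mean squared distance from each prototype to its closest
    encoded instance -- keeps every prototype near real data, which is
    what makes the prototypes human-interpretable."""
    d2 = ((P[:, None, :] - E[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean()
```

Both terms vanish when every prototype coincides with an encoded instance, which matches the examiner's point that the optimized objective enforces closeness in both directions.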
Independent claim 1 recites limitations that are abstract ideas in the form of mental processes. Claim 1 recites:

clustering… the plurality of input segments into clusters using respective resolution-controllable class prototypes allocated to each of a plurality of classes, each of the respective resolution-controllable class prototypes including a respective subset of the output features that characterizes a respective associated one of the plurality of classes; (clustering the input segments into classes, stated at a high level of generality as to how the clustering is performed, is considered a mental process of evaluation that can reasonably be performed in the human mind)

pushing together the plurality of segments to corresponding ones of the respective resolution-controllable class prototypes under a constraint that pushing is only to occur toward the corresponding ones of the respective resolution-controllable class prototypes having a same class and further under at least one distance-based loss function (this limitation merely recites mathematical concepts in the form of mathematical formulas, algorithms, or calculations; page 15 of the specification provides support for the mathematical disclosure)
calculating, using the clusters, similarity scores that indicate a similarity of a given one of the output features to a given one of the respective resolution-controllable class prototypes responsive to distances, in a latent space, between the output feature and the respective resolution-controllable class prototypes; (calculating similarity scores using clusters based on numerical distances merely involves mathematical calculations, algorithms, or formulas; paragraphs 40-43 of the specification provide the related mathematical disclosure)

concatenating the similarity scores to obtain a similarity vector; and (concatenating values to form a vector is considered a mental process of evaluation that can reasonably be performed in the human mind or with the aid of pen and paper)

performing… a prediction and prediction support operation that provides a value of prediction and an interpretation for the value of prediction responsive to the input segments and the similarity score by solving an optimization problem having a cross-entropy loss (le), a gradient loss (la), a closeness loss (lc) and a diversity regularization term (ld), wherein the interpretation for the value of prediction is provided using only non-negative weights and lacking a weight bias in the fully connected layer, wherein the closeness loss evaluates a similarity vector to determine closeness of the given one of the output features to the respective resolution-controllable class prototypes (this limitation merely recites mathematical concepts in the form of mathematical formulas, algorithms, or calculations; pages 14-17 of the specification provide support for the mathematical disclosure)
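For orientation, the four-term objective recited in the performing step can be sketched schematically. This is a hedged illustration only: the closeness-loss form and the uniform weights below are assumptions for illustration, not the application's actual formulation (which is in the specification at pages 14-17):

```python
import numpy as np

def closeness_loss(a):
    """Illustrative closeness term: small when the best entry of the
    similarity vector a is near 1, i.e. when the output feature lies
    close to some class prototype. The 1 - max form is an assumption."""
    return 1.0 - a.max()

def total_loss(le, la, lc, ld, lambdas=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of cross-entropy (le), gradient (la), closeness (lc),
    and diversity (ld) terms; uniform weights are an assumption."""
    return sum(w * v for w, v in zip(lambdas, (le, la, lc, ld)))
```

The point of the sketch is only the structure the claim recites: an accuracy term plus interpretability-oriented regularizers combined into one optimization problem.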
This claim further recites the following additional elements for the purposes of Step 2A Prong Two analysis:

in multiple prototype storage elements, (this limitation invokes prototype storage elements merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f))

performing, by a fully connected layer (this limitation invokes fully connected layers merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f))

The additional limitations fail Step 2A Prong 2 of the 101 analysis because they do not transform the claim into a practical application. These limitations are too abstract or lack a technical improvement that would make the concept practically useful. Without clear utility or integration into a specific field, the claim does not relate to any particular application. It does not meet the requirements of Step 2A Prong 2, as it fails to make the concept meaningfully applicable in practice. Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.

This claim recites the following additional elements for the purposes of Step 2B analysis:

in multiple prototype storage elements, (this limitation invokes prototype storage elements merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f))

performing, by a fully connected layer (this limitation invokes fully connected layers merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f))

The claim also fails Step 2B of the analysis because the additional limitations do not amount to significantly more than the abstract idea itself.
The additional limitations do not enhance the claim in a way that would move it beyond its abstract ideas, as they minimally elaborate on the core concept without adding any inventive or technical substance. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

Dependents of Claim 1

The remaining dependent claims corresponding to independent claim 1 do not recite additional elements, whether considered individually or in combination, that are sufficient to integrate the judicial exception into a practical application or amount to significantly more than the judicial exception. The analysis is shown below.

The claims below recite additional limitations which fail Step 2A Prong 2 of the 101 analysis because they do not transform the claims into a practical application. These limitations are too abstract or lack a technical improvement that would make the concept practically useful. Without clear utility or integration into a specific field, the claims do not relate to any particular application and fail to make the concept meaningfully applicable in practice. The claims also fail Step 2B of the analysis because the additional limitations do not amount to significantly more than the abstract idea itself; they minimally elaborate on the core concept without adding any inventive or technical substance. The claims are unpatentable.

Claim 2 recites the further limitation of: The computer-implemented method of claim 1, wherein the set of output features is represented by a non-linear function plus a bias term (this limitation merely recites mathematical concepts in the form of mathematical formulas, algorithms, or calculations.
Pages 11-12 of the specification provide support for the mathematical disclosure). Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 3 recites the further limitation of: The computer-implemented method of claim 1, wherein each of the class prototypes collectively form a class prototype vector that is a latent representation of a prototypical segment learned through gradient descent (calculating similarity scores using clusters based on numerical distances merely involves mathematical calculations, algorithms, or formulas; paragraphs 53 and 59 of the specification provide the related mathematical disclosure). Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 4 recites the further limitation of: The computer-implemented method of claim 1, further comprising: converting, by a convolutional layer having one or more filters and a sliding window, the input data sequence into a set of the output features (this limitation invokes convolutional layers and sliding windows merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f)). Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 5 recites the further limitation of: The computer-implemented method of claim 1, wherein each of the respective resolution-controllable class prototypes has a selectable resolution corresponding to an associated one of the one or more filters (this limitation invokes filters merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f)). Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 6 recites the further limitation of: The computer-implemented method of claim 1, wherein a dimensionality of each of the respective resolution-controllable class prototypes is equal to a dimensionality of each of the output features allocated thereto (this limitation invokes class prototype dimensionality merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f)). Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 7 recites the further limitation of: The computer-implemented method of claim 1, wherein a number of the multiple prototype storage elements is equal to a number of the one or more filters in the convolutional layer (this limitation invokes prototype storage elements merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f)). Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
Claim 8 recites the further limitation of: The computer-implemented method of claim 1, wherein the one or more filters comprise multiple filters having different size lengths configured to selectively address different resolutions of the plurality of input segments corresponding to different levels of granularity (this limitation invokes filter lengths merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f)). Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 9 recites the further limitation of: The computer-implemented method of claim 1, wherein the similarity scores range from 0 to 1, wherein a 0 indicates that the given one of the output features is different from the given one of the respective resolution-controllable class prototypes, and a 1 indicates that the given one of the output features is identical to the given one of the respective resolution-controllable class prototypes (this limitation merely recites mathematical concepts in the form of mathematical formulas, algorithms, or calculations; page 13 of the specification provides support for the mathematical disclosure). Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 10 recites the further limitation of: The computer-implemented method of claim 2, further comprising performing a max pooling operation on the similarity scores to obtain the similarity vector (this limitation merely recites mathematical concepts in the form of mathematical formulas, algorithms, or calculations.
Paragraph 41 of the specification provides support for the mathematical disclosure). Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 12 recites the further limitation of: The computer-implemented method of claim 3, further applying a softmax operation to an output of the fully connected layer to obtain the value of prediction and the interpretation for the value of prediction (this limitation merely recites mathematical concepts in the form of mathematical formulas, algorithms, or calculations; paragraph 45 of the specification provides support for the mathematical disclosure). Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 14 recites the further limitation of: The computer-implemented method of claim 1, wherein said performing step is performed to solve an optimization problem having an accuracy component and an interpretability component (this limitation merely recites mathematical concepts in the form of mathematical formulas, algorithms, or calculations; paragraph 47 of the specification provides support for the mathematical disclosure). Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.
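Claim 10's max pooling over similarity scores and claim 12's softmax over the fully connected layer's output are both standard operations. A brief sketch, with assumed array shapes (this is an illustration, not the application's implementation):

```python
import numpy as np

def pool_similarity(scores):
    """Max-pool per-segment similarity scores, shape
    (n_segments, n_prototypes), into one similarity vector of length
    n_prototypes -- the pooling of claim 10."""
    return scores.max(axis=0)

def softmax(z):
    """Normalize the fully connected layer's output into class
    probabilities serving as the value of prediction (claim 12)."""
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```

Max pooling keeps, for each prototype, the best match over all segments of the input sequence, so the resulting similarity vector summarizes the whole sequence.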
Claim 15 recites the further limitation of: The computer-implemented method of claim 1, wherein the diversity regularization term penalizes small distances between the respective resolution-controllable class prototypes below a diversity regularization threshold distance (this limitation merely recites mathematical concepts in the form of mathematical formulas, algorithms, or calculations; paragraph 55 of the specification provides support for the mathematical disclosure). Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim 16 recites the further limitation of: The computer-implemented method of claim 1, wherein the convolutional layer has a plurality of filters of different sizes (this limitation invokes filters for a convolutional layer merely as a tool to perform an existing process and is considered as mere instructions to apply an exception, see MPEP 2106.05(f)). Since the claim does not recite additional elements that either integrate the judicial exception into a practical application or provide significantly more than the judicial exception, the claim is not patent eligible.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 3-5, 7, and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Saralajew et al. (Saralajew, S., Holdijk, L., Rees, M., & Villmann, T. (2018). Prototype-based Neural Network Layers: Incorporating Vector Quantization. arXiv, abs/1812.01214), hereafter referred to as Saralajew, in view of Jianmin et al. (CN 106355442 A), hereafter referred to as Jianmin, in further view of Liu et al. (Liu, Z., Li, X., Peng, H., He, L., & Yu, P. S. (2020, December). Heterogeneous similarity graph neural network on electronic health records. In 2020 IEEE International Conference on Big Data (Big Data) (pp. 1196-1205). IEEE), hereafter referred to as Liu, and Ming et al. (Ming, Y., Xu, P., Qu, H., & Ren, L. (2019, July). Interpretable and steerable sequence learning via prototypes. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp.
903-913)), hereafter referred to as Ming.

Regarding claim 1, Saralajew teaches the following limitations:

A computer-implemented method for interpreting a convolutional sequence model in machine learning, the method comprising: clustering, in multiple prototype storage elements, a plurality of input segments of an input data sequence (Saralajew, page 3, paragraph 2, "For unsupervised prototype-based vector quantization (VQ), each prototype is assumed to serve as a cluster center and hence, κ(x) of the WTA-rule (3) delivers the cluster index. A model is trained either by a heuristically motivated update of the prototypes or by the optimization of a cost function [17, 18, 19].", clusters of input segments are formed as prototypes; Saralajew, page 2, section 2, "Vector quantization is one of the most successful approaches for data clustering and representation as well as compression [12, 13]. One of the pioneers in vector quantization, R. M. GRAY stated [14]: 'A vector quantizer is a system for mapping a sequence of continuous or discrete vectors into a digital sequence suitable for communication over or storage in a digital channel. The goal of such a system is data compression: to reduce the bit rate so as to minimize communication channel capacity or digital storage memory requirements while maintaining the necessary fidelity of the data.'", vector quantization, a method of data clustering used in Saralajew's methods, teaches that vectors serve as prototype storage elements.)

pushing the plurality of segments to corresponding ones of the respective resolution-controllable class prototypes under a constraint that pushing is only to occur toward the corresponding ones of the respective resolution-controllable class prototypes having a same class and further under at least one distance-based loss function; (Saralajew, page 6, section 6.1, "At inference, the distance between the output of the feature extraction layer and all learned prototypes is used for classification.
This requires the selection of a differentiable loss function and distance measure."; Saralajew, page 3, paragraph 2, "In LVQ, each prototype w_k is additionally equipped with a class label c_k ∈ C = {1, 2, ..., N_C} and the prototypes are distributed regarding a training dataset X = {(x, c(x)) | x ∈ R^n, c(x) ∈ C}. The class of a given data point x is defined as [equation image: the winner-takes-all class assignment]." Saralajew's LVQ framework explicitly associates each prototype w_k with a class label c_k and defines classification of a data point x by the winner prototype κ(x), i.e., x receives the class of the nearest prototype, c(x) = c_{κ(x)} with κ(x) = argmin_k d(x, w_k), which is a same-class association between an input segment and the corresponding class-labeled prototype. Saralajew further states that classification uses distances between the feature-extraction outputs and prototypes and requires a differentiable loss function and distance measure. Thus, training/optimization based on the distance/loss framework necessarily enforces movement ("pushing" under BRI) of the encoded segments with respect to the class-labeled prototypes.)
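The winner-takes-all classification the examiner describes, in which a data point inherits the class label of its nearest prototype, reduces to a few lines. This is an illustration with assumed names, not Saralajew's code:

```python
import numpy as np

def lvq_classify(x, prototypes, labels):
    """Winner-takes-all rule: kappa(x) = argmin_k d(x, w_k); the class
    of x is the label c_k attached to the winning prototype w_k."""
    d = np.linalg.norm(prototypes - x, axis=1)   # d(x, w_k) for all k
    return labels[int(np.argmin(d))]
```

During training, the distance-based loss moves prototypes (and, with a learned feature extractor, the encoded points) so that same-class points end up nearest their own class prototypes, which is the "pushing" the examiner reads onto the claim.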
and performing, by a fully connected layer, a prediction and prediction support operation that provides a value of prediction… responsive to the input segments and the similarity vector (Saralajew, page 6, last paragraph, "During the training of the NN the prototypes are trained in parallel with the feature extraction layers, using the output of the feature extraction layer as input to the prototype-based model… In general, the output vector o(x) of (7) in the last FCL of a NN is element-wise normalized by the softmax activation… The training of the network is usually realized applying the cross entropy loss for the network class probability p̂(x) and the true class probability p(x)."; Saralajew, page 6, last paragraph, "In general, the output vector o(x) of (7) in the last FCL of a NN is element-wise normalized by the softmax activation (also denoted as Gibbs measure/distribution) given p̂(x) = softmax(o(x)) with p̂(x) ∈ [0,1]^{N_C} being a probability vector of the estimated class probabilities and the softmax function defined as [equation image: softmax_j(o(x)) = exp(o_j(x)) / Σ_i exp(o_i(x))]." Saralajew explicitly describes producing a softmax probability vector p̂(x) representing estimated class probabilities, which is the claimed "value of prediction" and "support operation" output by the network in response to the computed distances/similarity information of the input segments.)

a gradient loss (la), (Saralajew, page 12, paragraph 5, "For some of the regularization and loss terms described in this report it is needed to estimate the data distribution over the training dataset. Since the network is optimized by stochastic gradient descent learning, the statistics over the whole dataset cannot be estimated during run-time. This can be compensated via moving averages/moving variances.
Additionally, we frequently prefer to perform a zero de-biasing according to [94] to avoid biased gradients.")

Saralajew teaches a machine learning model that converts input sequences into a set of output features using a sliding window. Jianmin, in the same field of prototype clustering, teaches the following limitations which Saralajew fails to teach:

clustering… using respective resolution-controllable class prototypes allocated to each of a plurality of classes, each of the respective resolution-controllable class prototypes including a respective subset of output features into which the input data segment is converted, the respective subset of the output features characterizing a respective associated one of the plurality of classes (Jianmin, page 6, paragraph 5, lines 1-3, "using the k-means algorithm to cluster the user, adjusting the size and resolution of clustering according to application needs, for the K-means algorithm is a hard clustering algorithm and is based on the prototype of the target function clustering method, is the target function points to the prototype distance as optimization", in this case, the plurality of input segments, being "the user" (extracted user feature information, explained in Jianmin, page 6, paragraph 4), is clustered (k-means algorithm) using respective resolution-controllable (adjusting the size and resolution) prototypes. It is well known in the art that "prototypes", in image classification, refers to a representative example or sample image that is used to validate the desired output features of a machine learning model. Jianmin uses prototypes as a reference image for a target function, serving as a representation for a set of desired features. Thus, the prototypes used in Jianmin reflect the limitation that each of the resolution-controllable class prototypes includes a respective subset of output features that characterizes a respective associated class.
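The k-means prototype clustering attributed to Jianmin follows the textbook algorithm, with centroids playing the role of prototypes. A compact sketch (all names and the toy data are illustrative, not Jianmin's implementation):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: each centroid is a prototype (cluster center).
    Points are assigned to their nearest prototype, then each prototype
    is re-estimated as the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # squared distances, shape (n_points, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign
```

Adjusting k (and the feature resolution the points are computed at) is the knob Jianmin describes as "adjusting the size and resolution of clustering."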
It is interpreted by the examiner that "prototype storage elements" refer to a storage such as memory or RAM, to store the outputs of the machine learning clustering. Since Jianmin operates under the scope of online website processing, there is inherently a storage element of some form of memory to store the data collected from the sites and the machine learning outputs.);

It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings disclosed by Saralajew with the teachings disclosed by Jianmin (i.e., clustering using prototypes). A motivation for the combination is to provide better accuracy of estimated features. (Jianmin, abstract, "and the user behavior data is analyzed to better realize the accuracy of advertising")

Saralajew and Jianmin teach a machine learning model that converts input sequences into a set of output features using a sliding window. Liu, in the same field of clustering for machine learning classification, teaches the following limitations which Saralajew and Jianmin fail to teach:

[Figure 6 of Liu: HSGNN clusters data derived from Electronic Health Records (EHR) into clusters.] (Liu, page 1197, col. 2, paragraph 1, "We propose a novel framework HSGNN, which can learn informative representations for medical codes and make predictions for patients in EHR.", the model of Liu explicitly works with input comprising Electronic Health Records (EHR); Liu, page 1204, col. 2, paragraph 2, "HSGNN can learn representations for nodes. Since many models such as GRAM can learn high quality representations by integrating medical ontologies, we try to test the ability of HSGNN to learn informative representations on the same task… an ideal result is that all diagnosis nodes belong to the same category can form a cluster in visualization...
According to the visualization, we can prove that HSGNN can produce representations with high quality since it forms clear clusters for each category", clusters of EHR data are formed and depicted in Figure 6.)

concatenating the similarity scores to obtain a similarity vector (Liu, page 1199, col. 2, paragraph 1, "In the preprocessing step, we construct the heterogeneous EHR graph and calculate the similarities of all node pairs under a group of meta-paths P = {p1, p2, · · · pK} (the similarity of two nodes is set to 0 if their node types are not applicable to the meta-path). After this step, we can obtain a series of symmetric similarity matrices A = {A1, A2, · · · , AK} where K is both the number of meta-paths and the number of similarity matrices", A represents similarity matrices. Liu, page 1200, col. 2, paragraph 1, "Aggregated Attention Sum: After learning from A to obtain a more informative node feature matrix Fmeta, we use Fmeta to generate the attention weights of graph aggregation. Motivated from [18], in this step we apply GNN on each graph to obtain multiple features for each node. Formally, for k ∈ {1, 2, · · · , K} we have: [equation image] where meta GNN can be any kind of GNN layers. In the next step, to learn the node feature matrix Fmeta, we use [equation image] where AGGREGATOR_F is the aggregation function, which can be Graph Attention Network (GAT) [33]. Here we can also use some other operations such as concatenate or average F_1^(0), F_2^(0), · · ·, F_K^(0) together.", the model computes a final feature vector, Fmeta, which is a concatenation of feature vectors of each node.
Since each feature vector F_k incorporates similarity scores from A_k, it is interpreted by the examiner that similarity scores are concatenated to form the similarity vector (Fmeta) as described in the claim language.); It would have been obvious to a person of ordinary skill in the art before the effective filing date of the invention to have incorporated the teachings disclosed by Saralajew and Jianmin with the teachings disclosed by Liu (i.e., calculating similarity scores of electronic health records and concatenating the scores). A motivation for the combination is to identify more discriminative representations. (Liu, page 1197, col. 2, paragraph 3, “Experimental results show the superiority of our proposed model on the diagnosis prediction task. Experiments also prove the effectiveness of using similarity subgraphs and the quality of learned graph embeddings.”) Saralajew, Jianmin, and Liu teach a machine learning model that converts input sequences into a set of output features using a sliding window. Ming, in the same field of neural network classification, teaches the following limitations which Saralajew, Jianmin, and Liu fail to teach: calculating, using the clusters, similarity scores that indicate a similarity of a given one of the output features to a given one of the respective resolution-controllable class prototypes responsive to distances, in a latent space, between the output feature and the respective resolution-controllable class prototypes (Ming, page 902, col. 2, paragraph 3, “The prototype layer p contains k prototype vectors pi ∈ Rm, which have the same length as e. The layer scores the similarity between e and each prototype pi. In previous work [19], the squared L2 distance, d_i^2 = ∥e − p_i∥_2^2, is directly used as the output of the layer.
To improve interpretability, we compute the similarity using a_i = exp(−d_i^2)”; Ming explicitly computes latent-space distances between the encoded representation e and prototype vectors p_i, and converts those distances into similarity scores a_i and a similarity vector a. This directly reads on “calculating … similarity scores … responsive to distances, in a latent space, between the output feature and the … class prototypes.”); and an interpretation for the value of prediction (Ming, page 902, col. 1, last paragraph, “ProSeNet is readily explainable by consulting the most similar prototypes. When making predictions based on a new input sequence, the explanation can be generated along with the inference procedure. A prediction could be explained by a weighted addition of the contribution of the most similar prototypes. Input: ‘pizza is good but service is extremely slow’; Prediction: Negative; Explanation: 0.69 * ‘good food but worst service’ (Negative 2.1) + 0.30 * ‘service is really slow’ (Negative 1.1). The factors in front of the prototype sequences are the similarities between the input and the prototypes. At the end of each prototype shows its associated weights wi. The weights can be interpreted as the model’s confidence on the possible labels of the prototype.”; Ming expressly ties an interpretation/explanation of the prediction to the similarity values between the input and prototypes (i.e., the similarity vector), describing an explanation as a weighted combination of prototype contributions. This is the claimed “prediction support operation… [providing] an interpretation for the value of prediction” responsive to the input and similarity vector.) by solving an optimization problem having a cross-entropy loss (le) (Ming, page 905, section 3.2, “For accuracy, we minimize the cross-entropy loss on training set: [equation image: cross-entropy loss]” Page 906, col. 1, paragraph 3, “Full objective.
To summarize, the loss that we are minimizing is: [equation image: full objective]”; Ming includes a cross-entropy loss (CE) as the primary accuracy term in the overall loss function, forming part of a unified objective optimized during training. The loss function being minimized contains a cross-entropy term along with others.) a closeness loss (lc), (Ming, page 905, col. 2, paragraph 3, “To improve interpretability, we compute the similarity using: a_i = exp(−d_i^2), which converts the distance to a score between 0 and 1.”; this discloses how closeness is evaluated, by computing the distance between extracted features and learned prototypes.) and a diversity regularization term (ld). (Ming, page 905, section 3.2, “We prevent such phenomenon through a diversity regularization term that penalizes on prototypes that are close to each other: [equation image: R_d] … Rd is a soft regularization that exerts a larger penalty on smaller pairwise distances. By keeping prototypes distributed in the latent space, it also helps produce a sparser similarity vector a.”; a “diversity regularization” term (Rd) penalizes prototypes that are too close to each other in the latent space, thereby explicitly encouraging diversity among prototypes.) wherein the interpretation for the value of prediction is provided using only non-negative weights (Ming, page 905, col. 2, section 3.2, paragraph 3, “Sparsity and non-negativity. In addition, to further enhance interpretability, we add L1 penalty on the fully connected layer f, and constrain the weight matrix W to be non-negative. The L1 sparsity penalty and non-negative constraints on f help to learn sequence prototypes that have more unitary and additive semantics for classification.”; Ming requires that the fully connected layer’s weights be non-negative.) and lacking a weight bias in the fully connected layer (Ming, page 905, col.
2, paragraph 3, “With the computed similarity vector a = p(e), the fully connected layer computes z = Wa, where W is a C × k weight matrix and C is the output size (i.e., the number of classes in classification tasks). To enhance interpretability, we constrain W to be nonnegative. For multi-class classification tasks, a softmax layer is used to compute the predicted probability: ŷ_i = exp(z_i) / Σ_{j=1}^{C} exp(z_j).”; the fully connected layer computes z = Wa with no bias term, i.e., it explicitly lacks a weight bias.) wherein the closeness loss evaluates the similarity vector to determine closeness of the given one of the output features to the respective resolution-controllable class prototypes. (Ming, page 905, col. 2, paragraph 3, “To improve interpretability, we compute the similarity using: a_i = exp(−d_i^2), which converts the distance to a score between 0 and 1. Zero can be interpreted as the sequence embedding e being completely different from the prototype vector pi, and one means they are identical. With the computed similarity vector a = p(e), the fully connected layer computes z = Wa, where W is a C × k weight matrix and C is the output size (i.e., the number of classes in classification tasks).”; Ming discloses that the similarity between an encoded instance and each prototype is computed from the distance as a_i = exp(−d_i^2), and explicitly defines and utilizes a similarity vector representing similarity between encoded output features and respective prototypes. Ming, page 906, col. 1, paragraph 3, “Full objective.
To summarize, the loss that we are minimizing is: [equation image: full objective]”; because Ming’s similarity vector is computed directly from the prototype distances (a_i = exp(−d_i^2)), and because the clustering and evidence regularization losses minimize those same distances within the unified optimization objective, Ming’s closeness regularization necessarily evaluates the similarity relationships embodied in the similarity vector to determine closeness between encoded features and prototypes. It is interpreted by the Examiner that Ming’s prototype-closeness regularization evaluates the similarity relationships represented in the similarity vector to determine and enforce closeness between encoded output features and respective prototypes.) Saralajew discusses using additional loss and regularization terms that interact with gradients during training. Since Ming et al. design their loss function to be modular and open to additional regularization terms, it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to have incorporated the teachings disclosed by Saralajew, Jianmin, and Liu with the teachings disclosed by Ming (i.e., prediction interpretation with non-negative weights and no bias). It should be noted that a proper obviousness rejection under 35 U.S.C. 103 does not require that all claimed features be expressly or bodily incorporated in a single reference or disclosed in precisely the same way as claimed.
See MPEP § 2145(III): “The test for obviousness is not whether the features of a secondary reference may be bodily incorporated into the structure of the primary reference.... Rather, the test is what the combined teachings of those references would have suggested to those of ordinary skill in the art.” Here, the combination of Saralajew, Jianmin, and Liu with Ming demonstrates the routine and well-known practice of unifying multiple loss terms in a single optimization objective for neural network training, and the addition of further regularization or gradient-based loss terms would have been well within the ordinary skill in the art. A motivation for the combination is to provide increased interpretability of machine learning parameters. (Ming, page 906, col. 1, paragraph 1, “To improve interpretability, Li et al. [19] also proposed two regularization terms to be jointly minimized, the clustering regularization Rc and the evidence regularization Re. Rc encourages a clustering structure in the latent space by minimizing the squared distance between an encoded instance and its closest prototype”) Regarding claim 3, Saralajew, Jianmin, Liu, and Ming teach the limitations of claim 1. Ming further teaches: wherein each of the class prototypes collectively form a class prototype vector that is a latent representation of a prototypical segment learned through gradient descent (Ming, page 906, section 3.3, paragraph 1, “We use stochastic gradient descent (SGD) with mini-batch to minimize the loss function on training data… we design a projection step during training that assigns pi with their closest sequence embedding in the training set: [equation image: projection step].
Each prototype vector pi is then associated with a prototype sequence in the input space.”; Ming teaches prototype vectors p_i learned during training using SGD and then explicitly ties each learned prototype vector to an actual sequence from the training set via a projection step that assigns p_i to the closest training sequence embedding. Under BRI, that associated training sequence is a “prototypical segment,” and the corresponding p_i is its latent-space representation; the set of prototype vectors in the prototype layer therefore collectively forms the claimed class prototype vector structure learned via gradient descent.) Regarding claim 4, Saralajew, Jianmin, Liu, and Ming teach the limitations of claim 1. Saralajew further teaches: The computer-implemented method of claim 1, further comprising: converting, by a convolutional layer having one or more filters and a sliding window, the input data sequence into a set of the output features. (Saralajew, page 6, paragraph 3, “Like usual in CNNs, we filter the images with a stack of filters. Assume there are Nf filters of shape k collected in a tensor of the shape wf × hf × c × Nf. Then, the convolution with all these filters can be seen as a matrix multiplication. More precisely, it is a linear transformation of the vector collected over a sliding window”; this supports that the model is a convolutional sequence model, in which convolutional layers process training data samples and create a feature map using a sliding window.) Regarding claim 5, Saralajew, Jianmin, Liu, and Ming teach the limitations of claim 1. Saralajew further teaches: wherein each of the respective resolution-controllable class prototypes has a selectable resolution corresponding to an associated one of the one or more filters (Saralajew, page 7, section 6.2, paragraph 1, “we now suppose a convolutional kernel k ∈ R^{wf×hf×c}.
However, rather than considering the kernel k as a learnable filter, we consider the kernel as a learnable (kernel-)prototype. Then, the filter response of the convolution operation is the distance between the sliding window and the kernel-prototype. We denote the convolution operation in combination with a kernel-prototype using the ⊛ sign and refer to it as kernel-prototype convolution.”; in this case a reference image (prototype) has a selectable resolution corresponding to a filter (parameter)… a kernel-prototype of arbitrary kernel-size can be convoluted over an input… “For a kernel-prototype convolution the number of filters Nf equals the number of kernel-prototypes NW.”; Saralajew explicitly equates the convolution “kernel” (i.e., the filter having dimensions R^{wf×hf×c}) with a learnable prototype, and further states that the kernel-prototype can be of arbitrary kernel-size, which is interpreted to mean the kernel size is selectable. Saralajew also states that in this kernel-prototype convolution the number of filters equals the number of kernel-prototypes, meaning each prototype corresponds to a respective filter. Thus, Saralajew teaches that each prototype (being a kernel/filter) has a selectable “resolution” (kernel/window size) that corresponds to its associated filter.) Regarding claim 7, Saralajew, Jianmin, Liu, and Ming teach the limitations of claim 1. Saralajew further teaches: wherein a number of the multiple prototype storage elements is equal to a number of the one or more filters in the convolutional layer (Saralajew, page 7, section 6.2, paragraph 3, “For a kernel-prototype convolution the number of filters Nf equals the number of kernel-prototypes NW.”). Regarding claim 9, Saralajew, Jianmin, Liu, and Ming teach the limitations of claim 1.
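As a rough, non-authoritative illustration of the kernel-prototype convolution discussed above for claims 5 and 7 — the filter response is the distance between each sliding window and a learnable prototype — the following sketch assumes one-dimensional sequences and hypothetical function names; it also applies Ming-style scoring a = exp(−d²) to map each distance into the (0, 1] similarity range recited for claim 9:

```python
import math

def kernel_prototype_convolution(sequence, prototype):
    """Slide a window the length of the prototype over the sequence and,
    instead of a dot-product filter response, emit the squared L2 distance
    between each window and the kernel-prototype; then convert each
    distance to a similarity score a = exp(-d^2) in (0, 1]."""
    w = len(prototype)  # the prototype's length acts as the selectable window size
    distances = []
    for start in range(len(sequence) - w + 1):
        window = sequence[start:start + w]
        d2 = sum((x - p) ** 2 for x, p in zip(window, prototype))
        distances.append(d2)
    similarities = [math.exp(-d2) for d2 in distances]
    return distances, similarities
```

A window identical to the prototype yields distance 0 and similarity exactly 1; increasingly dissimilar windows decay toward 0, consistent with the quoted passages.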
Ming further teaches: wherein the similarity scores range from 0 to 1, wherein a 0 indicates that the given one of the output features is different from the given one of the respective resolution-controllable class prototypes, and a 1 indicates that the given one of the output features is identical to the given one of the respective resolution-controllable class prototypes (Ming, page 905, col. 2, paragraph 3, “To improve interpretability, we compute the similarity using: a_i = exp(−d_i^2), which converts the distance to a score between 0 and 1. Zero can be interpreted as the sequence embedding e being completely different from the prototype vector pi, and one means they are identical.”; Ming states that the similarity scores range between 0 and 1, where 0 means the embedding is different from the prototype it is compared to and 1 means it is identical.) Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin, and in further view of Liu, Ming, and Song et al. (US 11927609 B2), hereinafter referred to as Song. Regarding claim 2, Saralajew, Jianmin, Liu, and Ming teach the limitations of claim 1. Song, in the same field of classification using neural networks, teaches the following limitation which Saralajew, Jianmin, Liu, and Ming fail to teach: wherein the set of output features is represented by a non-linear function plus a bias term (Song, col. 19, lines 45-51, “an RNN…outputs a sequence of activations a^{l+1} = (a_1^{l+1}, . . . , a_T^{l+1}) by iterating the following recursive equation… [equation image] where σ is the non-linear activation function, b_T^l is the hidden bias vector”; as disclosed in col. 19 of Song, an RNN takes a sequence of input values, maps it to hidden values, then outputs the activations with a non-linear function and a bias term.)
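The Song formulation quoted above — output features produced by a non-linear function of a weighted input plus a bias term — can be sketched as a single recurrence step. This is a minimal illustration only: the function name and shapes are the editor's assumptions, and tanh stands in for Song's unspecified σ:

```python
import math

def rnn_step(x, h_prev, W, U, b):
    """One recurrence step: a = sigma(W x + U h_prev + b).
    The output features are a non-linear function (here tanh)
    applied to a weighted combination plus the bias term b."""
    def matvec(M, v):
        return [sum(m * vv for m, vv in zip(row, v)) for row in M]
    pre = [wx + uh + bb
           for wx, uh, bb in zip(matvec(W, x), matvec(U, h_prev), b)]
    return [math.tanh(p) for p in pre]
```

With zero weights, only the bias passes through the non-linearity, which is precisely the "non-linear function plus a bias term" structure recited in claim 2.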
It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Saralajew, Jianmin, Liu, and Ming with the teachings disclosed by Song (i.e., output features represented by a non-linear function and bias term). A motivation for the combination is to define an LSTM (Long Short-Term Memory) to ease the learning of temporal relationships on long time scales. (Song, col. 19, lines 60-65, “LSTMs extend RNN with memory cells, instead of recurrent units, to store and output information, easing the learning of temporal relationships on long time scales. LSTMs make use of the concept of gating: a mechanism based on component-wise multiplication of the input, which defines the behavior of each individual memory cell.”) Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin, and in further view of Liu, Ming, and Xu et al. (US 20210181931 A1), hereinafter referred to as Xu. Regarding claim 12, Saralajew, Jianmin, Liu, and Ming teach the limitations of claim 1. Xu teaches the following limitation which Saralajew, Jianmin, Liu, and Ming fail to teach: applying a softmax operation to an output of the fully connected layer to obtain the value of prediction and the interpretation for the value of prediction (Xu, paragraph [0053], “The fully connected layer f with softmax output computes the eventual classification results using the similarity score vector a. The entries in the weight matrix in f may be constrained to be non-negative for better interpretability.”). It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Saralajew, Jianmin, Liu, and Ming with the teachings disclosed by Xu (i.e., applying a softmax operation to the fully connected layer output). A motivation for the combination is to provide a way to obtain similarity scores that indicate input sequences with similar prototype embeddings (identifying a classification).
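The prediction head recited in claim 12 and quoted from Ming and Xu — z = Wa over the similarity vector, with a non-negative weight matrix, no bias term, and a softmax over z — can be sketched as follows (a sketch under assumed names and shapes, not the applicant's or the references' actual implementation):

```python
import math

def predict(a, W):
    """Fully connected layer without a bias term: z = W a, with W
    constrained non-negative, followed by softmax over z. Each z_c
    decomposes as a sum of non-negative prototype contributions
    W[c][i] * a[i], which supports the prototype-based interpretation."""
    assert all(w >= 0 for row in W for w in row)  # non-negativity constraint
    z = [sum(w * ai for w, ai in zip(row, a)) for row in W]
    m = max(z)
    exp_z = [math.exp(v - m) for v in z]  # numerically stable softmax
    total = sum(exp_z)
    return [v / total for v in exp_z]
```

Because W has no bias and only non-negative entries, a class score can only grow when the input is similar to that class's prototypes, matching the additive explanation style quoted from Ming.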
(Xu, paragraph [0053], “Through appropriate transformations, a vector of similarity scores may be obtained as a = p(e), a_i ∈ [0,1], where a_i is the similarity score between the input sequence and the prototype p_i and a_i = 1 indicates that the input sequence has identical embedding with prototype p_i.”) Claims 6 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin, and in further view of Liu, Ming, and Pai et al. (US 11580420), hereinafter referred to as Pai. Regarding claim 6, Saralajew, Jianmin, Liu, and Ming teach the limitations of claim 1. Pai, in the same field of classification using neural networks, teaches the following limitation which Saralajew, Jianmin, Liu, and Ming fail to teach: wherein a dimensionality of each of the respective resolution-controllable class prototypes is equal to a dimensionality of each of the output features allocated thereto (Pai, col. 6, lines 4-13, “As used herein, the term “feature space” refers to a space reflecting features of data points. Specifically, a feature space can reflect one or more dimensions corresponding to one or more features. For instance, the model analysis system can map features of data points to locations within different dimensions of a feature space. For instance, for a dataset include a set of ten features for each of the data points, a feature space can include ten dimensions to which each of the data points is mapped based on the values of the corresponding features.”; output features have dimensions based on their corresponding features (prototypes).) It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Saralajew, Jianmin, Liu, and Ming with the teachings disclosed by Pai (i.e., dimensionality of prototypes is equal to that of the output features).
A motivation for the combination is to map similar features closer together within a feature space than features that are not similar. (Pai, col. 6, lines 13-15, “Accordingly, similar features can be mapped closer together within the feature space than features that are not similar.”) Regarding claim 14, Saralajew, Jianmin, Liu, and Ming teach the limitations of claim 1. Pai teaches the following limitation which Saralajew, Jianmin, Liu, and Ming fail to teach: wherein said performing step is performed to solve an optimization problem having an accuracy component and an interpretability component (Pai, col. 4, lines 58-64, “Specifically, the model analysis system improves accuracy by using an iterative process to select, from a plurality of data points in a dataset, a plurality of representative prototypes based on distances between data points, allowing the model analysis system to accurately determine the impact of the features of the data points while providing interpretable results.”). The motivation to combine Saralajew, Jianmin, Liu, and Ming with Pai is the same as discussed above in relation to claim 6. Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin and in further view of Liu, Ming, and Guo et al. (US 20200151250 A1), hereinafter referred to as Guo. Regarding claim 8, Saralajew, Jianmin, Liu, and Ming teach the limitations of claim 1.
Guo, in the same field of neural network sequence modeling, teaches the following limitation which Saralajew, Jianmin, Liu, and Ming fail to teach: wherein the one or more filters comprise multiple filters having different size lengths configured to selectively address different resolutions of the plurality of input segments corresponding to different levels of granularity (Guo, paragraph [0013], “Embodiments of the present invention provide sequence modeling for natural language processing applications using a convolution of kernels of multiple sizes to capture sentence structure at different levels of granularities. This generates a set of feature maps for each position in the sentence that are added together with multi-resolution attention weights to produce the input of a recurrent neural network that generates a context vector.”; in this case filters (kernels) of multiple sizes are used to capture different granularities of input segments (sentence structure), producing weights that are specifically attuned to multiple resolutions depending on the input.). It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Saralajew, Jianmin, Liu, and Ming with the teachings disclosed by Guo (i.e., having multi-resolution attention weights corresponding to levels of granularity). A motivation for the combination is to enable identifying features from input with varying levels of resolution. (Guo, paragraph [0047], “This improves natural language processing tasks such as, e.g., sentiment classification, machine translation, and language modeling. These represent substantive technical fields, and improvements to their ability to consider contextual information at multiple resolutions provide substantial benefits across a wide variety of disciplines.”) Regarding claim 16, Saralajew, Jianmin, Liu, and Ming teach the limitations of claim 1.
Guo, in the same field of neural network sequence modeling, teaches the following limitation which Saralajew, Jianmin, Liu, and Ming fail to teach: The computer-implemented method of claim 1, wherein the convolutional layer has a plurality of filters of different sizes. (Guo, paragraph [0013], “Embodiments of the present invention provide sequence modeling for natural language processing applications using a convolution of kernels of multiple sizes to capture sentence structure at different levels of granularities. This generates a set of feature maps for each position in the sentence that are added together with multi-resolution attention weights to produce the input of a recurrent neural network that generates a context vector.”; in this case filters (kernels) of multiple sizes are used to capture different granularities of input segments (sentence structure), producing weights that are specifically attuned to multiple resolutions depending on the input.). A rationale for this combination is the same as previously set forth above for claim 8. Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin and in further view of Liu, Ming, and Bocklet et al. (US 10650807 B2), hereinafter referred to as Bocklet. Regarding claim 10, Saralajew, Jianmin, Liu, and Ming teach the limitations of claim 1. Bocklet, in the same field of neural network sequence modeling, teaches the following limitation which Saralajew, Jianmin, Liu, and Ming fail to teach: performing a max pooling operation on the similarity scores to obtain a similarity vector (Bocklet, col. 4, line 53, “The propagation from state to state along the multiple element state score vector is accomplished by performing a maximum pooling or other down-sampling operation between adjacent scores along the vector of intermediate scores to establish a current multiple element state score vector.”).
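The max-pooling of similarity scores recited in claim 10 and quoted from Bocklet can be sketched as a simple down-sampling pass (window and stride are hypothetical parameters introduced for illustration; a sketch, not Bocklet's implementation):

```python
def max_pool(scores, window, stride):
    """Down-sample a sequence of similarity scores by keeping the
    maximum over each window, producing a shorter similarity vector."""
    return [max(scores[i:i + window])
            for i in range(0, len(scores) - window + 1, stride)]
```

Pooling keeps the strongest prototype match within each window, so the resulting similarity vector preserves the best local evidence while reducing length.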
It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Saralajew, Jianmin, Liu, and Ming with the teachings disclosed by Bocklet (i.e., max pooling similarity scores to create a vector). A motivation for the combination is to create bias values for an input to create the backward operation of a recurrence. (Bocklet, paragraph [0031], “The scores forming the previous multiple element state score vector may be treated as bias values for input through a bias pathway of a neural network accelerator to create the backward operation of the recurrence.”) Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Saralajew in view of Jianmin and in further view of Liu, Ming, and Ren et al. (US 20200364504 A1), hereinafter referred to as Ren. Regarding claim 15, Saralajew, Jianmin, Liu, and Ming teach the limitations of claim 1. Ren, in the same field of neural network sequence modeling, teaches the following limitation which Saralajew, Jianmin, Liu, and Ming fail to teach: wherein the diversity regularization term penalizes small distances between the respective resolution-controllable class prototypes below a diversity regularization threshold distance (Ren, paragraph [0046], “The diversity regularization term may be expressed as: [equation image] where d_min is a threshold that classifies whether two prototypes are close or not. In some examples, the value of d_min may be set to 1.0 or 2.0. R_d is a soft regularization that exerts a larger penalty on smaller pairwise distances.”). It would have been obvious to a person of ordinary skill in the art to have incorporated the teachings disclosed by Saralajew, Jianmin, Liu, and Ming with the teachings disclosed by Ren (i.e., adding a diversity regularizer). A motivation for the combination is to prevent having multiple similar prototypes.
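The thresholded diversity penalty described by Ren can be sketched as follows. The hinge-squared form and function name are the editor's assumptions, chosen to match the quoted behavior: pairs farther apart than d_min contribute nothing, and smaller pairwise distances incur larger penalties:

```python
import math

def diversity_regularization(prototypes, d_min=1.0):
    """Assumed form R_d = sum_{i<j} max(0, d_min - ||p_i - p_j||)^2:
    prototype pairs closer than the threshold d_min are penalized,
    with smaller distances penalized more heavily."""
    r = 0.0
    for i in range(len(prototypes)):
        for j in range(i + 1, len(prototypes)):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(prototypes[i], prototypes[j])))
            r += max(0.0, d_min - d) ** 2
    return r
```

Minimizing this term during training pushes near-duplicate prototypes apart, which is the stated motivation of avoiding multiple similar prototypes in the explanations.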
(Ren, paragraph [0046], “Having multiple similar prototypes in the explanations can result in confusion and inefficiency in utilizing model parameters. To prevent this, a diversity regularization term may be incorporated that penalizes prototypes that are close to each other.”) Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Posada Aguilar, J. D. (2018). Semantics Enhanced Deep Learning Medical Text Classifier (Doctoral dissertation, University of Pittsburgh). Bai, T., Zhang, S., Egleston, B. L., & Vucetic, S. (2018, July). Interpretable representation learning for healthcare via capturing disease progression through time. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 43-51). Zhu, W., & Razavian, N. (2019). Variationally Regularized Graph-based Representation Learning for Electronic Health Records. arXiv preprint arXiv:1912.03761. Any inquiry concerning this communication or earlier communications from the examiner should be directed to HYUNGJUN B YI whose telephone number is (703)756-4799. The examiner can normally be reached M-F 9-5. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed can be reached on (571) 272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. 
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /H.B.Y./Examiner, Art Unit 2146 /USMAAN SAEED/Supervisory Patent Examiner, Art Unit 2146

Prosecution Timeline

Oct 02, 2023
Application Filed
Jul 01, 2024
Non-Final Rejection — §101, §103, §DP
Sep 25, 2024
Response Filed
Dec 18, 2024
Final Rejection — §101, §103, §DP
Feb 26, 2025
Request for Continued Examination
Mar 05, 2025
Response after Non-Final Action
Apr 14, 2025
Non-Final Rejection — §101, §103, §DP
Jul 02, 2025
Response Filed
Sep 08, 2025
Non-Final Rejection — §101, §103, §DP
Dec 11, 2025
Response Filed
Mar 30, 2026
Non-Final Rejection — §101, §103, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12536429
INTELLIGENTLY MODIFYING DIGITAL CALENDARS UTILIZING A GRAPH NEURAL NETWORK AND REINFORCEMENT LEARNING
2y 5m to grant Granted Jan 27, 2026
Study what changed to get past this examiner. Based on the 1 most recent grant.


Prosecution Projections

5-6
Expected OA Rounds
18%
Grant Probability
49%
With Interview (+31.7%)
4y 7m
Median Time to Grant
High
PTA Risk
Based on 17 resolved cases by this examiner. Grant probability derived from career allow rate.

Free tier: 3 strategy analyses per month