Last updated: May 29, 2026
Application No. 17/956,857
EXPLAINABLE PREDICTION MODELS BASED ON CONCEPTS

Non-Final OA: §101, §103, §112
Filed: Sep 30, 2022
Examiner: WONG, WILLIAM
Art Unit: 2144
Tech Center: 2100 (Computer Architecture & Software)
Assignee: International Business Machines Corporation
OA Round: 2 (Non-Final)
Interview: Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 30% grant rate with a +26.9% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 397 resolved cases, 2023–2026
Examiner Intelligence

WONG, WILLIAM
Career Allowance Rate: 30% (120 granted / 397 resolved), -24.8% vs TC avg
Interview Lift: +26.9% (resolved cases with interview)
Avg Prosecution: 4y 5m (typical timeline), 19 currently pending
Total Applications: 432 (career history, across all art units)
Statute-Specific Performance

§101: 1.1% (-38.9% vs TC avg)
§103: 85.5% (+45.5% vs TC avg)
§102: 3.9% (-36.1% vs TC avg)
§112: 1.0% (-39.0% vs TC avg)
Based on career data from 397 resolved cases
Office Action

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to communications filed on 12/16/2025.  Claims 1-20 are pending and have been examined.  

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The term “desired” in claim 2 is a relative term which renders the claim indefinite. The term “desired” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.  What is “desired” varies depending on context, person, etc.  As such, the claim is indefinite.  

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. The claims recite a method and product comprising providing, defining, producing, and training. 
The limitations “providing… defining… producing…” as recited in claim 1 are each a process that, under the broadest reasonable interpretation, covers performance of the limitations in the mind or by pen and paper (see Berkheimer v. HP, Inc., 881 F.3d 1360, 1366, 125 USPQ2d 1649 (Fed. Cir. 2018)) but for the recitation of generic computer components.  That is, the limitation “providing a training dataset of data samples each having a prediction label indicating a desired prediction output from the model” in the context of the claim encompasses the user making observations or determinations.  The limitation “defining a set of concept vectors comprising a plurality of concept vectors which are associated with respective predefined concepts characterizing information content of the data samples” in the context of the claim encompasses the user making determinations.  The limitation “producing a set of input vectors from each data sample” in the context of the claim encompasses the user making evaluations.  
Other than reciting “training a neural network model comprising a cross-attention module… and a prediction module”, “…producing a sample embedding for a data sample and …producing a prediction output from the sample embedding, by supplying the set of input vectors for each data sample to the cross-attention module and training a set of weights of a cross-attention mechanism between the set of input vectors and the set of concept vectors in the cross-attention module, to optimize a loss function dependent on difference between the prediction output and the prediction label for each data sample; wherein the sample embedding comprises a matrix of attention weights and a matrix of value vectors, produced from respective concept vectors, via the cross-attention mechanism, and the prediction output comprises a linear transformation of a product of the matrix of attention weights and the matrix of value vectors” in the context of the claim encompasses the user making calculations.  If a claimed limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “mental processes” grouping of abstract ideas.  Accordingly, the claim recites an abstract idea.  
This judicial exception is not integrated into a practical application.  In particular, the claim recites additional elements. The claim recites “training a neural network model comprising a cross-attention module… and a prediction module”.  The elements are recited at a high-level of generality, such that it amounts to no more than mere instructions to apply the exception using a generic computer component (e.g. See MPEP 2106.05(f)) and/or amounts to generally linking the use of the judicial exception to a particular technological environment or field of use (e.g. see MPEP 2106.05(h)).  Accordingly, the additional elements do not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.  
The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional elements are no more than a generic computer component and/or field of use.  Therefore, the claims are not patent eligible.
Claim 19 also recites similar claim language as claim 1, and thus has the same issues.  It is noted, with respect to claim 19, that the claim recites “a computer readable storage medium having program instructions embodied therein” to perform the method.  The elements are recited at a high-level of generality, such that it amounts to no more than mere instructions to apply the exception using a generic computer component (e.g. See MPEP 2106.05(f)).  Accordingly, the additional elements do not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea, and are not sufficient to amount to significantly more than the judicial exception.  
Regarding claim 2, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes the concept labels and loss function, which is part of the mental steps (encompassing a user performing calculations) and does not include any additional elements.  
Regarding claim 3, the claim does not include any additional elements that are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes generating/applying the model, which amounts to no more than mere instructions to apply the exception using a generic computer component (e.g. See MPEP 2106.05(f)) and/or amounts to generally linking the use of the judicial exception to a particular technological environment or field of use (e.g. see MPEP 2106.05(h)).  
Regarding claim 4, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes providing a matrix, which is a mental step (encompassing a user making determinations) and does not include any additional elements.  
Regarding claim 5, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes the set of input vectors, which is part of the mental steps (encompassing a user making determinations) and does not include any additional elements.  
Regarding claim 6, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes the set of input vectors and the set of concept labels, which is part of the mental steps (encompassing a user making determinations) and does not include any additional elements.  
Regarding claim 7, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes the set of input vectors, which is part of the mental steps (encompassing a user making determinations) and does not include any additional elements.  
Regarding claim 8, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes the set of input vectors and the set of concept labels, which is part of the mental steps (encompassing a user making determinations) and does not include any additional elements.  
Regarding claim 9, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes the set of input vectors and the set of concept labels, which is part of the mental steps (encompassing a user making determinations) and does not include any additional elements.  
Regarding claim 10, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes the predefined concepts and cross-attention, which is part of the mental steps (encompassing a user making determinations) and does not include any additional elements.  
Regarding claim 11, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes the set of concept vectors, which is part of the mental steps (encompassing a user making determinations) and does not include any additional elements.  
Regarding claim 12, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes producing the set of input vectors, which is part of the mental steps (encompassing a user making evaluations) and does not include any additional elements.  
Regarding claim 13, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes producing the set of input vectors, which is part of the mental steps (encompassing a user making evaluations) and does not include any additional elements.  
Regarding claim 14, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes the set of input vectors, producing a prediction result, and the prediction output, which is part of the mental steps (encompassing a user making evaluations) and “the prediction module” amounts to no more than mere instructions to apply the exception using a generic computer component (e.g. see MPEP 2106.05(f)). 
Regarding claim 15, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes the prediction output, which is part of the mental steps (encompassing a user making evaluations) and does not include any additional elements.  
Regarding claim 16, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely describes “a classification module”, which amounts to no more than mere instructions to apply the exception using a generic computer component (e.g. see MPEP 2106.05(f)), and further describes the prediction output, which is part of the mental steps (encompassing a user making evaluations).  
Regarding claim 17, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes the data samples, which is part of the mental steps (encompassing a user making observations or determinations) and does not include any additional elements.  
Regarding claim 18, the claim does not include any additional elements that integrate the abstract idea into a practical application or are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes the data samples, which is part of the mental steps (encompassing a user making observations or determinations) and “a classification module”, which amounts to no more than mere instructions to apply the exception using a generic computer component (e.g. see MPEP 2106.05(f)).
Regarding claim 20, the claim does not include any additional elements that are sufficient to amount to significantly more than the judicial exception.  For example, the claim merely further describes generating/applying the model, which amounts to no more than mere instructions to apply the exception using a generic computer component (e.g. See MPEP 2106.05(f)) and/or amounts to generally linking the use of the judicial exception to a particular technological environment or field of use (e.g. see MPEP 2106.05(h)).  

Examiner Note
It is noted that the specification states “A computer readable storage medium, as that term is used in the present disclosure, is not to be construed as storage in the form of transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide, light pulses passing through a fiber optic cable, electrical signals communicated through a wire, and/or other transmission media” (e.g. in paragraph 19).

Response to Arguments
Previous objections to the claims have been withdrawn in view of amendments.
Previous rejections under 35 USC 112 not included in this action have been withdrawn in view of amendments.  It is noted that “desired” in claim 2 was not addressed.
With respect to rejections under 35 USC 101, it is noted that the alleged improvements are not reflected in the claims.  For example, applicant cites paragraphs that relate to explanations, but claim 1, for example, merely mentions “explainable” as an intended use (note also that the term is used only in the preamble of the claim, which is not accorded patentable weight).  The claims do not appear to integrate the abstract idea into a practical application or to include elements sufficient to amount to significantly more than the judicial exception; instead, the claims appear to generally observe and determine data and train well-known attention neural networks using a formula related to the data.
Applicant's arguments have been fully considered but they are not persuasive.  
Applicant argues that He allegedly does not teach the defining limitation because the embedding of He allegedly is not associated with concepts characterizing information content of images.  However, examiner respectfully disagrees.  He describes “images can be represented in…vector forms… vector representation q.sub.t of words [i.e. concepts] in a dictionary” (e.g. in paragraphs 43 and 74).  In other words, concepts are defined in a dictionary that are used to represent the images.  
Applicant also argues that He allegedly does not teach the matrix limitation.  However, examiner respectfully disagrees.  He describes “attention-determining module 236 can operate NCM 418(1) to determine first attention information based at least in part on the feature information of the image and the feature information of the query… take as inputs one or more elements of a vector of the feature input of the image and one or more elements of a vector of the feature information of the query… a prediction output from…the model” (e.g. in paragraphs 84 and 128) and “embedding operation 408 can include a matrix multiplication… Embedding operation 408 can then be carried out as in Eq. (1), in which W.sub.e is an embedding matrix mapping words to a feature space… x.sub.t is a current input, e.g., corresponding to a word of the query text 406 or provided by embedding operation… W matrices and b vectors are weight and bias parameters, respectively… at least embedding operation 408, transform 412, or transform 414 is an identity transform… a weight matrix for the image features [i.e. concept vectors]… a vector of the first attention information” (e.g. in paragraphs 74, 76, 81, and 84-85).  Lin also teaches “attention map is upsampled to match the spatial size (H×W) of z via a bilinear interpolation… elementwise multiplication” (e.g. in paragraph 79).
As such, applicant’s arguments are not persuasive.  

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-3, 11, and 14-20 are rejected under 35 U.S.C. 103 as being unpatentable over He et al. (US 20170293638 A1) in view of Lin et al. (US 20200151448 A1).
As per independent claim 1, He teaches a computer-implemented method for generating a neural network model for producing explainable prediction outputs for input data samples, the method comprising: 
providing a training dataset of data samples each having a prediction label (e.g. in paragraphs 62, 70, and 109, “training set of data 250 is a corpus of data used by training module… training set of data 250 can include at least some of the images 144 or text 146… Training module 246 can then use the determined output text and a corresponding label [i.e. desired prediction] in the training data set to determine an error signal”); 
defining a set of concept vectors comprising a plurality of concept vectors which are associated with respective predefined concepts characterizing information content of the data samples (e.g. in paragraphs 43 and 74, “images can be represented in…vector forms… vector representation q.sub.t of words [i.e. concepts] in a dictionary”); 
producing a set of input vectors from each data sample (e.g. in paragraphs 43 and 74, “images can be represented in…vector forms… vector representation q.sub.t of words in a dictionary”); and 
training a neural network model, comprising a cross-attention module for producing a sample embedding for a data sample (e.g. in paragraph 2, “training computational models, such as neural networks (NNs)… determine first attention information based at least in part on the feature information [embedding] of the image and the feature information [embedding] of the query”, i.e. cross attention) and a prediction module for producing a prediction output from the sample embedding, by supplying the set of input vectors for each data sample to the cross-attention module (e.g. in paragraphs 84 and 128, “attention-determining module 236 can operate NCM 418(1) to determine first attention information based at least in part on the feature information of the image and the feature information of the query… take as inputs one or more elements of a vector of the feature input of the image and one or more elements of a vector of the feature information of the query… a prediction output from…the model”) and training of a cross-attention mechanism between the set of input vectors and the set of concept vectors in the cross-attention module, to optimize a loss function dependent on a difference between the prediction output and the prediction label for each data sample (e.g. in paragraphs 2, 63, 109, and 128, “training computational models… training data can include the images 144 and items the text 146 associated with specific ones of the images… cost [i.e. loss] function can be defined that computes a difference between the training data and a prediction output from the Theano expression of the model… learning-step function can be repeatedly called until convergence criteria are met”, i.e. optimize); 
wherein the sample embedding comprises a matrix of attention weights and a matrix of value vectors, produced from respective concept vectors, via the cross-attention mechanism and the prediction output comprises a linear transformation of a product of the matrix of attention weights and the matrix of value vectors (e.g. in paragraphs 74, 76, 81, and 84-85 “embedding operation 408 can include a matrix multiplication. For example, each word t in query text 406 can be represented as a one-hot vector representation q.sub.t of words in a dictionary. Embedding operation 408 can then be carried out as in Eq. (1), in which W.sub.e is an embedding matrix mapping words to a feature space… x.sub.t is a current input, e.g., corresponding to a word of the query text 406 or provided by embedding operation… W matrices and b vectors are weight and bias parameters, respectively… at least embedding operation 408, transform 412, or transform 414 is an identity transform… a weight matrix for the image features [i.e. concept vectors]… a vector of the first attention information”, note: identity transform is linear), 
but does not specifically teach training a set of weights of the cross-attention mechanism.  
However, Lin teaches training a set of weights of a mechanism (e.g. in paragraphs 45, 54, 63, and 79, “train…weights of networks (e.g., an image tagging network, a backbone network, a convolutional neural network, etc.), training losses computed while training a network… provide the training updates to detection module 152 to update weights of one or more networks… embedding FC layers are made convolutional by converting embedding FC weights to 1×1 convolutional filters. Soft topic word embedding representations for an input image can be converted to 1×1 convolutional filters which are applied to the feature map provided by image tagging network 202 to obtain spatially-preserving responses for each concept (e.g., attention maps)… attention map is upsampled to match the spatial size (H×W) of z via a bilinear interpolation… elementwise multiplication”).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of He to include the teachings of Lin because one of ordinary skill in the art would have recognized the benefit of facilitating machine learning.
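For orientation, the arrangement recited in claim 1 (a sample embedding formed from a matrix of attention weights and a matrix of value vectors produced from the concept vectors, followed by a linear transformation of their product) can be sketched as below. This is a minimal illustrative sketch only. It is not taken from the application, from He, or from Lin. The function names, the scaled dot-product scoring, the softmax normalization, and the projection matrices w_q, w_k, w_v, and w_out are all assumptions introduced for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_predict(inputs, concepts, w_q, w_k, w_v, w_out):
    """Hypothetical forward pass (all names are assumptions).

    inputs:   n x d input vectors produced from one data sample
    concepts: c x d concept vectors
    w_q, w_k, w_v: d x h projections; w_out: h x p linear output map
    """
    queries = matmul(inputs, w_q)        # n x h, from the input vectors
    keys = matmul(concepts, w_k)         # c x h, from the concept vectors
    values = matmul(concepts, w_v)       # c x h, value vectors from the concepts
    scale = math.sqrt(len(keys[0]))
    scores = matmul(queries, transpose(keys))                       # n x c
    attn = [softmax([s / scale for s in row]) for row in scores]    # attention weights
    embedding = matmul(attn, values)     # product of attention weights and values
    prediction = matmul(embedding, w_out)  # linear transformation of that product
    return attn, values, prediction


# Toy usage with identity projections (illustrative values only).
eye = [[1.0, 0.0], [0.0, 1.0]]
attn, values, prediction = cross_attention_predict(
    [[1.0, 0.0], [0.0, 1.0]], eye, eye, eye, eye, [[1.0], [2.0]]
)
```

In this sketch each row of attn shows how strongly one input vector attends to each concept vector, which is the quantity the claims treat as carrying the explanation.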

As per claim 2, the rejection of claim 1 is incorporated and the combination further teaches wherein: a set of concept labels is defined for each data sample, the set of concept labels indicating those concepts which characterize information content of that sample (e.g. He, in paragraphs 62 and 109, “images 144 with associated captions or descriptions… a corresponding label in the training data set”; Lin, in paragraphs 1 and 5, “first training dataset includes a large-scale image tagging dataset with image-level annotations (e.g., each image includes multiple tags from a larger vocabulary of tags)”); and the loss function is further dependent on a difference between a desired distribution of attention indicated by the set of concept labels and the matrix of attention weights for each data sample (e.g. He, in paragraphs 74, 76, 109, and 128, “Embedding operation 408 can then be carried out as in Eq. (1), in which W.sub.e is an embedding matrix mapping words to a feature space… x.sub.t is a current input, e.g., corresponding to a word of the query text 406 or provided by embedding operation… W matrices and b vectors are weight and bias parameters, respectively… cost function can be defined that computes a difference between the training data [“a corresponding label in the training data set”] and a prediction output”; Lin, in paragraphs 63 and 120, “Training module 154 can generate any suitable training update in any suitable way, such as weights of neural networks used in convolutions, updated by stochastic gradient descent that minimizes any suitable loss function”).
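As an illustration of the kind of loss recited in claim 2, the sketch below adds an attention term to a prediction term. It is a hedged sketch under stated assumptions, not the applicant's loss and not a formula from He or Lin: the squared-error form of both terms, the even spreading of the target distribution over the labeled concepts, the uniform fallback, and the names target_distribution, attention_penalty, combined_loss, and weight are all hypothetical.

```python
def target_distribution(concept_labels):
    """Turn 0/1 concept labels into a target attention distribution.

    Assumption: attention is spread evenly over the labeled concepts.
    """
    total = sum(concept_labels)
    if total == 0:
        # Assumption: with no labeled concept, fall back to a uniform target.
        return [1.0 / len(concept_labels)] * len(concept_labels)
    return [v / total for v in concept_labels]


def attention_penalty(attn, concept_labels):
    """Mean squared difference between each attention row and the target."""
    target = target_distribution(concept_labels)
    squared = [(a - t) ** 2 for row in attn for a, t in zip(row, target)]
    return sum(squared) / len(squared)


def combined_loss(prediction, label, attn, concept_labels, weight=0.5):
    """Prediction error plus a weighted attention-distribution term."""
    prediction_term = (prediction - label) ** 2
    return prediction_term + weight * attention_penalty(attn, concept_labels)


# Toy usage: attention already matches the labeled concepts, so only the
# prediction term contributes.
loss = combined_loss(1.0, 1.0, [[0.5, 0.5, 0.0]], [1, 1, 0])
```

Under these assumptions the second term falls to zero only when the attention weights match the distribution implied by the concept labels, so the term depends on the attention weights themselves and not only on the prediction output.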
As per claim 3, the rejection of claim 1 is incorporated and the combination further teaches after generating the neural network model, applying the model for inference to a new data sample by producing a set of input vectors from the new data sample and supplying that set of input vectors to the model to obtain a prediction output for the new data sample (e.g. He, in paragraphs 2, 21, and 43, “computational models, such as neural networks (NNs)… for using the trained computational models in, e.g., analyzing images or answering questions about content depicted in images… Models trained as described herein can be operated to determine answers to queries about images; describe image contents; or determine or extract portions of images that are relevant to particular queries, e.g., to support image searching… images can be represented in…vector forms”).
As per claim 11, the rejection of claim 1 is incorporated and the combination further teaches wherein the set of concept vectors further comprises at least one unallocated concept vector which is not associated with a concept (e.g. Lin, in paragraphs 3 and 81, “can scale and generalize to word embeddings and attention maps of unseen concepts”).
As per claim 14, the rejection of claim 1 is incorporated and the combination further teaches wherein: the set of input vectors for each data sample comprises a plurality of input vectors, the prediction module produces, via said linear transformation, a prediction result corresponding to each input vector, and the prediction output is produced by aggregating the prediction results for the input vectors (e.g. He, in paragraphs 43, 76, 81, and 86, “represented in…vector forms… training data can include the images 144 and items the text 146… vector representation q.sub.t of words, W matrices… at least embedding operation 408, transform 412, or transform 414 is an identity transform… averaging, normalization, summing, or concatenating operators, e.g., operating on vector outputs from corresponding NCMs”).
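The aggregation recited in claim 14 can be pictured with the short sketch below, in which one prediction result per input vector is combined into a single prediction output. Averaging is used here only because it is among the operators quoted from He above; the choice of a mean, and the name aggregate_predictions, are assumptions made for illustration and are not drawn from the application.

```python
def aggregate_predictions(per_vector_results):
    """Average per-input-vector prediction results into one output.

    per_vector_results: n rows, one per input vector, each of length p.
    Returns a single list of length p.
    """
    count = len(per_vector_results)
    if count == 0:
        raise ValueError("at least one prediction result is required")
    return [sum(column) / count for column in zip(*per_vector_results)]


# Toy usage: two input vectors, each yielding a two-element prediction result.
output = aggregate_predictions([[1.0, 3.0], [3.0, 5.0]])
```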
As per claim 15, the rejection of claim 1 is incorporated and the combination further teaches wherein the prediction output is indicative of a probability distribution over a predetermined range of possible prediction outputs for data samples (e.g. He, in paragraphs 92-93, “In some examples, the output-element values are probabilities that the respective words are answers found in image data… probabilities that each of the W words in a dictionary is an answer to the query”).
As per claim 16, the rejection of claim 1 is incorporated and the combination further teaches wherein the prediction module comprises a classification module and said prediction output is indicative of a probability distribution over a predetermined plurality of classes for classification of data samples (e.g. He, in paragraphs 70 and 92, “CCM 404 can be trained over a number of epochs sufficient to provide a desired classification accuracy… In some examples, the output-element values are probabilities that the respective words are answers found in image data”).
As per claim 17, the rejection of claim 1 is incorporated and the combination further teaches wherein the data samples comprise one of image data, audio data, measurement data for physical system, text data, sequential data, and medical data for patients (e.g. He, in paragraph 43, “training data can include the images 144… the text 146”).
As per claim 18, the rejection of claim 1 is incorporated and the combination further teaches wherein the data samples comprise image data and wherein the prediction module comprises a classification module (e.g. He, in paragraphs 43 and 70, “training data can include the images 144… CCM 404 can be trained over a number of epochs sufficient to provide a desired classification accuracy”).
Claims 19-20 are the product claims corresponding to method claims 1 and 3, and are rejected for the same reasons set forth, and the combination further teaches a computer readable storage medium having program instructions embodied therein, the program instructions being executable by a computing system (e.g. He, in paragraphs 30 and 33, “RAM, static RAM (SRAM), dynamic RAM (DRAM), phase change memory (PRAM)”, etc.).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over He et al. (US 20170293638 A1) in view of Lin et al. (US 20200151448 A1) and further in view of Tomsett et al. (US 20200402658 A1).
As per claim 4, the rejection of claim 3 is incorporated and the combination further teaches providing, with the prediction output for the new data sample, a matrix of attention weights for the new data sample (e.g. He, in paragraphs 74, 76, 109, and 128, “Embedding operation 408 can then be carried out as in Eq. (1), in which W.sub.e is an embedding matrix mapping words to a feature space… x.sub.t is a current input, e.g., corresponding to a word of the query text 406 or provided by embedding operation… W matrices and b vectors are weight and bias parameters, respectively… a prediction output”; Lin, in paragraphs 63 and 120, “Training module 154 can generate any suitable training update in any suitable way, such as weights of neural networks used in convolutions”), but does not specifically teach as an explanation of the prediction output.  However, Tomsett teaches weights as an explanation of a prediction output (e.g. in paragraphs 4, 105, and 118, “an explanation selection component that accesses a plurality of explanation generation components to generate different types of explanations of a machine learning output… explanation generation components in a group can iteratively generate…other output data can include…model weights”).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Tomsett because one of ordinary skill in the art would have recognized the benefit of providing relevant information.

Claims 5-10 are rejected under 35 U.S.C. 103 as being unpatentable over He et al. (US 20170293638 A1) in view of Lin et al. (US 20200151448 A1) and further in view of Cao et al. (“Concept Learners for Few-Shot Learning”, 3/20/2021, 17 pages as cited in the IDS dated 01/04/2023).
As per claim 5, the rejection of claim 1 is incorporated, but the combination does not specifically teach wherein the set of input vectors for a data sample comprises a plurality of local input vectors corresponding to respective portions of that data sample.  However, Cao teaches a set of input vectors comprising a plurality of local input vectors corresponding to respective portions of that data sample (e.g. in page 3 and page 4 section 2.3, “Each labeled data point (x, y) consists of a D-dimensional feature vector x ∈ R D and a class label y ∈ {1, ..., K}… Local… for the query point xq are given by local… a set of query points of interest”).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Cao because one of ordinary skill in the art would have recognized the benefit of representing relevant information.
As per claim 6, the rejection of claim 2 is incorporated, but the combination does not specifically teach wherein: the set of input vectors for each data sample comprises a plurality of local input vectors corresponding to respective portions of that data sample; and the set of concept labels comprises a plurality of local concept labels, associated with respective local input vectors, each indicating those concepts which characterize information content of the portion of the data sample corresponding to the local input vector associated with that concept label.  However, Cao teaches a set of input vectors for each data sample comprising a plurality of local input vectors corresponding to respective portions of that data sample (e.g. in page 3 and page 4 section 2.3, “Each labeled data point (x, y) consists of a D-dimensional feature vector x ∈ R D and a class label y ∈ {1, ..., K} … Local… a set of query points of interest”) and a set of concept labels comprising a plurality of local concept labels, associated with respective local input vectors, each indicating those concepts which characterize information content of the portion of the data sample corresponding to the local input vector associated with that concept label (e.g. in page 4 section 2.3, “Local…concept… for the query point xq are given by local concept”).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Cao because one of ordinary skill in the art would have recognized the benefit of representing relevant information.
As per claim 7, the rejection of claim 1 is incorporated, but the combination does not specifically teach wherein the set of input vectors for each data sample comprises a global input vector corresponding to the data sample as a whole.  However, Cao teaches a set of input vectors for a data sample comprising a global input vector corresponding to the data sample as a whole (e.g. in page 3 and page 4 section 2.3, “Each labeled data point (x, y) consists of a D-dimensional feature vector x ∈ R D and a class label y ∈ {1, ..., K} … global… all query points of interest… across a set of examples”).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Cao because one of ordinary skill in the art would have recognized the benefit of representing relevant information.
As per claim 8, the rejection of claim 2 is incorporated, but the combination does not specifically teach wherein: the set of input vectors for each data sample comprises a global input vector corresponding to the data sample as a whole; and the set of concept labels comprises a global concept label, associated with the global input vector, indicating those concepts which characterize information content of the data sample as a whole.  However, Cao teaches a set of input vectors for each data sample comprises a global input vector corresponding to the data sample as a whole (e.g. in page 3 and page 4 section 2.3, “Each labeled data point (x, y) consists of a D-dimensional feature vector x ∈ R D and a class label y ∈ {1, ..., K} … global… all query points of interest… across a set of examples”) and the set of concept labels comprises a global concept label, associated with the global input vector, indicating those concepts which characterize information content of the data sample as a whole (e.g. in page 4 section 2.3, “global concept… concept embeddings of all query points of interest… across a set of examples”).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Cao because one of ordinary skill in the art would have recognized the benefit of representing relevant information.
As per claim 9, the rejection of claim 2 is incorporated, but the combination does not specifically teach wherein: the set of input vectors for each data sample comprises a global input vector corresponding to the data sample as a whole and a plurality of local input vectors corresponding to respective portions of that data sample; the set of concept labels includes a plurality of local concept labels, associated with respective local input vectors, each indicating those concepts which characterize information content of the portion of the data sample corresponding to the local input vector associated with that concept label; and the set of concept labels further comprises a global concept label, associated with the global input vector, indicating those concepts which characterize information content of the data sample as a whole.  However, Cao teaches wherein: the set of input vectors for each data sample comprises a global input vector corresponding to the data sample as a whole (e.g. in page 3 and page 4 section 2.3, “Each labeled data point (x, y) consists of a D-dimensional feature vector x ∈ R D and a class label y ∈ {1, ..., K} … global… all query points of interest… across a set of examples”) and a plurality of local input vectors corresponding to respective portions of that data sample (e.g. in page 3 and page 4 section 2.3, “Each labeled data point (x, y) consists of a D-dimensional feature vector x ∈ R D and a class label y ∈ {1, ..., K}… Local…concept … for the query point xq are given by local… a set of query points of interest”); the set of concept labels includes a plurality of local concept labels, associated with respective local input vectors, each indicating those concepts which characterize information content of the portion of the data sample corresponding to the local input vector associated with that concept label (e.g. in page 4 section 2.3, “Local…concept… for the query point xq are given by local concept… a set of query points of interest”); and the set of concept labels further comprises a global concept label, associated with the global input vector, indicating those concepts which characterize information content of the data sample as a whole (e.g. in page 4 section 2.3, “global concept… concept embeddings of all query points of interest… across a set of examples”).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Cao because one of ordinary skill in the art would have recognized the benefit of representing relevant information.
As per claim 10, the rejection of claim 9 is incorporated and the combination further teaches wherein: the predefined concepts comprise a set of local concepts, characterizing information content of portions of said data samples (e.g. Cao, in page 4 section 2.3, “Local…concept… for the query point xq are given by local… a set of query points of interest”), and a set of global concepts characterizing information content of data samples as a whole (e.g. Cao, in page 4 section 2.3, “global concept… concept embeddings of all query points of interest… across a set of examples”); and in the cross-attention module, cross-attention is applied between each local input vector and each concept vector associated with a local concept (e.g. Cao, in page 3 and page 4 section 2.3, “Each labeled data point (x, y) consists of a D-dimensional feature vector x ∈ R D and a class label y ∈ {1, ..., K}… Local…concept … for the query point xq are given by local”), and between each global input vector and each concept vector associated with a global concept (e.g. Cao, in page 3 and page 4 section 2.3, “Each labeled data point (x, y) consists of a D-dimensional feature vector x ∈ R D and a class label y ∈ {1, ..., K} … global concept… all query points of interest… across a set of examples”).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over He et al. (US 20170293638 A1) in view of Lin et al. (US 20200151448 A1) and further in view of Lockett (US 10108902 B1).
As per claim 12, the rejection of claim 1 is incorporated, but the combination does not specifically teach wherein producing the set of input vectors from each data sample comprises tokenizing each data sample via an embedding process.  However, Lockett teaches producing a set of input vectors from each data sample comprising tokenizing each data sample via an embedding process (e.g. in column 8 lines 27-36 and column 19 lines 56-63, “divides data objects into a sequence of tokens via tokenizer 203 shown in FIG. 2. Thereafter, word embedder 205 assigns to each token a word-embedding vector as shown at 503. Vector sequences are then convolved at 505 to determine localized features… a data object representation…is generated”).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Lockett because one of ordinary skill in the art would have recognized the benefit of generating a representation of the data sample.

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over He et al. (US 20170293638 A1) in view of Lin et al. (US 20200151448 A1) and Cao et al. (“Concept Learners for Few-Shot Learning”, 3/20/2021, 17 pages as cited in the IDS dated 01/04/2023) and further in view of Lockett (US 10108902 B1).
As per claim 13, the rejection of claim 5 is incorporated, but the combination does not specifically teach wherein producing the set of input vectors from each data sample comprises tokenizing each data sample via an embedding process which preserves correspondence between the input vectors and respective portions of the data sample.  However, Lockett teaches producing a set of input vectors from each data sample comprising tokenizing each data sample via an embedding process which preserves correspondence between the input vectors and respective portions of the data sample (e.g. in column 8 lines 27-36 and column 19 lines 56-63, “divides data objects into a sequence of tokens via tokenizer 203 shown in FIG. 2. Thereafter, word embedder 205 assigns to each token a word-embedding vector as shown at 503. Vector sequences are then convolved at 505 to determine localized features… a data object representation…is generated”).  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of the combination to include the teachings of Lockett because one of ordinary skill in the art would have recognized the benefit of generating a representation of the data sample.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
For example, Burachas et al. (US 20190370587 A1) teaches “techniques for attention-based explanations for artificial intelligence behavior” (e.g. in abstract).
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
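The timing rule in the preceding paragraph can be sketched as simple date arithmetic. This is a minimal illustration only, not legal advice: the function names `add_months` and `reply_period_end` are hypothetical, the first-reply and advisory-action dates in the usage lines are invented for the example, and the sketch assumes plain calendar-month arithmetic without modeling weekend or holiday carry-over or any extension fees.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Return the date `months` calendar months after `d`, clamped to month end."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_period_end(
    mailing: date,
    first_reply: Optional[date] = None,
    advisory_mailed: Optional[date] = None,
) -> date:
    """End of the shortened statutory period as described in the paragraph above.

    Assumption: only the three rules stated in the text are modeled.
    """
    two_months = add_months(mailing, 2)
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)

    end = three_months
    # First reply within two months AND advisory action mailed after the
    # three-month period: the period runs to the advisory mailing date.
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= two_months
        and advisory_mailed > three_months
    ):
        end = advisory_mailed
    # In no event later than six months from the mailing date.
    return min(end, six_months)


# Illustrative mailing date; the reply and advisory dates are hypothetical.
mailing = date(2026, 1, 21)
print(reply_period_end(mailing))
print(reply_period_end(mailing, date(2026, 3, 19), date(2026, 5, 1)))
```

The first call covers the default three-month case; the second covers an early first reply followed by a late advisory action, where the period runs to the advisory mailing date but is still capped at six months.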
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WONG whose telephone number is (571)270-1399. The examiner can normally be reached Monday-Friday 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, TAMARA KYLE can be reached at (571)272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/W.W/Examiner, Art Unit 2144
01/07/2026

/TAMARA T KYLE/Supervisory Patent Examiner, Art Unit 2144
Prosecution Timeline

Dec 16, 2025: Response Filed
Jan 21, 2026: Final Rejection mailed (§101, §103, §112)
Mar 10, 2026: Interview Requested
Mar 16, 2026: Applicant Interview (Telephonic)
Mar 17, 2026: Examiner Interview Summary
Mar 19, 2026: Response after Non-Final Action
Apr 15, 2026: Request for Continued Examination
Apr 24, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

17/323,489 (Patent 12639585): PROACTIVE ALERT AGGREGATION AND CORRELATION MANAGEMENT WITH AUTOMATED SUMMARIZATION; 5y 0m to grant; Granted May 26, 2026
17/641,455 (Patent 12639566): METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR MANAGING MODEL UPDATES; 4y 2m to grant; Granted May 26, 2026
18/021,247 (Patent 12572252): CONTROLLING A 2D SCREEN INTERFACE APPLICATION IN A MIXED REALITY APPLICATION; 3y 0m to grant; Granted Mar 10, 2026
18/347,443 (Patent 12530707): CUSTOMER EFFORT EVALUATION IN A CONTACT CENTER SYSTEM; 2y 6m to grant; Granted Jan 20, 2026
18/783,227 (Patent 12511846): XR DEVICE-BASED TOOL FOR CROSS-PLATFORM CONTENT CREATION AND DISPLAY; 1y 5m to grant; Granted Dec 30, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 30%
With Interview (+26.9%): 57%
Median Time to Grant: 4y 5m (~9m remaining)
PTA Risk: Moderate
Based on 397 resolved cases by this examiner. Grant probability derived from career allowance rate.
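
As a rough check on the figures above, the sketch below recomputes the two grant percentages from the examiner's career counts (120 granted out of 397 resolved). It assumes the interview lift is applied as an additive number of percentage points, which is consistent with the displayed 30% and 57% but is not confirmed by the page; the function name `projected_grant_rates` is hypothetical.

```python
def projected_grant_rates(granted: int, resolved: int, interview_lift_points: float):
    """Return (base %, with-interview %) rounded to whole percentages.

    Assumption: the lift is additive in percentage points and capped at 100.
    """
    base = 100.0 * granted / resolved
    with_interview = min(base + interview_lift_points, 100.0)
    return round(base), round(with_interview)


# Career counts and lift as reported for this examiner.
base_pct, interview_pct = projected_grant_rates(120, 397, 26.9)
print(base_pct, interview_pct)  # prints: 30 57
```

Under that assumption the unrounded values are about 30.2% and 57.1%, which round to the 30% and 57% shown.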