Prosecution Insights
Last updated: April 19, 2026
Application No. 18/193,572

HALLUCINATION MITIGATION FOR GENERATIVE TRANSFORMER MODELS

Non-Final OA (§102, §103)

Filed: Mar 30, 2023
Examiner: MCLEAN, IAN SCOTT
Art Unit: 2654
Tech Center: 2600 (Communications)
Assignee: Qualcomm Incorporated
OA Round: 3 (Non-Final)

Grant Probability: 43% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 3y 2m
With Interview: 74%

Examiner Intelligence

Career Allow Rate: 43% (grants 43% of resolved cases; 19 granted / 44 resolved; -18.8% vs TC avg)
Interview Lift: +31.0% (strong), comparing resolved cases with vs. without an interview
Typical Timeline: 3y 2m average prosecution
Currently Pending: 40 applications
Career History: 84 total applications across all art units
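The headline percentages in this panel follow directly from the reported career counts; a quick check (the +31.0-point interview lift is taken as reported, since the underlying per-interview breakdown is not shown on this page):

```python
# Reproduce the examiner dashboard figures from the reported career counts.
granted = 19        # granted cases, as reported above
resolved = 44       # resolved cases (granted + abandoned)

allow_rate = granted / resolved
print(f"Career allow rate: {allow_rate:.0%}")                # 43%

# The dashboard reports a +31.0-point allowance lift when an examiner
# interview is held; adding it to the base rate gives the
# "with interview" probability shown in the projections panel.
interview_lift = 0.31
print(f"With interview: {allow_rate + interview_lift:.0%}")  # 74%
```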

Statute-Specific Performance

§101: 9.9% (-30.1% vs TC avg)
§102: 27.2% (-12.8% vs TC avg)
§103: 60.0% (+20.0% vs TC avg)
§112: 2.1% (-37.9% vs TC avg)

TC averages are estimates. Based on career data from 44 resolved cases.

Office Action

Rejection bases: §102, §103
Notice of Pre-AIA or AIA Status

1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

2. Applicant's arguments filed 12/02/2025 have been fully considered but they are not persuasive. Applicant argues that Aggarwal does not teach the amended limitations reciting generation of first and second sentence fragments, selection of sequence tokens, and generation of a third complete sentence using a second algorithm. However, Aggarwal explicitly discloses generating a simplified text and dividing it into multiple sentence-level units (S1, S2, …, Sm), computing entailment and hallucination scores for each unit, and selectively including those units to construct a final output text (P) (see ¶[0096]-[0100]). Under the broadest reasonable interpretation, each sentence or sentence-level unit corresponds to a "sequence of tokens," and a "sentence fragment" is broader than, and therefore reads on, the complete sentences disclosed by Aggarwal.

With respect to the recited first and second algorithms, the claim does not impose any structural or functional distinction between the algorithms beyond their use in generating text. Aggarwal discloses using a text simplification model to generate candidate sentences (i.e., a first algorithm) and a pruning process to construct a final output based on selected candidates. Constructing the modified text P from selected sequences necessarily involves generating output text based on those sequences, which satisfies the recited generation of a third complete sentence "using a second algorithm." Altogether, the claim requires selecting a sequence of tokens from a set that includes the first and second sequences and generating output text based on the selected sequence. Aggarwal's selection and assembly of sentences Sj into the modified text P satisfies this requirement.

Claim Rejections - 35 USC § 102

3.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

4. Claims 1-2, 6-7, 9-18, 22-23, 26-27 and 29-30 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Aggarwal (US 2024/0119220).

Regarding Claim 1: Aggarwal discloses an apparatus for natural language processing, the apparatus comprising: at least one memory (Aggarwal: ¶[0006] discloses a memory); and at least one processor coupled to the at least one memory (Aggarwal: ¶[0006] discloses a memory configured to store instructions that are run on a processor), the at least one processor configured to:

generate, based on input content, a first sequence of tokens including a first sentence fragment and a second sequence of tokens including a second sentence fragment (Aggarwal: ¶[0096] discloses that each simplified sentence is a sequence of tokens generated from input content; a sentence fragment is broader than a complete sentence and therefore reads on the disclosed sentences);

determine a first confidence level associated with the first sequence of tokens and a second confidence level associated with the second sequence of tokens, wherein the first confidence level is based on respective confidence levels associated with each token in the first sequence of tokens (Aggarwal: ¶[0097] entailment scores quantify the confidence/faithfulness of each sentence-level token sequence; separate scores are computed for each Sj, corresponding to the first and second confidence levels);

generate, using a first algorithm, a first complete sentence that includes the first sequence of tokens and a second complete sentence that includes the second sequence of tokens (Aggarwal: ¶[0060]-[0061], ¶[0106] disclose a text simplification model (i.e., a first algorithm) to generate complete sentences S1, S2 that include the respective token sequences);

generate a first natural language inference (NLI) score for the first complete sentence and a second NLI score for the second complete sentence (Aggarwal: ¶[0076] discloses entailment scores that are NLI scores generated per sentence; separate entailment scores are computed for S1 and S2), wherein the first NLI score is based on faithfulness of the first complete sentence to the input content (Aggarwal: ¶[0030], ¶[0074] disclose entailment scores corresponding to faithfulness between generated text and the input content);

adjust the first confidence level for the first sequence of tokens based on the first NLI score for the first complete sentence to generate an updated first confidence level for the first sequence of tokens (Aggarwal: ¶[0098] thresholding and pruning based on entailment modifies whether a sentence is retained, effectively adjusting its confidence for inclusion);

adjust the second confidence level for the second sequence of tokens based on the second NLI score for the second complete sentence to generate an updated second confidence level for the second sequence of tokens (Aggarwal: ¶[0097]-[0098] entailment scores are computed for each sentence, i.e., the same logic is applied independently to each sequence, including the second sequence);

rank the first sequence of tokens and the second sequence of tokens based on the updated first confidence level and the updated second confidence level to select a sequence of tokens from a set that includes the first sequence of tokens and the second sequence of tokens (Aggarwal: ¶[0099]-[0100] disclose that multiple confidence-related scores are evaluated comparatively across sentences, necessarily ranking them for selection through thresholds); and

generate, using a second algorithm and based on the selected sequence of tokens, a third complete sentence that includes the selected sequence of tokens (Aggarwal: ¶[0099]-[0100] disclose that constructing the modified text P from selected sequences necessarily involves generating output text based on the selected token sequence, i.e., a second algorithm).
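As a rough illustration only (not Aggarwal's implementation, and not the applicant's actual code), the flow recited in claim 1 can be sketched in Python. Here `complete`, `nli_score`, and `generate_final` are hypothetical stand-ins for the first algorithm, the NLI model, and the second algorithm, and averaging per-token confidences is an assumed choice:

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    tokens: list                  # a sequence of tokens (sentence fragment)
    token_confidences: list       # per-token confidence levels
    confidence: float = field(init=False)

    def __post_init__(self):
        # Sequence-level confidence is based on the per-token levels;
        # a simple choice (an assumption here) is their average.
        self.confidence = sum(self.token_confidences) / len(self.token_confidences)

def claim1_pipeline(candidates, complete, nli_score, generate_final):
    """Sketch of the claim-1 flow: score, adjust by NLI, rank, regenerate."""
    for c in candidates:
        sentence = complete(c.tokens)      # first algorithm: complete sentence
        score = nli_score(sentence)        # NLI / faithfulness score in [0, 1]
        c.confidence *= score              # adjust confidence by the NLI score
    # Rank the sequences by updated confidence and select the best one.
    best = max(candidates, key=lambda c: c.confidence)
    return generate_final(best.tokens)     # second algorithm: third sentence

# Toy usage with trivial stand-in callables:
cands = [Candidate(["the", "sky"], [0.9, 0.8]),
         Candidate(["a", "dog"], [0.6, 0.5])]
out = claim1_pipeline(
    cands,
    complete=lambda t: " ".join(t) + " is blue.",
    nli_score=lambda s: 0.9 if "sky" in s else 0.4,
    generate_final=lambda t: " ".join(t).capitalize() + " is blue.")
print(out)  # "The sky is blue."
```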
Regarding Claim 2: Aggarwal further discloses the apparatus of claim 1, wherein, to generate the first sequence of tokens and the second sequence of tokens, the at least one processor is configured to: generate the first sequence of tokens and the second sequence of tokens using the second algorithm (Aggarwal: ¶[0060], ¶[0091] disclose using a text simplification model and then a generation process which uses pruning to generate a set of tokens, interpreted as a second algorithm).

Regarding Claim 6: Aggarwal further discloses the apparatus of claim 1, the at least one processor configured to: rank the first sequence of tokens and the second sequence of tokens based on the first confidence level associated with the first sequence of tokens and the second confidence level associated with the second sequence of tokens to generate a first ranking (Aggarwal: ¶[0066] discloses that the processor compares multiple generated sequences (here, simplified sentences as token sequences) according to numeric scores, which is functionally a ranking among candidate outputs).

Regarding Claim 7: Aggarwal further discloses the apparatus of claim 6, wherein, to rank the first sequence of tokens and the second sequence of tokens based on the updated first confidence level and the updated second confidence level, the at least one processor is configured to: update the first ranking of the first sequence of tokens and the second sequence of tokens based on the updated first confidence level and the updated second confidence level to generate a second ranking (Aggarwal: ¶[0050], ¶[0054] disclose re-scoring candidate sequences using updated confidence-type metrics (entailment = faithfulness; hallucination = inverse confidence); these updated scores correspond to the updated first and second confidence levels, and the comparison among them constitutes re-ranking).
Regarding Claim 9: Aggarwal further discloses the apparatus of claim 1, wherein the third complete sentence is configured to summarize at least a portion of the input content (Aggarwal: ¶[0028] discloses that the system is designed for producing abstractive summaries of the input).

Regarding Claim 10: Aggarwal further discloses the apparatus of claim 1, wherein the selected sequence of tokens is the first sequence of tokens based on the first updated confidence level for the first sequence of tokens exceeding the second updated confidence level for the second sequence of tokens (Aggarwal: ¶[0069] discloses that once the modified text is determined, it is presented to the user via a user interface; this corresponds to the generated output text including the highest-ranked sequences).

Regarding Claim 11: Aggarwal further discloses the apparatus of claim 1, wherein the second NLI score for the second complete sentence is based on faithfulness of the second complete sentence to the input content tokens (Aggarwal: ¶[0050] discloses computing entailment and hallucination scores, which reflect faithfulness to the original input content; these are generated for multiple sentences).

Regarding Claim 12: Aggarwal further discloses the apparatus of claim 1, wherein the third complete sentence is configured to summarize the input content as part of a conversation (Aggarwal: ¶[0028] discloses that the system is designed for producing abstractive summaries of the input).

Regarding Claim 13: Aggarwal further discloses the apparatus of claim 1, wherein the first NLI score identifies that at least a portion of the complete sentence is one of true, false, or neutral (Aggarwal: ¶[0111] discloses that the neural network outputs a label, "entailment, contradiction, neutral," which meets this limitation).
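The three-way label cited against claim 13 is the standard output of an NLI classification head. A minimal sketch of how such a head turns class logits into the "entailment / contradiction / neutral" label; the logit values are invented for illustration and do not come from any reference:

```python
import math

LABELS = ("entailment", "contradiction", "neutral")

def nli_label(logits):
    """Softmax over three NLI class logits; return top label and probability."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(LABELS)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Hypothetical logits for a premise/hypothesis pair:
label, p = nli_label([2.1, -0.3, 0.4])
print(label)  # entailment
```

The entailment probability from such a head is what serves as the sentence-level faithfulness (NLI) score in the pipeline mapped to claim 1.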
Regarding Claim 14: Aggarwal further discloses the apparatus of claim 1, wherein the input content includes input text (Aggarwal: ¶[0041] the input is a complex text selection from the user).

Regarding Claim 15: Aggarwal further discloses the apparatus of claim 1, wherein each token of the first sequence of tokens is at least a portion of a respective word (Aggarwal: ¶[0041] the sentences that are produced contain words).

Regarding Claim 16: Aggarwal further discloses the apparatus of claim 1, wherein the first sequence of tokens is configured to follow after a previously-determined sequence of tokens in the first complete sentence, wherein the first complete sentence includes the previously-determined sequence of tokens, the first sequence of tokens, and at least one additional token (Aggarwal: ¶[0048] discloses a sequence-to-sequence model that is autoregressive and uses an encoder-decoder architecture; the process continues one token at a time until a "stop token" is predicted, signaling the end of the sentence or text generation. This process is fundamental to transformer architectures).

Regarding Claim 17: Aggarwal further discloses the apparatus of claim 1, wherein the second algorithm is more computationally resource-intensive than the first algorithm (Aggarwal: ¶[0063]-[0066] the text simplification algorithm performs a single forward generative pass; this corresponds to standard autoregressive decoding, with no loops, logic controls, or thresholds. The second algorithm, disclosed in ¶[0097], finds entailment scores for each piece of simplified text, computing scores using a Bidirectional Encoder Representations from Transformers (BERT) architecture. This is not generation; it requires BERT-based NLI inference and a plurality of sentence comparisons, and each Sj triggers multiple forward passes through the second network, an explicit and clear increase in computation compared to the generation algorithm. It also performs conditional logic and score aggregation, altogether making this algorithm more resource-intensive).

Regarding Claim 18: Aggarwal further discloses the apparatus of claim 1, wherein the at least one processor is configured to: output the third complete sentence (Aggarwal: ¶[0041] the output contains the sequence of tokens that were selected).

Regarding Claim 22: Claim 22 has been analyzed with regard to claim 1 (see rejection above) and is rejected for the same reasons of anticipation used above.

Regarding Claim 23: Claim 23 has been analyzed with regard to claim 2 (see rejection above) and is rejected for the same reasons of anticipation used above.

Regarding Claim 26: Claim 26 has been analyzed with regard to claim 6 (see rejection above) and is rejected for the same reasons of anticipation used above.

Regarding Claim 27: Claim 27 has been analyzed with regard to claim 7 (see rejection above) and is rejected for the same reasons of anticipation used above.

Regarding Claim 29: Claim 29 has been analyzed with regard to claim 17 (see rejection above) and is rejected for the same reasons of anticipation used above.

Regarding Claim 30: Claim 30 has been analyzed with regard to claim 18 (see rejection above) and is rejected for the same reasons of anticipation used above.

Claim Rejections - 35 USC § 103

5. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

6. Claims 3 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal (US 2024/0119220) in view of Rush et al., "A Neural Attention Model for Abstractive Sentence Summarization" (hereinafter Rush).

Regarding Claim 3: Aggarwal further discloses the apparatus of claim 1, except wherein the first algorithm is a greedy search. However, Rush discloses this limitation (Rush: Section 4 discloses using a greedy search as a baseline decoding method). It would have been obvious to combine Aggarwal and Rush in order to obtain the claimed invention. Aggarwal discloses generating a complete sentence using a generative transformer model but does not teach greedy decoding. Rush teaches greedy decoding as a baseline method for generating output sequences in a neural summarization system. The motivation to combine Aggarwal and Rush would be to substitute greedy decoding in Aggarwal's framework as a well-known, simpler alternative to beam search, commonly used for faster inference with minimal implementation changes. This substitution would produce predictable results and would not require any undue experimentation.

Regarding Claim 24: Claim 24 has been analyzed with regard to claim 3 (see rejection above) and is rejected for the same reasons of obviousness used above.

7.
Claims 8 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal in view of Li et al., "Ensure the Correctness of the Summary: Incorporate Entailment Knowledge into Abstractive Sentence Summarization" (hereinafter Li).

Regarding Claim 8: Aggarwal further discloses the apparatus of claim 1, except wherein the second algorithm is a beam search. However, Li discloses wherein the second algorithm is a beam search (Li: Section 6.1 discloses beam search used for sequence decoding). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Aggarwal to generate multiple candidate sequences using Li's beam search strategy. Both references are directed to sequence generation models in natural language generation tasks, and specifically to mitigating hallucinations and producing faithful output. Beam search is a well-known, standard technique for efficiently producing multiple candidate sequences with varying likelihoods from an encoder-decoder model. One of ordinary skill in the art would recognize that substituting Li's beam search for Aggarwal's unspecified generation step would predictably yield improved diversity and faithfulness in the candidate outputs, without changing the underlying operation of Aggarwal's system.

Regarding Claim 28: Claim 28 has been analyzed with regard to claim 8 (see rejection above) and is rejected for the same reasons of obviousness used above.

8. Claims 4-5 and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal in view of Yang et al., "Saliency Detection via Graph-Based Manifold Ranking" (hereinafter Yang).

Regarding Claim 4: Aggarwal discloses the apparatus of claim 1, except the at least one processor configured to: restrict candidate tokens for use in generating the first complete sentence and the second complete sentence based on respective saliency values for the candidate tokens and a saliency threshold. However, Yang discloses this limitation (Yang: at least Section 2.2 discloses computing saliency values and filtering based on a threshold to select only salient components; this satisfies the restriction of candidate tokens based on saliency exceeding a threshold). It would have been obvious to combine Aggarwal and Yang in order to obtain the claimed invention. Aggarwal discloses generating sequences using a decoder but does not filter candidate tokens based on saliency. Yang discloses restriction based on saliency values indicating content importance, and Yang's solution is reasonably pertinent to the problem faced. The motivation to combine Aggarwal and Yang would be to improve the quality, relevance, and faithfulness of the generated output by pruning unimportant tokens, a common technique that could improve the overall accuracy of the model.

Regarding Claim 5: Aggarwal and Yang further disclose the apparatus of claim 4, wherein the saliency threshold is based on an average of the respective saliency values for the candidate tokens (Yang: Section 5.1 uses a threshold set to twice the mean saliency of the image).

Regarding Claim 25: Claim 25 has been analyzed with regard to claim 4 (see rejection above) and is rejected for the same reasons of obviousness used above.

9. Claims 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal in view of Tucker (US 11,893,994).

Regarding Claim 19: Aggarwal further discloses the apparatus of claim 1, except wherein the at least one processor is configured to: cause a display to display the third complete sentence. However, Tucker discloses this limitation (Tucker: Fig. 5a discloses a display component for displaying the output sequence). Aggarwal teaches generating natural language output using a generative inference model, but does not disclose displaying that output via a physical display.
Tucker discloses an apparatus comprising a display configured to present text-based responses generated by a conversational/inference model; Tucker's display renders outputs in real time. It would have been obvious to one of ordinary skill in the art to incorporate Tucker's display into Aggarwal in order to render generated summaries or other natural language output for human users, particularly in applications involving interactive summarization, accessibility, or assistive interfaces. Displaying model output is a standard feature in modern machine learning applications and would involve only routine design choices.

Regarding Claim 20: Aggarwal further discloses the apparatus of claim 1, except further comprising: a communication interface configured to transmit the third complete sentence to a recipient device. However, Tucker discloses this limitation (Tucker: Fig. 2 and Col. 7, lines 30-40 disclose that the implementation may be triggered to run a skill in response to a third party calling the system via the internet (i.e., there is a recipient device)). Aggarwal teaches generating natural language output using a generative inference model, but does not disclose transmitting that output to a recipient device. Tucker discloses an apparatus that has the ability to send results to a requesting recipient device; Tucker's system can send and receive requests in real time. It would have been obvious to one of ordinary skill in the art to incorporate Tucker's communication interface into Aggarwal in order to transmit generated output to client devices or user-facing systems. This would be especially useful in distributed or client-server NLP systems where the inference model runs on a backend or edge device and the output is delivered elsewhere. This is a common application that requires no undue experimentation.

Regarding Claim 21: Aggarwal further discloses the apparatus of claim 1, except wherein the apparatus includes at least one of a head-mounted display (HMD), a mobile handset, or a wireless communication device. However, Tucker discloses this limitation (Tucker: Col. 20, lines 60-67, wireless communication module). Aggarwal teaches generating natural language output using a generative inference model, but does not disclose the recited device form factors. Tucker discloses an apparatus that contains wireless capabilities. It would have been obvious to one of ordinary skill in the art to implement Aggarwal with a mobile handset or wireless headset as taught by Tucker, in order to enhance portability, accessibility, and real-time interaction. These are common platforms for natural language systems, and their use involves only routine engineering implementation.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to IAN SCOTT MCLEAN, whose telephone number is (703) 756-4599. The examiner can normally be reached Monday - Friday, 8:00-5:00 EST, off every 2nd Friday.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Hai Phan, can be reached at (571) 272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.
Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/IAN SCOTT MCLEAN/
Examiner, Art Unit 2654

/HAI PHAN/
Supervisory Patent Examiner, Art Unit 2654
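The two decoding strategies that the §103 rejections map onto claims 3 and 8 (greedy search via Rush, beam search via Li) differ only in how many partial hypotheses are kept per step. A generic sketch, assuming a hypothetical `score_next(seq)` scorer that returns `(token, log-probability)` pairs; this is an illustration of the general techniques, not code from any cited reference:

```python
def greedy_decode(score_next, start, max_len=10, stop="</s>"):
    """Greedy search: keep only the single best token at each step."""
    seq = list(start)
    for _ in range(max_len):
        tok, _ = max(score_next(seq), key=lambda kv: kv[1])
        seq.append(tok)
        if tok == stop:
            break
    return seq

def beam_decode(score_next, start, beam=3, max_len=10, stop="</s>"):
    """Beam search: keep the `beam` best partial sequences at each step."""
    beams = [(list(start), 0.0)]                 # (sequence, cumulative log-prob)
    for _ in range(max_len):
        expanded = []
        for seq, lp in beams:
            if seq[-1] == stop:
                expanded.append((seq, lp))       # finished hypotheses carry over
                continue
            for tok, logp in score_next(seq):
                expanded.append((seq + [tok], lp + logp))
        beams = sorted(expanded, key=lambda b: b[1], reverse=True)[:beam]
        if all(s[-1] == stop for s, _ in beams):
            break
    return beams[0][0]

# Toy scorer: after "the", prefer "end"; otherwise emit the stop token.
def score_next(seq):
    if seq[-1] == "the":
        return [("end", -0.1), ("cat", -2.0)]
    return [("</s>", -0.1)]

print(greedy_decode(score_next, ["the"]))   # ['the', 'end', '</s>']
```

Beam search does strictly more work per step than greedy decoding (it expands every surviving hypothesis), which is the kind of compute asymmetry the claim 17 discussion above turns on.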

Prosecution Timeline

Mar 30, 2023: Application Filed
May 01, 2025: Non-Final Rejection (§102, §103)
Jul 09, 2025: Interview Requested
Aug 06, 2025: Response Filed
Aug 06, 2025: Examiner Interview Summary
Aug 06, 2025: Applicant Interview (Telephonic)
Oct 06, 2025: Final Rejection (§102, §103)
Nov 14, 2025: Interview Requested
Nov 24, 2025: Examiner Interview Summary
Nov 24, 2025: Applicant Interview (Telephonic)
Dec 02, 2025: Request for Continued Examination
Dec 17, 2025: Response after Non-Final Action
Dec 27, 2025: Non-Final Rejection (§102, §103)
Mar 20, 2026: Interview Requested
Apr 02, 2026: Response Filed
Apr 08, 2026: Applicant Interview (Telephonic)
Apr 08, 2026: Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602553: SPEECH TRANSLATION METHOD, DEVICE, AND STORAGE MEDIUM (granted Apr 14, 2026; 2y 5m to grant)
Patent 12494199: VOICE INTERACTION METHOD AND ELECTRONIC DEVICE (granted Dec 09, 2025; 2y 5m to grant)
Patent 12443805: Systems and Methods for Multilingual Data Processing and Arrangement on a Multilingual User Interface (granted Oct 14, 2025; 2y 5m to grant)
Patent 12437144: Content Recommendation Method and User Terminal (granted Oct 07, 2025; 2y 5m to grant)
Patent 12400644: DYNAMIC LANGUAGE MODEL UPDATES WITH BOOSTING (granted Aug 26, 2025; 2y 5m to grant)

Study what changed to get past this examiner. Based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 43%
With Interview: 74% (+31.0%)
Median Time to Grant: 3y 2m
PTA Risk: High

Based on 44 resolved cases by this examiner. Grant probability derived from career allow rate.
