Last updated: May 29, 2026
Application No. 18/590,222
FAITHFUL GENERATION OF OUTPUT TEXT FOR MULTIMODAL APPLICATIONS

Final Rejection: §101, §103
Filed: Feb 28, 2024
Priority: Sep 05, 2023 (provisional 63/580,654)
Examiner: CASTILLO-TORRES, KEISHA Y
Art Unit: 2659
Tech Center: 2600 (Communications)
Assignee: Qualcomm Incorporated
OA Round: 2 (Final)
Interview Optional

Examiner has a relatively high allowance rate (74%) and a +29.5% interview lift. A written response may suffice.
Based on 110 resolved cases, 2023–2026
Examiner Intelligence

CASTILLO-TORRES, KEISHA Y
Career Allowance Rate: 74% (above average); 82 granted / 110 resolved; +12.5% vs TC avg
Interview Lift: +29.5% (strong; resolved cases with interview vs. without)
Avg Prosecution: 2y 10m (typical timeline); 20 currently pending
Total Applications: 142 (career history, across all art units)
Statute-Specific Performance

§101: 8.7% (-31.3% vs TC avg)
§103: 88.1% (+48.1% vs TC avg)
§102: 1.3% (-38.7% vs TC avg)
§112: 1.6% (-38.4% vs TC avg)
Based on career data from 110 resolved cases
Office Action

§101 §103
DETAILED ACTION
This communication is in response to the Amendments and Arguments filed on 02/05/2026.
Claims 4 and 20 have been canceled by the Applicant.
Claim(s) 1-3 and 5-19 are pending and have been examined. Hence, this action has been made FINAL.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments and Amendments
Amendments to the claims by the Applicant have been considered and addressed below. 
With respect to the Double Patenting and 35 USC § 101 and 102/103 rejections, the Applicant provides several arguments in which the Examiner will respond accordingly, below.

Double Patenting rejection(s)
Arguments on page 6 of the Remarks filed on 02/05/2026.
Examiner’s Response to Arguments:
Applicant’s requests for the non-statutory double patenting rejection to be held in abeyance until the pending claims are in condition for allowance are acknowledged. 
Also, the Examiner refers the Applicant to the updated Double Patenting rejections, below.

35 USC § 101 rejection(s)
Arguments on pages 6-7 of the Remarks filed on 02/05/2026.
Examiner’s Response to Arguments:
The arguments have been considered but are not persuasive. The Examiner respectfully disagrees with the argument that “for purposes of expediting prosecution, and without conceding the propriety of the Office's rejections, independent claims 1 and 19 and are amended. Applicant submits that the claims, as amended, re directed to statutory subject matter.”
For more details on the Examiner’s rationale, please refer to the updated 35 U.S.C. § 101 rejections, below.

35 USC § 102/103 rejection(s)
Arguments on pages 7-10 of the Remarks filed on 02/05/2026.
Examiner’s Response to Arguments:
Applicant’s arguments with respect to independent claims 1 and 19 under 35 U.S.C. § 102 and/or 103 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Therefore, the rejection has been withdrawn. However, upon further consideration, new ground(s) of rejection are made in view of Aggarwal et al. (US 20240119220 A1) and further in view of Rush et al. (Rush, Alexander M., Sumit Chopra, and Jason Weston. "A neural attention model for abstractive sentence summarization." arXiv preprint arXiv:1509.00685 (2015).).
For more details, please refer to the updated 35 U.S.C. § 103 rejections of claims 1-3 and 5-19, below.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/04/2025 was filed.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Specification
The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or PTO/AIA /26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-3 and 5-19 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-3, 7, and 20 of copending Application No. 18/193,572 in view of Aggarwal et al. (US 20240119220 A1) and Rush et al. (Rush, Alexander M., Sumit Chopra, and Jason Weston. "A neural attention model for abstractive sentence summarization." arXiv preprint arXiv:1509.00685 (2015).).
This is a provisional nonstatutory double patenting rejection.
The claims of the issued patent/copending application are similar in scope to those of the instant application. However, the claims of copending Application No. 18/193,572 do not explicitly teach the following limitations, which, as will be mapped further below, Aggarwal et al. and Rush et al. do teach:
obtain intermediate data including a plurality of partial sentences associated with the input data, wherein the intermediate data comprises intermediate beams generated using beam search technique;
encode the at least one complete sentence to generate at least one encoded representation of the at least one complete sentence;
re-rank the plurality of partial sentences of the intermediate data based on the faithfulness score to generate re-ranked data.
Please see below for pertinent mappings of the instant application in comparison to the issued patent/copending application.
Table 1 shows the overall claim mapping comparing equivalence between claims of the instant application and the issued patent/copending application.
Table 2 shows the limitations of independent claim 1 of the instant application compared with the issued patent/copending application, wherein the underlined portions indicate the main differences between the two.
Table 1: Overall claim mapping comparing Instant Application and Issued Patent/Copending Application.
Instant Application | Issued Patent/Copending Application, U.S. Application No. ##/###,### (US ########## ##)
1*, 19* | Any of 1*-2 and 22*
2 | No equivalent
3 | No equivalent
5 | 3
6 | 7
7 | No equivalent
8 | No equivalent
9 | No equivalent
10 | No equivalent
11 | No equivalent
12 | No equivalent
13 | No equivalent
14 | No equivalent
15 | No equivalent
16 | No equivalent
17 | No equivalent
Note: * denotes an independent claim


Table 2: Independent claim mapping (comparing each of the limitations)
Instant Application, Claim 1 | Copending Application, U.S. Application No. 18/193,572, Claim 1
1. An apparatus to generate output text from input data, comprising: | 1. An apparatus for natural language processing, the apparatus comprising:
one or more memories configured to store the input data; and | at least one memory; and
one or more processors coupled to the one or more memories and configured to: | at least one processor coupled to the at least one memory, the at least one processor configured to:
encode the input data to generate encoded representations of the input data; | generate a sequence of tokens based on input content;
obtain intermediate data including a plurality of partial sentences associated with the input data, wherein the intermediate data comprises intermediate beams generated using beam search technique; | -
- | determine a confidence level associated with the sequence of tokens based on respective confidence levels associated with each token in the sequence of tokens;
generate, based on the intermediate data, at least one complete sentence associated with the input data; | generate a complete sentence that includes the sequence of tokens;
encode the at least one complete sentence to generate at least one encoded representation of the at least one complete sentence; | -
generate a faithfulness score based on a comparison of the encoded representations of the input data and the at least one encoded representation of the at least one complete sentence; and | generate a natural language inference (NLI) score for the complete sentence based on faithfulness of the complete sentence to the input content; and
re-rank the plurality of partial sentences of the intermediate data based on the faithfulness score to generate re-ranked data. | adjust the confidence level for the sequence of tokens based on the NLI score for the complete sentence to generate an updated confidence level for the sequence of tokens.
*Note: Main differences between instant application and issued patent/application are underlined/strikethrough.



As to independent claim(s) 1 and 19, the claims of copending Application No. 18/193,572 do not explicitly teach, but Aggarwal et al. does teach:
obtain intermediate data including a plurality of partial sentences associated with the input data (see ¶ [0045, 0048, and 0095-96] citations as in limitation above. More specifically: “[0095] At operation 810, the system initializes an empty text P, where P will include the output modified text. P may be a data structure which is configured to contain representations of sentences, such as encodings. [0096] At operation 815, the system splits the complex text C and the simplified text S into sentences; e.g. C={C1, C2, . . . , Cn} and S={S1, S2, . . . , Sm}...”
[i.e., input data: input/complex text C → C={C1, C2, . . . , Cn}]);
encode the at least one complete sentence to generate at least one encoded representation of the at least one complete sentence (see ¶ [0045, 0048, and 0095-96] citations as in limitation above. [i.e., complete sentence associated with input data: simplified text S → S={S1, S2, . . . , Sm}]);
re-rank the plurality of partial sentences of the intermediate data based on the faithfulness score to generate re-ranked data (see ¶ [0099-0101] citations as in limitations above. Here, the Examiner notes that the threshold determinations with respect to C, S, and P, where it is determined if P or S are output as the modified text involves arrangement/re-ranking.).
U.S. copending Application No. 18/193,572 and Aggarwal et al. (US 20240119220 A1) are considered to be analogous to the claimed invention because they are in the same field of endeavor, natural language processing (e.g., text summarization). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified U.S. copending Application No. 18/193,572 to incorporate the teachings of Aggarwal et al. to obtain intermediate data including a plurality of partial sentences associated with the input data, encode the at least one complete sentence to generate at least one encoded representation of the at least one complete sentence, and re-rank the plurality of partial sentences of the intermediate data based on the faithfulness score to generate re-ranked data, which provides the benefit of allowing a user to consume content from a complex text with reduced difficulty, while preventing the user from receiving incorrect, inconsistent, or redundant information ([0018] of Aggarwal et al.).

However, U.S. copending Application No. 18/193,572 in view of Aggarwal et al. do not explicitly teach, but Rush et al. does teach:
wherein the intermediate data comprises intermediate beams generated using beam search technique (see ¶ 2 of 4. Generating Summaries: “A compromise between exact and greedy decoding is to use a beam-search decoder (Algorithm 1) which maintains the full vocabulary V while limiting itself to K potential hypotheses at each position of the summary. This has been the standard approach for neural MT models (Bahdanau et al., 2014; Sutskever et al., 2014; Luong et al., 2015). The beam-search algorithm is shown here, modified for the feed-forward model: (Algorithm 1: Beam Search)”, and ¶ 4 of 8. Results: “We also consider model and decoding ablations on the main summary model, shown in Table 3. These experiments compare to the BoW encoding models, compare beam search and greedy decoding, as well as restricting the system to be complete extractive. Of these features, the biggest impact is from using a more powerful encoder (attention versus BoW), as well as using beam search to generate summaries. The abstractive nature of the system helps, but for ROUGE even using pure extractive generation is effective”).
U.S. copending Application No. 18/193,572, Aggarwal et al. (US 20240119220 A1), and Rush et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor, natural language processing (e.g., text summarization). Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified U.S. copending Application No. 18/193,572 in view of Aggarwal et al. to incorporate the teachings of Rush et al., wherein the intermediate data comprises intermediate beams generated using a beam search technique, which provides the benefit of improving the grammaticality of the summaries in a data-driven way ([conclusion] of Rush et al.).
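For illustration only, the beam-search decoding quoted from Rush et al. (keeping K hypotheses at each position) can be sketched as follows. This is a minimal sketch, not the algorithm as published: the toy next-token table `TOY_MODEL`, the function name `beam_search`, and the `<s>`/`</s>` markers are assumptions introduced here and do not come from either reference.

```python
import math

# Hypothetical toy next-token model (assumption): maps the last token to
# candidate next tokens with probabilities. A real system would query a
# neural decoder instead.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def beam_search(model, beam_width, max_len):
    """Keep the beam_width best partial hypotheses at each position."""
    beams = [(0.0, ["<s>"])]  # (cumulative log-probability, tokens so far)
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            if tokens[-1] == "</s>":
                # A finished hypothesis is carried over unchanged.
                candidates.append((score, tokens))
                continue
            for tok, prob in model[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [tok]))
        # Prune to the K highest-scoring hypotheses (the intermediate beams).
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams
```

With this toy table, a beam width of 1 (greedy decoding) commits to "the" at the first position, whereas a beam width of 2 also keeps "a" alive and ends with the higher-probability hypothesis "a cat".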

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim(s) 1-3 and 5-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. More specifically, the claims are directed to the abstract idea groupings of mental processes and/or mathematical concepts.
The independent claim(s) recite(s):
1. An apparatus to generate output text from input data, comprising:
one or more memories configured to store the input data; and
one or more processors coupled to the one or more memories and configured to:
encode the input data to generate encoded representations of the input data;
obtain intermediate data including a plurality of partial sentences associated with the input data, wherein the intermediate data comprises intermediate beams generated using beam search technique;
generate, based on the intermediate data, at least one complete sentence associated with the input data;
encode the at least one complete sentence to generate at least one encoded representation of the at least one complete sentence;
generate a faithfulness score based on a comparison of the encoded representations of the input data and the at least one encoded representation of the at least one complete sentence; and
re-rank the plurality of partial sentences of the intermediate data based on the faithfulness score to generate re-ranked data.

19. A method of generating output text from input data, the method comprising:
[the limitations as in claim 1, above.]

This reads on a human (e.g., mentally and/or using pen and paper):
Using a set of predetermined rules to convert (i.e., encoding) received text or speech;
Determine/write down partial sentences from the text, using predetermined set of rules (e.g., mathematical concepts) to obtain the partial sentences;
Determine/write down a complete sentence;
Determine a score by comparing the received text and the complete sentence determined;
Determine another score for the partial sentences.

This judicial exception is not integrated into a practical application because for example: claim 1 recites “apparatus, memories, and processors”. As an example, in [0100 and 0118-0119] of the as filed specification, it is disclosed: “[0100] The computing device can include any suitable device, such as a mobile device (e.g., a mobile phone), a desktop computing device, a tablet computing device, an XR device (e.g., a VR headset, an AR headset, AR glasses, etc.), a wearable device (e.g., a network-connected watch or smartwatch, or other wearable device), a server computer, a vehicle (e.g., an autonomous vehicle) or computing device of the vehicle, a robotic device, a laptop computer, a smart television, a camera, and/or any other computing device with the resource capabilities to perform the processes described herein, including the process 1000 and/or any other process described herein. [0118] Example system 1200 includes at least one processing unit (CPU or processor) 1210 and connection 1205 that couples various system components including system memory 1215, such as read-only memory (ROM) 1220 and random access memory (RAM) 1225 to processor 1212. Computing system 1200 can include a cache 1211 of high-speed memory connected directly with, in close proximity to, or integrated as part of processor 1212.  [0119] Processor 1212 can include any general purpose processor and a hardware service or software service, such as services 1232, 1234, and 1236 stored in storage device 1230, configured to control processor 1212 as well as a special-purpose processor where software instructions are incorporated into the actual processor design. Processor 1212 may essentially be a completely self-contained computing system, containing multiple cores or processors, a bus, memory controller, cache, etc. A multi-core processor may be symmetric or asymmetric.” Therefore, a general-purpose computer or computing device is described and mainly used as an application thereof. 
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer is recited as a general computing device, as noted. The claim is not patent eligible.
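For reference, the sequence of operations recited in claims 1 and 19 above (encode the input, complete each partial sentence, encode the completion, score faithfulness by comparison, re-rank) can be sketched as follows. This is a hedged illustration under stated assumptions: the bag-of-words `encode`, the cosine comparison, and the `complete` callable are hypothetical stand-ins chosen here; the claims recite only "encode" and "a comparison" and do not specify these choices.

```python
import math
from collections import Counter


def encode(text):
    """Toy encoder (assumption): word counts stand in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rerank_by_faithfulness(input_text, partial_sentences, complete):
    """Re-rank partial sentences by the faithfulness of their completions.

    `complete` is a hypothetical callable that extends a partial sentence
    into a complete sentence (for example, by greedy decoding).
    """
    source_vec = encode(input_text)
    scored = []
    for partial in partial_sentences:
        sentence = complete(partial)
        # Faithfulness score: comparison of the two encoded representations.
        score = cosine(source_vec, encode(sentence))
        scored.append((score, partial))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

In this sketch the score is attached to the partial sentence even though it is computed on the completed sentence, which mirrors the claim language of re-ranking the partial sentences based on the faithfulness score.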
With respect to claim 2, the claim(s) recite:
2. The apparatus of claim 1, wherein the input data comprises at least one of audio data, text data, image data, or video data.

This reads on a human (e.g., mentally and/or using pen and paper):
Receiving text or images from another human or source (e.g., book).
No additional limitations are present. 	

With respect to claim 3, the claim(s) recite:
3. The apparatus of claim 2, wherein the input data comprises two or more of the audio data, the text data, the image data, and the video data.

This reads on a human (e.g., mentally and/or using pen and paper):
Receiving text and images from another human or source (e.g., book).
No additional limitations are present. 	

With respect to claim 5, the claim(s) recite:
5. The apparatus of claim 1, wherein the one or more processors is configured to generate the at least one complete sentence based on the intermediate data using a greedy search technique.

This reads on a human (e.g., mentally and/or using pen and paper):
Using predetermined set of rules (e.g., mathematical concepts) to obtain the complete sentences.
No additional limitations are present. 	

With respect to claim 6, the claim(s) recite:
6. The apparatus of claim 1, wherein the one or more processors is configured to re-rank the plurality of partial sentences of the intermediate data based on the faithfulness score and a model confidence to generate the re-ranked data.

This reads on a human (e.g., mentally and/or using pen and paper):
Determine another score for the partial sentences.
No additional limitations are present. 	

With respect to claim 7, the claim(s) recite:
7. The apparatus of claim 6, wherein the one or more processors is configured to:
determine a beam score based on a probability of a next word in each of the plurality of partial sentences, the model confidence, and the faithfulness score;
determine a cumulative probability based on the beam score; and
re-rank the plurality of partial sentences of the intermediate data based on the cumulative probability.

This reads on a human (e.g., mentally and/or using pen and paper):
Determine scores for segments of the partial sentences (e.g., individual words);
Determine a cumulative (e.g., sum) probability; and
Determine another score for the partial sentences based on the probability.
No additional limitations are present. 	
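For illustration, the three steps recited in claim 7 can be sketched as follows. The claim does not state how the next-word probability, the model confidence, and the faithfulness score are combined; the additive log-domain formula, the weights `alpha` and `beta`, and the dictionary field names below are assumptions made solely for this sketch.

```python
import math


def beam_score(next_word_prob, model_confidence, faithfulness, alpha=1.0, beta=1.0):
    """Assumed combination: log-probability plus weighted confidence and faithfulness."""
    return math.log(next_word_prob) + alpha * model_confidence + beta * faithfulness


def rerank(beams):
    """Update each beam's cumulative log-score and sort best first.

    Each beam is a dict with hypothetical keys: 'text', 'cum' (running
    cumulative log-score), 'p_next', 'confidence', and 'faithfulness'.
    """
    updated = []
    for beam in beams:
        step = beam_score(beam["p_next"], beam["confidence"], beam["faithfulness"])
        updated.append({**beam, "cum": beam["cum"] + step})
    return sorted(updated, key=lambda b: b["cum"], reverse=True)
```

Under these assumptions, a beam with a lower next-word probability can still be ranked first when its faithfulness term is sufficiently higher.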

With respect to claim 8, the claim(s) recite:
8. The apparatus of claim 7, wherein the one or more processors is configured to determine the model confidence based on an entropy value and a kurtosis value.

This reads on a human (e.g., mentally and/or using pen and paper):
Using a predetermined set of rules (e.g., mathematical/probabilistic concepts).
No additional limitations are present. 	
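For illustration, the two statistics named in claim 8 can be computed as sketched below. The claim does not say over what quantity the entropy and kurtosis are taken or how they are combined; this sketch assumes both are computed over a next-token probability distribution, and the function `model_confidence` (kurtosis minus entropy) is a purely hypothetical combination.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kurtosis(values):
    """Population kurtosis (fourth standardized moment) of a list of values."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0  # convention chosen here for a perfectly flat input
    return sum((v - mean) ** 4 for v in values) / n / var ** 2


def model_confidence(probs):
    """Hypothetical combination: peaked distributions score higher than flat ones."""
    return kurtosis(probs) - entropy(probs)
```

A peaked distribution such as [0.97, 0.01, 0.01, 0.01] has low entropy and non-zero kurtosis, so it scores higher under this assumed combination than a uniform distribution over the same four tokens.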

With respect to claim 9, the claim(s) recite:
9. The apparatus of claim 1, wherein the input data comprises video data, and wherein the one or more processors is configured to:
downsample a plurality of frames of the video data; and
fuse encoded representations of the plurality of frames of the video data to generate a fused representation of the video data, wherein the encoded representations of the input data include the fused representation of the video data.

This reads on a human (e.g., mentally and/or using pen and paper):
Receiving a sequence of images (e.g., video data) and adjusting the frequency or speed to examine individual images (e.g., frames);
Grouping the individual images (e.g., frames).
No additional limitations are present. 	
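For illustration, the downsampling and fusion recited in claim 9 can be sketched as follows. Fixed-stride frame selection and element-wise mean fusion are assumptions made for this sketch; the claim does not specify either technique, and the frames here are represented directly as already-encoded vectors.

```python
def downsample(frames, stride):
    """Keep every `stride`-th frame, starting with the first."""
    return frames[::stride]


def fuse(encodings):
    """Element-wise mean of equal-length encoding vectors (assumed fusion)."""
    count = len(encodings)
    return [sum(column) / count for column in zip(*encodings)]
```

For example, with four two-dimensional frame encodings and a stride of 2, the first and third frames are kept and their mean is the fused representation of the video data.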

With respect to claim 10, the claim(s) recite:
10. The apparatus of claim 1, wherein:
the input data comprises at least a first type of input data and a second type of input data;
to encode the input data to generate the encoded representations of the input data, the one or more processors is configured to:
encode the first type of input data to generate an encoded representation of the first type of input data; and
encode the second type of input data to generate an encoded representation of the second type of input data; and
the one or more processors is further configured to generate, based on the encoded representation of the first type of input data and the encoded representation of the second type of input data, a combined representation of the first type of input data and the second type of input data.

This reads on a human (e.g., mentally and/or using pen and paper):
Performing the steps as discussed in claims 1 and 19, above.
No additional limitations are present. 	

With respect to claim 11, the claim(s) recite:
11. The apparatus of claim 10, wherein, to generate the combined representation of the first type of input data and the second type of input data, the one or more processors is configured to:
determine a weighted average of the encoded representation of the first type of input data and the encoded representation of the second type of input data.

This reads on a human (e.g., mentally and/or using pen and paper):
Determining a combination of data received and computing a weighted average using a predetermined set of rules (e.g., mathematical concept)
No additional limitations are present. 	
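For illustration, the weighted average recited in claim 11 can be sketched as follows. The function name and the single `weight_a` parameter (with the second weight taken as its complement) are assumptions; the claim does not specify how the weights are chosen.

```python
def weighted_average(vec_a, vec_b, weight_a=0.5):
    """Combine two equal-length encodings; weight_b is taken as 1 - weight_a."""
    if len(vec_a) != len(vec_b):
        raise ValueError("encodings must have the same length")
    weight_b = 1.0 - weight_a
    return [weight_a * a + weight_b * b for a, b in zip(vec_a, vec_b)]
```

For example, combining a text encoding [1.0, 0.0] with an image encoding [0.0, 1.0] at `weight_a=0.75` yields [0.75, 0.25].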

With respect to claim 12, the claim(s) recite:
12. The apparatus of claim 10, wherein the one or more processors is configured to normalize the combined representation of the first type of input data and the second type of input data.

This reads on a human (e.g., mentally and/or using pen and paper):
Using a predetermined set of rules (e.g., mathematical concept) to normalize the combination of received data.
No additional limitations are present. 	
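For illustration, one common way to normalize a combined representation as recited in claim 12 is to scale it to unit length. L2 normalization is an assumption made for this sketch; the claim does not name a particular normalization.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        return list(vec)
    return [v / norm for v in vec]
```

Unit-length vectors make a later cosine-style comparison between representations reduce to a plain dot product.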

With respect to claim 13, the claim(s) recite:
13. The apparatus of claim 10, wherein the first type of input data and the second type of input data comprise two or more of audio data, text data, image data, and video data.

This reads on a human (e.g., mentally and/or using pen and paper):
Receiving text and images from another human or source (e.g., book).
No additional limitations are present. 	

With respect to claim 14, the claim(s) recite:
14. The apparatus of claim 10, wherein the one or more processors is configured to generate the faithfulness score based on a comparison of the combined representation and the at least one encoded representation of the at least one complete sentence.

This reads on a human (e.g., mentally and/or using pen and paper):
Calculating a score using a predetermined set of rules (i.e., mathematical concept)
No additional limitations are present. 	

With respect to claim 15, the claim(s) recite:
15. The apparatus of claim 1, wherein the one or more processors is configured to:
generate, based on the re-ranked data, output text associated with the input data.

This reads on a human (e.g., mentally and/or using pen and paper):
Determining/writing down text associated with the received text.
No additional limitations are present. 	

With respect to claim 16, the claim(s) recite:
16. The apparatus of claim 1, further comprising at least one of an image sensor or a microphone configured to capture at least a part of the input data.

This reads on a human (e.g., mentally and/or using pen and paper):
Receiving text or speech.
This judicial exception is not integrated into a practical application because, for example, claim 16 recites “an image sensor or microphone”. As an example, in [0100 and 0118-0119] of the as filed specification, it is disclosed: (see citations in explanation for claims 1 and 19, above). Therefore, a general-purpose computer or computing device is described and mainly used as an application thereof. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer is recited as a general computing device, as noted. The claim is not patent eligible.
With respect to claim 17, the claim(s) recite:
17. The apparatus of claim 1, wherein the one or more processors is configured to generate the intermediate data using at least one neural network model.

This reads on a human (e.g., mentally and/or using pen and paper):
Generating partial sentences using a predetermined set of rules (i.e., model).
No additional limitations are present. 	
With respect to claim 18, the claim(s) recite:
18. The apparatus of claim 17, wherein the at least one neural network model includes a transformer neural network model.

This reads on a human (e.g., mentally and/or using pen and paper):
Generating partial sentences using a predetermined set of rules (i.e., model/transformer).
No additional limitations are present. 	

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.


Claims 1-2, 5-6, 10, 14-15, and 17-19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal et al. (US 20240119220 A1) and further in view of Rush et al. (Rush, Alexander M., Sumit Chopra, and Jason Weston. "A neural attention model for abstractive sentence summarization." arXiv preprint arXiv:1509.00685 (2015).). 

As to independent claim 1, Aggarwal et al. teaches:
1. An apparatus to generate output text from input data (see ¶ [0003]: “The present disclosure describes systems and methods for generating simplified text based on a complex text while reducing hallucinations. Embodiments include a text simplification apparatus configured to generate the simplified text and remove the hallucinations…”), comprising:
one or more memories configured to store the input data (see ¶ [0006]: “An apparatus, system, and method for text simplification of complex domain-specific text are described. One or more aspects of the apparatus, system, and method include a processor; a memory storing instructions executable by the processor;…”); and
one or more processors coupled to the one or more memories (see ¶ [0006] citation as in limitation above.) and configured to:
encode the input data to generate encoded representations of the input data (see ¶ [0045, 0048, and 0095-96]: “[0045] According to an embodiment, neural network 220 includes a bi-directional encoder representations from transformers (BERT) architecture. BERT is a transformer-based model that is used for natural language processing and for processing other forms of ordered data...  [0048] Text simplification component 225 may include a transformer network. A transformer or transformer network is a type of neural network model used for natural language processing tasks. A transformer network transforms one sequence into another sequence using an encoder and a decoder. An encoder and decoder include modules that can be stacked on top of each other multiple times. The modules comprise multi-head attention and feed forward layers. The inputs and outputs (target sentences) are first embedded into an n-dimensional space. Positional encoding of the different words (i.e., give every word/part in a sequence a relative position since the sequence depends on the order of its elements) are added to the embedded representation (n-dimensional vector) of each word. In some examples, a transformer network includes attention mechanism, where the attention looks at an input sequence and decides at each step which other parts of the sequence are important. Some examples of the transformer model are based on iterations of the transformer model such as GPT-2. In some cases, the transformer model is configured as an encoder-decoder model that receives data in sequence form and output data in sequence form, i.e. “seq2seq.” [0095] At operation 810, the system initializes an empty text P, where P will include the output modified text. P may be a data structure which is configured to contain representations of sentences, such as encodings. [0096] At operation 815, the system splits the complex text C and the simplified text S into sentences; e.g. C={C1, C2, . . . , Cn} and S={S1, S2, . . . 
, Sm}. In this example, there are n sentences in the complex text C and m sentences in the simplified text S. In many cases, the simplified text S comprises a greater number of sentences than the complex text S, as increasing the number of sentences typically increases readability. For example, fewer sentences may result in increased readability based on the metrics discussed in the description for FIG. 5.”
[i.e., input data: input/complex text C]);
obtain intermediate data including a plurality of partial sentences associated with the input data (see ¶ [0045, 0048, and 0095-96] citations as in limitation above. More specifically: “[0095] At operation 810, the system initializes an empty text P, where P will include the output modified text. P may be a data structure which is configured to contain representations of sentences, such as encodings. [0096] At operation 815, the system splits the complex text C and the simplified text S into sentences; e.g. C={C1, C2, . . . , Cn} and S={S1, S2, . . . , Sm}...”
[i.e., input data: input/complex text C → C={C1, C2, . . . , Cn}]);
generate, based on the intermediate data, at least one complete sentence associated with the input data (see ¶ [0045, 0048, and 0095-96] citations as in limitation above. [i.e., complete sentence associated with input data: simplified text S → S={S1, S2, . . . , Sm}]);
encode the at least one complete sentence to generate at least one encoded representation of the at least one complete sentence (see ¶ [0045, 0048, and 0095-96] citations as in limitation above. [i.e., complete sentence associated with input data: simplified text S → S={S1, S2, . . . , Sm}]);
generate a faithfulness score based on a comparison of the encoded representations of the input data and the at least one encoded representation of the at least one complete sentence (see ¶ [0099-0101]: “[0099] After all sentences from the simplified text are processed, (e.g., a list of plurality scores are determined for each sentence from the simplified text) P is fully constructed. P corresponds to the “new body of text” as described above with reference to FIGS. 4, 6, and 7. At operation 840, system computes a semantic similarity score SS and a hallucination score DH for 1) between C and S, and 2) between C and P. [0100] At operation 845, the system determines if SS and DH are greater than some threshold(s). In some cases, the system determines each score SS and DH against corresponding thresholds. For example, the hallucination score DH may represent a degree of hallucination, and in some embodiments, a lower score indicates a lower degree of hallucination. In this case, operation 845 may determine if DH is below a threshold (low hallucination) and if SS is above a threshold (high similarity) in order to “pass” P, and proceed to operation 855. If the scores between C and P are above the threshold(s) (i.e., P passes with a low degree of hallucination and a high semantic similarity), at operation 855, the system sets output text to modified text P. Otherwise, if P “fails,”, at operation 850, the system sets output text to simplified text S. This completes the pruning algorithm. [0101] At operation 860, the system presents output text through user interface. In some cases, the system additionally outputs metrics of the modified text including faithfulness, readability, and simplicity.”
[i.e., input data: input/complex text C → C={C1, C2, . . . , Cn} and complete sentence: simplified text S → S={S1, S2, . . . , Sm}]); and
re-rank the plurality of partial sentences of the intermediate data based on the faithfulness score to generate re-ranked data (see ¶ [0099-0101] citations as in limitations above. Here, the Examiner notes that the threshold determinations with respect to C, S, and P, where it is determined if P or S are output as the modified text involves arrangement/re-ranking.)
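
For reference, the pass/fail selection quoted from ¶ [0100] of Aggarwal et al. can be sketched as follows. This is an illustrative sketch only: the function name, the argument names, and the default threshold values are assumptions and do not appear in the reference.

```python
def select_output_text(ss_cp, dh_cp, modified_p, simplified_s,
                       ss_min=0.8, dh_max=0.2):
    """Return modified text P when it passes both checks, else simplified text S.

    ss_cp: semantic similarity score SS computed between C and P.
    dh_cp: hallucination score DH computed between C and P.
    ss_min, dh_max: hypothetical thresholds (not taken from the reference).
    """
    # P "passes" with high semantic similarity and a low degree of hallucination.
    passes = ss_cp > ss_min and dh_cp < dh_max
    return modified_p if passes else simplified_s


# Usage: P is output when both conditions hold; otherwise S is output.
print(select_output_text(0.9, 0.1, "P", "S"))
print(select_output_text(0.9, 0.5, "P", "S"))
```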

However, Aggarwal et al. does not explicitly teach, but Rush et al. does teach:
wherein the intermediate data comprises intermediate beams generated using beam search technique (see ¶ 2 of 4. Generating Summaries: “A compromise between exact and greedy decoding is to use a beam-search decoder (Algorithm 1) which maintains the full vocabulary V while limiting itself to K potential hypotheses at each position of the summary. This has been the standard approach for neural MT models (Bahdanau et al., 2014; Sutskever et al., 2014; Luong et al., 2015). The beam-search algorithm is shown here, modified for the feed-forward model: (Algorithm 1: Beam Search)”, and ¶ 4 of 8. Results: “We also consider model and decoding ablations on the main summary model, shown in Table 3. These experiments compare to the BoW encoding models, compare beam search and greedy decoding, as well as restricting the system to be complete extractive. Of these features, the biggest impact is from using a more powerful encoder (attention versus BoW), as well as using beam search to generate summaries. The abstractive nature of the system helps, but for ROUGE even using pure extractive generation is effective”).
Aggarwal et al. and Rush et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., summarization). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aggarwal et al. to incorporate the teachings of Rush et al., such that the intermediate data comprises intermediate beams generated using a beam search technique, which provides the benefit of improving the grammaticality of the summaries in a data-driven way ([conclusion] of Rush et al.).
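
For reference, the beam-search decoding described in the quoted passage of Rush et al. (limiting the search to K hypotheses at each position) can be sketched generically as follows. This is not a reproduction of Algorithm 1 of Rush et al.; the function names, the toy vocabulary, and the fixed probabilities are assumptions made only for illustration.

```python
import math


def beam_search(next_log_probs, start, steps, k):
    """Keep the k highest-scoring partial hypotheses at each position."""
    beams = [([start], 0.0)]  # (tokens so far, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            # Expand every surviving hypothesis over the full vocabulary.
            for token, logp in next_log_probs(tokens).items():
                candidates.append((tokens + [token], score + logp))
        # Retain only the k best expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:k]
    return beams


def toy_model(tokens):
    # Hypothetical fixed next-word distribution over a three-word vocabulary.
    return {"a": math.log(0.5), "b": math.log(0.3), "c": math.log(0.2)}


beams = beam_search(toy_model, "<s>", steps=2, k=2)
for tokens, score in beams:
    print(tokens, round(score, 3))
```

With k=1 the same loop keeps a single hypothesis per position, which corresponds to greedy decoding.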

As to independent claim 19, Aggarwal et al. teaches:
19. A method of generating output text from input data (see ¶ [0003]: “The present disclosure describes systems and methods for generating simplified text based on a complex text while reducing hallucinations. Embodiments include a text simplification apparatus configured to generate the simplified text and remove the hallucinations…”), the method comprising:
[the limitations taught by Aggarwal et al. in combination with Rush et al. as in claim 1, above.]

Regarding claim 2, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 1, above.
Aggarwal et al. further teaches:
2. The apparatus of claim 1, wherein the input data comprises at least one of audio data, text data, image data, or video data (see ¶ [0003] citation as in claim 1, above. More specifically: “The present disclosure describes systems and methods for generating simplified text based on a complex text while reducing hallucinations...”).

Regarding claim 5, Aggarwal et al. in combination with Rush et al. teach the limitations as in claim 1, above.
Rush et al. further teaches:
5. The apparatus of claim 1, wherein the one or more processors is configured to generate the at least one complete sentence based on the intermediate data using a greedy search technique (see ¶ 2 of 4. Generating Summaries: “A compromise between exact and greedy decoding is to use a beam-search decoder (Algorithm 1) which maintains the full vocabulary V while limiting itself to K potential hypotheses at each position of the summary. This has been the standard approach for neural MT models (Bahdanau et al., 2014; Sutskever et al., 2014; Luong et al., 2015). The beam-search algorithm is shown here, modified for the feed-forward model: (Algorithm 1: Beam Search)”, and ¶ 4 of 8. Results: “We also consider model and decoding ablations on the main summary model, shown in Table 3. These experiments compare to the BoW encoding models, compare beam search and greedy decoding, as well as restricting the system to be complete extractive. Of these features, the biggest impact is from using a more powerful encoder (attention versus BoW), as well as using beam search to generate summaries. The abstractive nature of the system helps, but for ROUGE even using pure extractive generation is effective”).
Aggarwal et al. and Rush et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aggarwal et al. to incorporate the teachings of Rush et al., such that the one or more processors is configured to generate the at least one complete sentence based on the intermediate data using a greedy search technique, which provides the benefit of improving the grammaticality of the summaries in a data-driven way ([conclusion] of Rush et al.).

Regarding claim 6, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 1, above.
Aggarwal et al. further teaches:
6. The apparatus of claim 1, wherein the one or more processors is configured to re-rank the plurality of partial sentences of the intermediate data based on the faithfulness score and a model confidence to generate the re-ranked data (see ¶ [0099]: “After all sentences from the simplified text are processed, (e.g., a list of plurality scores are determined for each sentence from the simplified text) P is fully constructed. P corresponds to the “new body of text” as described above with reference to FIGS. 4, 6, and 7. At operation 840, system computes a semantic similarity score SS and a hallucination score DH for 1) between C and S, and 2) between C and P.
Here, the Examiner notes that the threshold determinations with respect to C, S, and P, where it is determined if P or S are output as the modified text involves arrangement/re-ranking.).

Regarding claim 10, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 1, above.
Aggarwal et al. further teaches:
10. The apparatus of claim 1, wherein:
the input data comprises at least a first type of input data and a second type of input data (¶ [0003]: “The present disclosure describes systems and methods for generating simplified text based on a complex text while reducing hallucinations. Embodiments include a text simplification apparatus configured to generate the simplified text and remove the hallucinations…”);
to encode the input data to generate the encoded representations of the input data (see ¶ [0045, 0048, and 0095-96]: “[0045] According to an embodiment, neural network 220 includes a bi-directional encoder representations from transformers (BERT) architecture. BERT is a transformer-based model that is used for natural language processing and for processing other forms of ordered data...  [0048] Text simplification component 225 may include a transformer network. A transformer or transformer network is a type of neural network model used for natural language processing tasks. A transformer network transforms one sequence into another sequence using an encoder and a decoder. An encoder and decoder include modules that can be stacked on top of each other multiple times. The modules comprise multi-head attention and feed forward layers. The inputs and outputs (target sentences) are first embedded into an n-dimensional space. Positional encoding of the different words (i.e., give every word/part in a sequence a relative position since the sequence depends on the order of its elements) are added to the embedded representation (n-dimensional vector) of each word. In some examples, a transformer network includes attention mechanism, where the attention looks at an input sequence and decides at each step which other parts of the sequence are important. Some examples of the transformer model are based on iterations of the transformer model such as GPT-2. In some cases, the transformer model is configured as an encoder-decoder model that receives data in sequence form and output data in sequence form, i.e. “seq2seq.” [0095] At operation 810, the system initializes an empty text P, where P will include the output modified text. P may be a data structure which is configured to contain representations of sentences, such as encodings. [0096] At operation 815, the system splits the complex text C and the simplified text S into sentences; e.g. C={C1, C2, . . . 
, Cn} and S={S1, S2, . . . , Sm}. In this example, there are n sentences in the complex text C and m sentences in the simplified text S. In many cases, the simplified text S comprises a greater number of sentences than the complex text S, as increasing the number of sentences typically increases readability. For example, fewer sentences may result in increased readability based on the metrics discussed in the description for FIG. 5.”
[i.e., input data: input/complex text C]), the one or more processors is configured to:
encode the first type of input data to generate an encoded representation of the first type of input data (see ¶ [0045, 0048, and 0095-96] citations as in limitation above. Here, the Examiner notes that the first type of the input data and the second type of input data are read by Aggarwal’s input/complex text’s sentences (i.e., C1, C2, …, Cn).); and
encode the second type of input data to generate an encoded representation of the second type of input data (see ¶ [0045, 0048, and 0095-96] citations as in limitation above. Here, the Examiner notes that the first type of the input data and the second type of input data are read by Aggarwal’s input/complex text’s sentences (i.e., C1, C2, …, Cn).); and
the one or more processors is further configured to generate, based on the encoded representation of the first type of input data and the encoded representation of the second type of input data, a combined representation of the first type of input data and the second type of input data (see ¶ [0045, 0048, and 0095-96] citations as in limitation above. [i.e., C={C1, C2, . . . , Cn}]).

Regarding claim 14, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 10, above.
Aggarwal et al. further teaches:
14. The apparatus of claim 10, wherein the one or more processors is configured to generate the faithfulness score based on a comparison of the combined representation and the at least one encoded representation of the at least one complete sentence (see ¶ [0099-0101]: “[0099] After all sentences from the simplified text are processed, (e.g., a list of plurality scores are determined for each sentence from the simplified text) P is fully constructed. P corresponds to the “new body of text” as described above with reference to FIGS. 4, 6, and 7. At operation 840, system computes a semantic similarity score SS and a hallucination score DH for 1) between C and S, and 2) between C and P. [0100] At operation 845, the system determines if SS and DH are greater than some threshold(s). In some cases, the system determines each score SS and DH against corresponding thresholds. For example, the hallucination score DH may represent a degree of hallucination, and in some embodiments, a lower score indicates a lower degree of hallucination. In this case, operation 845 may determine if DH is below a threshold (low hallucination) and if SS is above a threshold (high similarity) in order to “pass” P, and proceed to operation 855. If the scores between C and P are above the threshold(s) (i.e., P passes with a low degree of hallucination and a high semantic similarity), at operation 855, the system sets output text to modified text P. Otherwise, if P “fails,”, at operation 850, the system sets output text to simplified text S. This completes the pruning algorithm. [0101] At operation 860, the system presents output text through user interface. In some cases, the system additionally outputs metrics of the modified text including faithfulness, readability, and simplicity.”
[i.e., input data: input/complex text C → C={C1, C2, . . . , Cn} and complete sentence: simplified text S → S={S1, S2, . . . , Sm}]).

Regarding claim 15, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 1, above.
Aggarwal et al. further teaches:
15. The apparatus of claim 1, wherein the one or more processors is configured to:
generate, based on the re-ranked data, output text associated with the input data (see ¶ [0099-0101] citations as in claims 1 and 10, above. Here, the Examiner notes that the threshold determinations with respect to C, S, and P, where it is determined if P or S are output as the modified text involves arrangement/re-ranking.).

Regarding claim 17, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 1, above.
Aggarwal et al. further teaches:
17. The apparatus of claim 1, wherein the one or more processors is configured to generate the intermediate data using at least one neural network model (see ¶ [0048]: “[0048] Text simplification component 225 may include a transformer network. A transformer or transformer network is a type of neural network model used for natural language processing tasks. A transformer network transforms one sequence into another sequence using an encoder and a decoder…”).

Regarding claim 18, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 17, above.
Aggarwal et al. further teaches:
18. The apparatus of claim 17, wherein the at least one neural network model includes a transformer neural network model (see ¶ [0048]: “[0048] Text simplification component 225 may include a transformer network. A transformer or transformer network is a type of neural network model used for natural language processing tasks. A transformer network transforms one sequence into another sequence using an encoder and a decoder…”).

Claims 3, 13, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal et al. (US 20240119220 A1) further in view of Rush et al. (Rush, Alexander M., Sumit Chopra, and Jason Weston. "A neural attention model for abstractive sentence summarization." arXiv preprint arXiv:1509.00685 (2015)) as applied to claims 1-2 and 10 above, and further in view of Mallya Kasaragod et al. (US 20230368074 A1).

Regarding claim 3, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 2, above.
However, Aggarwal et al. in combination with Rush et al. does not explicitly teach, but Mallya Kasaragod et al. does teach:
3. The apparatus of claim 2, wherein the input data comprises two or more of the audio data, the text data, the image data, and the video data (see ¶ [0055]: “Depending on the requested inference task, the model serving engine 220 selects one or more models from the model datastore 285 that are configured to perform the desired inference task. The model serving engine 220 generates predictions by applying the selected models to the input data or features extracted from the input data. For example, the model serving engine 220 may select two models from the model datastore 285 that are each configured to perform summarization of text and generate predictions for the input text by applying each of the selected models to the encoded input text. The predictions output from the model may be a sequence of encodings that represent a summarized version of the input text. As another example, the model serving engine 220 may select a model from the model datastore 285 that is configured to receive a combination of image and text data and generate predictions by applying the selected model to the input data.” and ¶ [0078]: “…For example, the profile for a summarization task may include a set of attributes, including preferences related to grammar, adequacy, hallucination, consistency, active or passive voice, certain expressions that the user tends to use in text…”
[i.e., combination of text and image data]).
Aggarwal et al., Rush et al., and Mallya Kasaragod et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., summarization). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aggarwal et al. in combination with Rush et al. to incorporate the teachings of Mallya Kasaragod et al., such that the input data comprises two or more of the audio data, the text data, the image data, and the video data, which provides the benefit of faster access to additional context for improved model accuracy ([0073] of Mallya Kasaragod et al.).
 
Regarding claim 13, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 10, above.
However, Aggarwal et al. in combination with Rush et al. does not explicitly teach, but Mallya Kasaragod et al. does teach:
13. The apparatus of claim 10, wherein the first type of input data and the second type of input data comprise two or more of audio data, text data, image data, and video data (see ¶ [0055 and 0078] citations as in claim 3, above. [i.e., combination of text and image data]).
Aggarwal et al., Rush et al., and Mallya Kasaragod et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., summarization). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aggarwal et al. in combination with Rush et al. to incorporate the teachings of Mallya Kasaragod et al., such that the first type of input data and the second type of input data comprise two or more of audio data, text data, image data, and video data, which provides the benefit of faster access to additional context for improved model accuracy ([0073] of Mallya Kasaragod et al.).

Regarding claim 16, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 1, above.
However, Aggarwal et al. in combination with Rush et al. does not explicitly teach, but Mallya Kasaragod et al. does teach:
16. The apparatus of claim 1, further comprising at least one of an image sensor or a microphone configured to capture at least a part of the input data (see ¶ [0028 and 0073]: “[0028] The client device 116 is a computing device capable of receiving user input as well as communicating via the network 150. While two example client devices 116A and 116B are illustrated in FIG. 1, in practice many client devices 116 may communicate with the systems in environment 100. In one embodiment, a client device 116 is a conventional computer system, such as a desktop or laptop computer. Alternatively, a client device 116 may be a device having computer functionality, such as a personal digital assistant (PDA), a mobile telephone, a smartphone or another suitable device. A client device 116 is configured to communicate via the network 150. [0073] The knowledge bank engine 255 provides access to pre-packaged domain specific data stored locally in the form of raw text documents or embedding vectors. This domain specific data may serve two main purposes. First, it provides faster access to additional context for improved model accuracy and second, to enforce privacy and security as the data is held secure within the perimeters of the AI Container 120. Typically, documents such as internal documentation, company policies, customer analytics, and transactional data may be included in the knowledge bank engine 255. The knowledge bank engine 255 takes in as input a text (or image, video, other data modality) query from the query engine 260 and outputs a KnowledgeCollection which is a JSON collection of embeddings or text documents to be used as additional context by the model. For example, when running a summarization report generation task, additional context in the form of company policy document could be used to provide a pertinent summary that adheres to the acceptable policies and rules of the company.”).
Aggarwal et al., Rush et al., and Mallya Kasaragod et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing (e.g., summarization). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aggarwal et al. in combination with Rush et al. to incorporate the teachings of Mallya Kasaragod et al., such that the apparatus further comprises at least one of an image sensor or a microphone configured to capture at least a part of the input data, which provides the benefit of faster access to additional context for improved model accuracy ([0073] of Mallya Kasaragod et al.).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal et al. (US 20240119220 A1) and further in view of Rush et al. (Rush, Alexander M., Sumit Chopra, and Jason Weston. "A neural attention model for abstractive sentence summarization." arXiv preprint arXiv:1509.00685 (2015)) as applied to claim 6 above, and further in view of Juergen et al. (US 20100076761 A1).

Regarding claim 7, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 6, above.
However, Aggarwal et al. in combination with Rush et al. does not explicitly teach, but Juergen et al. does teach:
7. The apparatus of claim 6, wherein the one or more processors is configured to:
determine a beam score based on a probability of a next word in each of the plurality of partial sentences, the model confidence, and the faithfulness score (see ¶ [0026]: “To understand the operation of this embodiment of the present invention, consider as an example the use of a trigram language model. During Viterbi beam search, the LVCSR decoder expands hypothesized partial sequences of words w.sub.1 . . . w.sub.i-1 by likely following words w.sub.i. To do that, it queries the language model for the probability p(w.sub.i|w.sub.i-1,w.sub.i-2) for each word w.sub.1 preceded by the 2-word history w.sub.1-1,w.sub.i-2. For every such hypothesized word, the decoder also queries the acoustic model for the likelihood of the word w.sub.i given the acoustic speech signal, and combines it with the language model probability to produce a total word score. To predict non-verbalized punctuations in the absence of acoustic evidence, the decoder needs the ability to hypothesize tokens without consuming input frames.”);
determine a cumulative probability based on the beam score (see ¶ [0026] citation as in limitation above: “…combines it with the language model probability to produce a total word score…”); and
re-rank the plurality of partial sentences of the intermediate data based on the cumulative probability (see ¶ [0026] citation as in limitation above: “…For every such hypothesized word, the decoder also queries the acoustic model for the likelihood of the word w.sub.i given the acoustic speech signal, and combines it with the language model probability to produce a total word score.…”).
Aggarwal et al., Rush et al., and Juergen et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aggarwal et al. in combination with Rush et al. to incorporate the teachings of Juergen et al. to determine a beam score based on a probability of a next word in each of the plurality of partial sentences, the model confidence, and the faithfulness score; determine a cumulative probability based on the beam score; and re-rank the plurality of partial sentences of the intermediate data based on the cumulative probability, which provides the benefit of improved punctuation prediction accuracy, while reducing system complexity and memory requirements compared to prior art approaches ([0019] of Juergen et al.).
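
For reference, the per-word scoring quoted from ¶ [0026] of Juergen et al. (combining a language model probability with an acoustic likelihood to produce a total word score) can be sketched as follows. The quoted passage does not state the combination rule; summing log-values, accumulating the word scores over a hypothesis, and all function and variable names are assumptions made only for illustration. The sketch covers only the two quantities named in the quoted passage.

```python
import math


def total_word_score(lm_prob, acoustic_likelihood):
    """Combine the two per-word quantities in the log domain (assumed rule)."""
    return math.log(lm_prob) + math.log(acoustic_likelihood)


def rank_hypotheses(hypotheses):
    """Order partial hypotheses by their accumulated total word scores.

    hypotheses: dict mapping a hypothesis label to a list of
    (lm_prob, acoustic_likelihood) pairs, one pair per hypothesized word.
    """
    totals = {
        label: sum(total_word_score(p, a) for p, a in words)
        for label, words in hypotheses.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical values for two partial word sequences.
ranked = rank_hypotheses({
    "hyp_1": [(0.5, 0.5), (0.4, 0.6)],
    "hyp_2": [(0.9, 0.9), (0.8, 0.7)],
})
print(ranked)
```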

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal et al. (US 20240119220 A1) and further in view of Rush et al. (Rush, Alexander M., Sumit Chopra, and Jason Weston. "A neural attention model for abstractive sentence summarization." arXiv preprint arXiv:1509.00685 (2015)) and Juergen et al. (US 20100076761 A1) as applied to claim 7 above, and further in view of Potamianos et al. (US 12374326 B1).

Regarding claim 8, Aggarwal et al. in combination with Rush et al. and Juergen et al. teaches the limitations as in claim 7, above.
However, Aggarwal et al. in combination with Rush et al. in combination with Juergen et al. do not explicitly teach, but Potamianos et al. does teach:
8. The apparatus of claim 7, wherein the one or more processors is configured to determine the model confidence based on an entropy value and a kurtosis value (see ¶ Col. 26, lines 34-46: “(136) The ASR output data 147 may include other ASR output related data such as other features from the ASR component 145 or data determined by another component. For example, the system 100 may determine an entropy of the ASR output data 147 (for example a trellis entropy or the like) that indicates how spread apart the probability mass of the trellis is among the alternate hypotheses. A large entropy (e.g., large spread of probability mass over many hypotheses) may indicate the ASR component 145 being less confident about its best hypothesis, which in turn may correlate to detected speech not being system-directed. In some embodiments, the entropy may be a feature included in other data input to the verification component 150.”).
Aggarwal et al., Rush et al., Juergen et al., and Potamianos et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor, natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aggarwal et al. in combination with Rush et al. and Juergen et al. to incorporate the teachings of Potamianos et al., wherein the one or more processors is configured to determine the model confidence based on an entropy value and a kurtosis value, which provides the benefit of an improved user experience (Col. 5, line 25 of Potamianos et al.).
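
To make the two statistics named in claim 8 concrete, the sketch below computes the Shannon entropy and the kurtosis of a discrete next-word probability distribution and combines them into a single confidence value. The combination rule in `model_confidence` and the `alpha` weight are assumptions made for illustration; the record quoted above does not specify how the two values are combined.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kurtosis(values):
    """Population (non-excess) kurtosis: fourth central moment / variance squared."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0  # flat input: kurtosis is undefined, 0 is returned by convention here
    fourth = sum((v - mean) ** 4 for v in values) / n
    return fourth / (var ** 2)


def model_confidence(probs, alpha=0.5):
    """Hypothetical combination (an assumption): low entropy and high kurtosis
    both indicate a peaked distribution. Requires at least two probabilities."""
    peakedness = 1.0 - entropy(probs) / math.log(len(probs))  # ~0 uniform, 1 one-hot
    k = kurtosis(probs)
    kurt_term = k / (1.0 + k)  # squash kurtosis into [0, 1)
    return alpha * peakedness + (1.0 - alpha) * kurt_term


uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]
```

Under these assumptions, `model_confidence(peaked)` is larger than `model_confidence(uniform)`, matching the intuition in the quoted passage that a large entropy indicates lower confidence in the best hypothesis.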

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal et al. (US 20240119220 A1) and further in view of Rush et al. (Rush, Alexander M., Sumit Chopra, and Jason Weston. "A neural attention model for abstractive sentence summarization." arXiv preprint arXiv:1509.00685 (2015)) as applied to claim 1 above, and further in view of Jiang et al. (US 20210090217 A1).
Regarding claim 9, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 1, above.
However, Aggarwal et al. in combination with Rush et al. does not explicitly teach, but Jiang et al. does teach:
9. The apparatus of claim 1, wherein the input data comprises video data (see ¶ [0006]: “One aspect of the present disclosure includes a video super resolution (SR) method based on video coding for machine (VCM) is provided for an electronic device. The method includes obtaining a lower resolution (LR) video;…”), and wherein the one or more processors is configured to:
downsample a plurality of frames of the video data (see ¶ [0054]: “The reconstruction module 418 may be configured to use the fused feature representation and a decoded down-sampled LR video to generate an HR video. That is, the reconstruction module 418 may perform a reconstruction process to generate the corresponding HR frames 432 based on the fused feature representations 428 and the decoded LR video frames 426.”); and
fuse encoded representations of the plurality of frames of the video data to generate a fused representation of the video data, wherein the encoded representations of the input data include the fused representation of the video data (see ¶ [0054] citation as in limitation above: “…That is, the reconstruction module 418 may perform a reconstruction process to generate the corresponding HR frames 432 based on the fused feature representations 428 and the decoded LR video frames 426.”).
Aggarwal et al., Rush et al., and Jiang et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor, data processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aggarwal et al. in combination with Rush et al. to incorporate the teachings of Jiang et al., wherein the input data comprises video data, and wherein the one or more processors is configured to: downsample a plurality of frames of the video data; and fuse encoded representations of the plurality of frames of the video data to generate a fused representation of the video data, wherein the encoded representations of the input data include the fused representation of the video data, which provides the benefit of improving efficiency ([0083] of Jiang et al.).
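
The two claim 9 steps (downsampling frames, then fusing their encoded representations) can be pictured with the toy sketch below. It assumes temporal downsampling by a fixed stride and fusion by element-wise averaging; both choices, along with the placeholder `encode_frame` function, are hypothetical and are not drawn from the application or from Jiang et al.

```python
def downsample_frames(frames, stride):
    """Temporal downsampling (assumed reading): keep every `stride`-th frame."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return frames[::stride]


def encode_frame(frame):
    """Placeholder encoder (hypothetical): mean and max pixel value of a frame."""
    return [sum(frame) / len(frame), float(max(frame))]


def fuse(encoded_frames):
    """Fuse per-frame encodings by element-wise averaging (one possible fusion)."""
    n = len(encoded_frames)
    return [sum(column) / n for column in zip(*encoded_frames)]


# Four tiny "frames" of two pixels each; keep frames 0 and 2, then fuse.
video = [[0, 2], [9, 9], [4, 6], [9, 9]]
kept = downsample_frames(video, stride=2)
fused = fuse([encode_frame(f) for f in kept])
```

Here `fused` is a single vector standing in for the whole clip, which is the role the claim assigns to the fused representation of the video data.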

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal et al. (US 20240119220 A1) and further in view of Rush et al. (Rush, Alexander M., Sumit Chopra, and Jason Weston. "A neural attention model for abstractive sentence summarization." arXiv preprint arXiv:1509.00685 (2015)) as applied to claim 10 above, and further in view of Lauber (US 20220261545 A1).

Regarding claim 11, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 10, above.
Aggarwal et al. further teaches:
11. The apparatus of claim 10, wherein, to generate the combined representation of the first type of input data and the second type of input data, the one or more processors (see ¶ [0099-0101]: “[0099] After all sentences from the simplified text are processed, (e.g., a list of plurality scores are determined for each sentence from the simplified text) P is fully constructed. P corresponds to the “new body of text” as described above with reference to FIGS. 4, 6, and 7. At operation 840, system computes a semantic similarity score SS and a hallucination score DH for 1) between C and S, and 2) between C and P. [0100] At operation 845, the system determines if SS and DH are greater than some threshold(s). In some cases, the system determines each score SS and DH against corresponding thresholds. For example, the hallucination score DH may represent a degree of hallucination, and in some embodiments, a lower score indicates a lower degree of hallucination. In this case, operation 845 may determine if DH is below a threshold (low hallucination) and if SS is above a threshold (high similarity) in order to “pass” P, and proceed to operation 855. If the scores between C and P are above the threshold(s) (i.e., P passes with a low degree of hallucination and a high semantic similarity), at operation 855, the system sets output text to modified text P. Otherwise, if P “fails,”, at operation 850, the system sets output text to simplified text S. This completes the pruning algorithm. [0101] At operation 860, the system presents output text through user interface. In some cases, the system additionally outputs metrics of the modified text including faithfulness, readability, and simplicity.”
[i.e., input data: input/complex text C → C={C1, C2, . . . , Cn} and complete sentence: simplified text S → S={S1, S2, . . . , Sm}]) is configured to:

However, Aggarwal et al. in combination with Rush et al. does not explicitly teach, but Lauber does teach:
determine a weighted average of the encoded representation of the first type of input data and the encoded representation of the second type of input data (see ¶ [0016 and 0045]: “[0016] …Embodiments may use a ranking process such as a personalized TextRank algorithm to score the importance of a sentence and thereby identify the most salient ones to be used for the purpose of extractive text summarization. [0045] In operation 430, for a particular document, a preliminary or initial document embedding may be created based on embeddings associated with each phrase (e.g. in some cases phrases as created in operation 410) contained in the document. For example, a preliminary document embedding can be calculated as the weighted average of a document's phrase embedding sequence, using or based on a combination of SIF and its position in the sequence, as in example equations Eq. 1A and Eq. 1B herein. Each phrase in the document may be ordered in the sequence that it appears in the document, e.g. first phrase (sequence or index 1), second phrase (sequence or index 2) etc. ”

[Equation image: media_image1.png]
).
Aggarwal et al., Rush et al., and Lauber are considered to be analogous to the claimed invention because they are in the same field of endeavor, natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aggarwal et al. in combination with Rush et al. to incorporate the teachings of Lauber to determine a weighted average of the encoded representation of the first type of input data and the encoded representation of the second type of input data, which provides the benefit of improving embedding and solving problems such as noise ([0057] of Lauber).
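
As a point of reference for the claim 11 limitation, a weighted average of two encoded representations can be written in a few lines. The sketch assumes both encodings are vectors of equal length and that a single scalar weight is used; the function name and the weight parameter are hypothetical.

```python
def weighted_average(first, second, weight_first=0.5):
    """Element-wise weighted average of two equal-length encodings.

    weight_first is the weight on `first`; `second` receives 1 - weight_first.
    """
    if len(first) != len(second):
        raise ValueError("encodings must have the same dimension")
    if not 0.0 <= weight_first <= 1.0:
        raise ValueError("weight_first must be in [0, 1]")
    return [weight_first * a + (1.0 - weight_first) * b
            for a, b in zip(first, second)]


# Hypothetical encodings of two input types, combined 25% / 75%.
text_encoding = [1.0, 0.0]
image_encoding = [0.0, 1.0]
combined = weighted_average(text_encoding, image_encoding, weight_first=0.25)
```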

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Aggarwal et al. (US 20240119220 A1) and further in view of Rush et al. (Rush, Alexander M., Sumit Chopra, and Jason Weston. "A neural attention model for abstractive sentence summarization." arXiv preprint arXiv:1509.00685 (2015)) as applied to claim 10 above, and further in view of Zelenkov (US 20170228369 A1).

Regarding claim 12, Aggarwal et al. in combination with Rush et al. teaches the limitations as in claim 10, above.
However, Aggarwal et al. in combination with Rush et al. does not explicitly teach, but Zelenkov does teach:
12. The apparatus of claim 10, wherein the one or more processors is configured to normalize the combined representation of the first type of input data and the second type of input data (see ¶ [0088 and 0097]: “[0088] Just as an example, a first given concept phrase 312 can be “information search systems” and a second given concept phrase 312 can be “system for information searching”. Using the various techniques described above, the parsing module 204 normalizes the first given concept phrase 312 to “system information search” and the second given concept phrase 312 to “system information search”. [0097] Just as an example, a first given concept phrase 312 can be “information search systems” and a second given concept phrase 312 can be “systems for information searching”. The CIR values of both the first given concept phrase 312 relative to the second given concept phrase 312 and the second given concept phrase 312 to the first given concept phrase is 1.00 (one point zero zero), calculated as 3 (“system”, “information” and “search”) divided by three (“system information search”, when normalized and re-arranged).”).
Aggarwal et al., Rush et al., and Zelenkov are considered to be analogous to the claimed invention because they are in the same field of endeavor, natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Aggarwal et al. in combination with Rush et al. to incorporate the teachings of Zelenkov, wherein the one or more processors is configured to normalize the combined representation of the first type of input data and the second type of input data, which provides the benefit of improving the grammaticality of the summaries in a data-driven way ([conclusion] of Zelenkov).
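
For illustration, one common way to normalize a combined vector representation is L2 normalization, sketched below. Reading "normalize" in claim 12 as L2 normalization of a numeric vector is an assumption made for this example; the claim language quoted above does not name a particular normalization.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


# Hypothetical combined representation.
combined_representation = [3.0, 4.0]
unit = l2_normalize(combined_representation)
```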

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Keisha Y Castillo-Torres whose telephone number is (571)272-3975. The examiner can normally be reached Monday - Friday, 9:00 am - 4:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir, can be reached at (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Keisha Y. Castillo-Torres
Examiner
Art Unit 2659



/Keisha Y. Castillo-Torres/Examiner, Art Unit 2659  

/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659
Prosecution Timeline

Feb 28, 2024: Application Filed
Nov 05, 2025: Non-Final Rejection mailed (§101, §103)
Feb 05, 2026: Response Filed
Apr 22, 2026: Final Rejection mailed (§101, §103) (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/467,236 | Patent 12627724 | SYSTEMS AND METHODS FOR ARTIFICIAL DUBBING | 4y 8m to grant | Granted May 12, 2026
17/865,788 | Patent 12620410 | ALIGNING PARAMETER DATA WITH AUDIO RECORDINGS | 3y 9m to grant | Granted May 05, 2026
18/441,704 | Patent 12608546 | PROCESSING EVENT DATA AND/OR TABULAR DATA FOR INPUT TO ONE OR MORE MACHINE LEARNING MODELS | 2y 2m to grant | Granted Apr 21, 2026
17/710,137 | Patent 12573402 | GENERATING AND/OR UTILIZING UNINTENTIONAL MEMORIZATION MEASURE(S) FOR AUTOMATIC SPEECH RECOGNITION MODEL(S) | 3y 11m to grant | Granted Mar 10, 2026
18/187,330 | Patent 12536989 | Language-agnostic Multilingual Modeling Using Effective Script Normalization | 2y 10m to grant | Granted Jan 27, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 74%
With Interview (+29.5%): 99%
Median Time to Grant: 2y 10m (~7m remaining)
PTA Risk: Moderate
Based on 110 resolved cases by this examiner. Grant probability derived from career allowance rate.