DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Remarks
This action is in response to the amendments received on 1/20/26. Claims 1-30 are pending in the application. Applicants' arguments have been carefully and respectfully considered.
Claims 1-9, 16-20, and 24-27 are provisionally rejected on the ground of nonstatutory double patenting.
Claim(s) 1-4, 10-18, 21-26, and 28-30 are rejected under 35 U.S.C. 103 as being unpatentable over Pathak et al. (US 2024/0394479), and further in view of Wan et al. (US 2025/0094538).
Claim(s) 5-9, 19, 20, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Pathak in view of Wan, and further in view of Reza et al. (US 2023/0237277).
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to final Office action, see 37 CFR 1.113(c). A request for reconsideration while not provided for in 37 CFR 1.113(c) may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
19/027199 (copending application)
A non-transitory computer-readable medium comprising computer-readable instructions stored thereon that when executed by a processor cause the processor to:
receive a set of documents for generating a prompt to be input into a language model having a context window with a token limit from which to generate a topic label and a topic description for a topic,
wherein the topic label comprises a name for the topic and the topic description comprises a description of the topic in a human-understandable format;
input the set of documents into an unsupervised machine learning model;
execute the unsupervised machine learning model to output a plurality of topics for the set of the documents, each of the plurality of topics comprising a plurality of topic terms and each of the plurality of topic terms associated with a first weight value;
select a first subset of topic terms for each topic of the plurality of topics, wherein the first subset of topic terms for each topic are selected from the plurality of topic terms of that topic based on the first weight value assigned to each of the plurality of topic terms of that topic;
compute an inverse document frequency weight value for each topic term in the first subset of topic terms of each topic;
compute a second weight value for each topic term in the first subset of topic terms based on the first weight value and the inverse document frequency weight value for that topic term;
select a second subset of topic terms for each topic from the first subset of topic terms, wherein the second subset of topic terms are selected based on the second weight value of each topic term in the first subset of topic terms;
generate a compressed representation of the set of documents from the second subset of topic terms of each topic to include in a prompt for each topic, wherein the compressed representation having a first number of tokens to be stored in a computer memory that is less than a second number of tokens in the plurality of topic terms;
wherein: the compressed representation reduces a token count of the prompt to fit within the context window of the language model;
the compressed representation is generated based on (i) text segments extracted by an information extraction model and (ii) topic terms and associated weight values output by the unsupervised machine learning model;
generating the compressed representation comprises selecting, ordering, and concatenating topic terms based on the weight values to form a machine-consumable prompt;
the machine-consumable prompt has a token count that fits within a context window token limit of the language model;
input the machine-consumable prompt of each topic into a language model; and
generate the topic label and topic description for each topic of the plurality of topics by executing the language model based on the input of the machine-consumable prompt.

19/027469
A non-transitory computer-readable medium comprising computer-readable instructions stored thereon that when executed by a processor cause the processor to:
receive a set of documents for generating a prompt to be input into a language model having a context window with a token limit from which to generate a topic label and a topic description for a topic,
wherein the topic label comprises a name for the topic and the topic description comprises a description of the topic in a human-understandable format;
input the set of documents into an unsupervised machine learning model;
execute the unsupervised machine learning model to output the topic for the set of the documents, the topic comprising a plurality of topic terms;
select a subset of topic documents from the set of documents, wherein the subset of topic documents belong to the topic and are selected based on the plurality of topic terms;
input the subset of topic documents into an information extraction model;
execute the information extraction model to generate a plurality of snippets from the subset of topic documents for the topic;
generate a compressed representation of the set of documents based on the plurality of snippets to include in a prompt, wherein the compressed representation having a first number of tokens to be stored in a computer memory that is less than a second number of tokens in the plurality of topic terms,
wherein: the compressed representation reduces a token count of the prompt to fit within the context window of the language model;
the compressed representation is generated based on (i) text segments extracted by an information extraction model and (ii) topic terms and associated weight values output by the unsupervised machine learning model;
generating the compressed representation comprises selecting, ordering, and concatenating topic terms based on the weight values to form a machine-consumable prompt;
the machine-consumable prompt has a token count that fits within a context window token limit of the language model;
input the machine-consumable prompt of the topic into a language model; and
generate the topic label and topic description for the topic by executing the language model based on the input of the machine-consumable prompt.
4. The non-transitory computer-readable medium of claim 1, wherein to generate the compressed representation of the set of documents based on the plurality of snippets, the computer-readable instructions further cause the processor to: rank the plurality of snippets based on a frequency of occurrence of each snippet of the plurality of snippets in the set of topic documents, wherein the higher the frequency of occurrence, the higher the rank of the snippet; and select a predetermined number of highest ranked snippets of the plurality of snippets to obtain a subset of snippets.
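For illustration only, the term-reweighting steps recited in the 19/027199 claim (a first weight from the topic model, an inverse document frequency weight, and a second weight based on both) can be sketched as follows. The smoothed IDF formula, the multiplicative combination of the two weights, whitespace tokenization, and the function names are all assumptions; the claim does not specify how the second weight value is computed from the other two.

```python
import math


def idf_weights(terms, documents):
    """Smoothed inverse document frequency for each term (assumed formula)."""
    n_docs = len(documents)
    # Whitespace tokenization is a simplifying assumption.
    tokenized = [set(doc.lower().split()) for doc in documents]
    weights = {}
    for term in terms:
        df = sum(1 for tokens in tokenized if term in tokens)
        # log((1 + N) / (1 + df)) + 1 stays positive even when every document has the term.
        weights[term] = math.log((1 + n_docs) / (1 + df)) + 1.0
    return weights


def reweight_and_select(topic_terms, documents, first_k, second_k):
    """topic_terms maps each topic term to its first weight value from the topic model."""
    # First subset: the first_k terms with the highest topic-model weight.
    first_subset = sorted(topic_terms, key=topic_terms.get, reverse=True)[:first_k]
    idf = idf_weights(first_subset, documents)
    # Second weight: first weight multiplied by the IDF weight (assumed combination).
    second_weight = {term: topic_terms[term] * idf[term] for term in first_subset}
    # Second subset: the second_k terms with the highest second weight.
    return sorted(first_subset, key=second_weight.get, reverse=True)[:second_k]


docs = ["battery drains fast", "battery charger broken", "screen cracked battery"]
terms = {"battery": 0.5, "charger": 0.3, "screen": 0.2, "fast": 0.05}
print(reweight_and_select(terms, docs, first_k=3, second_k=2))  # ['charger', 'battery']
```

In this toy example, "battery" has the highest first weight but appears in every document, so its IDF weight is 1.0 and the rarer term "charger" overtakes it in the second subset.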
Claims 1-9, 16-20, and 24-27 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-9, 12-20, and 23-29 of copending Application No. 19/027199 in view of Wan et al. (US 2025/0094538). The copending application does not claim “input the subset of topic documents into an information extraction model; execute the information extraction model to generate a plurality of snippets from the subset of topic documents for the topic.”
Wan teaches input the subset of topic documents into an information extraction model (Wan, pa 0052, the input 204 includes the raw natural language text of multiple datasets (e.g., multiple conversations in a chat thread) and a summarization instruction, where each dataset is represented by (Xi) & pa 0038-0039, A "natural language summary" as described herein refers to text summarization. Text summarization (or automatic summarization or NLP text summarization) is the process of breaking down text (e.g., several paragraphs) into smaller text (e.g., one sentence or paragraph). … This method extracts vital information while also preserving the meaning of the text. … the summarization may be, "I'm heading to the store to buy fruit," where "store" is a new word input into the new sentence (e.g., based on NLP semantic analysis and/or NER) and "I'm going" is removed from the original sentence. NER is an information extraction technique that identifies and classifies tokens/words or "entities" in natural language text into predefined categories);
execute the information extraction model to generate a plurality of snippets from the subset of topic documents for the topic (Wan, pa 0052, the language model encoder(s)/decoder(s) 206 generates a text summary - i.e., the "summary of each dataset" as illustrated in the output 208, which is represented by (fi)).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have included the teachings of Wan in the invention claimed in the copending application because doing so provides a summary of each dataset (Wan, pa 0052).
This is a provisional nonstatutory double patenting rejection.
19/027469
A non-transitory computer-readable medium comprising computer-readable instructions stored thereon that when executed by a processor cause the processor to:
receive a set of documents for generating a prompt to be input into a language model having a context window with a token limit from which to generate a topic label and a topic description for a topic,
wherein the topic label comprises a name for the topic and the topic description comprises a description of the topic in a human-understandable format;
input the set of documents into an unsupervised machine learning model;
execute the unsupervised machine learning model to output the topic for the set of the documents, the topic comprising a plurality of topic terms;
select a subset of topic documents from the set of documents, wherein the subset of topic documents belong to the topic and are selected based on the plurality of topic terms;
input the subset of topic documents into an information extraction model;
execute the information extraction model to generate a plurality of snippets from the subset of topic documents for the topic;
generate a compressed representation of the set of documents based on the plurality of snippets to include in a prompt, wherein the compressed representation having a first number of tokens to be stored in a computer memory that is less than a second number of tokens in the plurality of topic terms,
wherein: the compressed representation reduces a token count of the prompt to fit within the context window of the language model;
the compressed representation is generated based on (i) text segments extracted by an information extraction model and (ii) topic terms and associated weight values output by the unsupervised machine learning model;
generating the compressed representation comprises selecting, ordering, and concatenating topic terms based on the weight values to form a machine-consumable prompt;
the machine-consumable prompt has a token count that fits within a context window token limit of the language model;
input the machine-consumable prompt of each topic into the language model that is distinct from the unsupervised machine learning model; and
generate the topic label and topic description for the topic by executing the language model based on the input of the machine-consumable prompt.

19/028561 (copending application)
A non-transitory computer-readable medium comprising computer-readable instructions stored thereon that when executed by a processor cause the processor to:
receive a set of documents for generating a prompt to be input into a language model having a context window with a token limit from which to generate a topic label and a topic description for a topic;
input the set of documents into an unsupervised machine learning model;
execute the unsupervised machine learning model to output the topic for the set of the documents, the topic comprising a plurality of topic terms;
select a first subset of topic documents from the set of documents that belong to the topic;
select a second subset of topic documents from the first subset of topic documents based on the plurality of topic terms;
identify a title from each of the second subset of topic documents to obtain a plurality of titles;
generate a compressed representation of the set of documents based on the plurality of titles to include in a prompt, wherein the compressed representation having a first number of tokens stored in a computer memory that is less than a second number of tokens in the plurality of topic terms,
wherein: the compressed representation reduces a token count of the prompt to fit within the context window of the language model;
the compressed representation is generated based on (i) text segments extracted by an information extraction model and (ii) topic terms and associated weight values output by the unsupervised machine learning model;
generating the compressed representation comprises selecting, ordering, and concatenating topic terms based on the weight values to form a machine-consumable prompt;
the machine-consumable prompt has a token count that fits within a context window token limit of the language model;
input the machine-consumable prompt of each topic into the language model that is distinct from the unsupervised machine learning model; and
generate the topic label and topic description for the topic by executing execute the language model based on the input of the machine-consumable prompt.
Claims 1-9, 16-20, and 24-27 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-8, 16-19, and 26-29 of copending Application No. 19/028561 in view of Wan et al. (US 2025/0094538). The copending application does not claim “input the subset of topic documents into an information extraction model; execute the information extraction model to generate a plurality of snippets from the subset of topic documents for the topic.”
Wan teaches input the subset of topic documents into an information extraction model (Wan, pa 0052, the input 204 includes the raw natural language text of multiple datasets (e.g., multiple conversations in a chat thread) and a summarization instruction, where each dataset is represented by (Xi) & pa 0038-0039, A "natural language summary" as described herein refers to text summarization. Text summarization (or automatic summarization or NLP text summarization) is the process of breaking down text (e.g., several paragraphs) into smaller text (e.g., one sentence or paragraph). … This method extracts vital information while also preserving the meaning of the text. … the summarization may be, "I'm heading to the store to buy fruit," where "store" is a new word input into the new sentence (e.g., based on NLP semantic analysis and/or NER) and "I'm going" is removed from the original sentence. NER is an information extraction technique that identifies and classifies tokens/words or "entities" in natural language text into predefined categories);
execute the information extraction model to generate a plurality of snippets from the subset of topic documents for the topic (Wan, pa 0052, the language model encoder(s)/decoder(s) 206 generates a text summary - i.e., the "summary of each dataset" as illustrated in the output 208, which is represented by (fi)).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have included the teachings of Wan in the invention claimed in the copending application because doing so provides a summary of each dataset (Wan, pa 0052).
This is a provisional nonstatutory double patenting rejection.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-4, 10-18, 21-26, and 28-30 are rejected under 35 U.S.C. 103 as being unpatentable over Pathak et al. (US 2024/0394479), and further in view of Wan et al. (US 2025/0094538).
With respect to claim 1, Pathak teaches a non-transitory computer-readable medium comprising computer-readable instructions stored thereon that when executed by a processor cause the processor to:
receive a set of documents for generating a prompt to be input into a language model (Pathak, pa 0069, The application system 108 includes functionality that enables a user to interact with an online resource that links related information items in a graph. In a particular dialogue turn, the user submits an input query that incorporates information pulled from the graph. Or the knowledge-supplementing component 136 extracts information from the graph) having a context window with a token limit (Pathak, pa 0053, the dialogue system 104 can adapt the way it constructs the prompt information 124 … the execution platform that runs the language model 106. & pa 0078, the resource availability-assessing component 606 receives an input signal that indicates the current processing capacity of the application system 108 that uses the dialogue system 104. The resource availability assessing component 606 uses a rules-based system and/or a machine-trained model and/or other functionality to map these factors into a complexity level.);
input the set of documents into an unsupervised machine learning model (Pathak, Fig. 9 topic modeling component 910 & pa 0090, the compression component 138 compresses the content in the candidate context information 902, including the dialogue history and/or the knowledge information retrieved by the knowledge-supplementing component 136 & pa 0092, once the compression component 138 is invoked, the compression-managing component 914 invokes all of the individual compression components (906, 908, 910, 912), which can then operate in parallel. & pa 0097, The topic-modeling component 910 can likewise uses various rules-based logic and/or machine-trained models to extract topics associated with the source information 904, including Latent Dirichlet Allocation (LDA));
execute the unsupervised machine learning model to output the topic for the set of documents, the topic comprising a plurality of topic terms (Pathak, pa 0097, The topic-modeling component 910 can likewise uses various rules-based logic and/or machine-trained models to extract topics associated with the source information 904, including Latent Dirichlet Allocation (LDA));
select a subset of topic documents from the set of documents, wherein the subset of topic documents belong to the topic and are selected based on the plurality of topic terms (Pathak, pa 0098, the compression component 138 also weights the relevance of selected terms (keywords, named entities, topics, etc.) based on one or more weighting factors, and uses those weights factors in determining which terms are to be included in the prompt information 124…. By favorably weighting a selected term, the compression component 138 promotes this term over other terms that are not similarly weighted, and increases the likelihood that the selected term will be included in the top K information items);
input the subset of topic documents into an information extraction model (Pathak, Fig. 9, NER-Extracting component 908 & pa 0092, A compression-managing component 914 uses rules-based logic and/or a machine-trained logic and/or other functionality to determine when to invoke the individual compression components (906, 908, 910, and 912).);
execute the information extraction model to generate a plurality of snippets from the subset of topic documents for the topic (Pathak, Fig. 9, pa 0096, Likewise, the NER-extracting component 908 uses any rules-based logic (e.g., any algorithm) and/or machine trained model to identify named entities associated with the source information 904.);
generate a compressed representation … to include in a prompt for each topic (Pathak, Fig. 1 & pa 0045, The dynamic prompt-generating component 128 also assembles information provided by the separate analysis components 130 into the prompt information 124. & pa 0098, By favorably weighting a selected term, the compression component 138 promotes this term over other terms that are not similarly weighted, and increases the likelihood that the selected term will be included in the top K information items), wherein the compressed representation having a first number of tokens to be stored in a computer memory that is less than a second number of tokens in the plurality of topic terms (Pathak, pa 0054, the dialogue system 104 compresses source information from which the prompt information 124 is constructed, e.g., by picking salient terms from the source information and/or removing redundant information from the source information. & pa 0059, the compression component 138 uses topic analysis to identify one or more topics that are pertinent to the source information. The compression component 138 has the effect of compressing the source information by using selected terms to describe it. & pa 0091, The compression component 138 uses different components and associated techniques to perform different types of compression. Generally, each of the techniques provides a reduced-sized representation of the source information that preserves at least some semantic content of the source information in an original form. The reduced-sized representation of the source information is included in the prompt information 124 in lieu of the source information in its original form.) wherein:
the compressed representation reduces a token count of the prompt to fit within the context window of the language model (Pathak, pa 0073, the content unit amount-assessing component 608 determines, based on the assessed complexity level, a maximum number of content units to include in the prompt information 124 for the current dialogue turn & pa 0091, Generally, each of the techniques provides a reduced-sized representation of the source information that preserves at least some semantic content of the source information in an original form.);
the compressed representation is generated based on (i) text segments extracted by an information extraction model (Pathak, pa 0094, The keyword-extracting component 906 uses any rules-based logic (e.g., any algorithm) or machine-trained model to detect prominent keywords or named entities associated with the source information 904.) and (ii) topic terms and associated weight values output by the unsupervised machine learning model (Pathak, pa 0098, the compression component 138 also weights the relevance of selected terms (keywords, named entities, topics, etc.) based on one or more weighting factors, and uses those weights factors in determining which terms are to be included in the prompt information 124.);
generating the compressed representation comprises selecting, ordering, and concatenating topic terms based on the weight values to form a machine-consumable prompt (Pathak, pa 0098, the compression component 138 selects the K top-ranked terms. By favorably weighting a selected term, the compression component 138 promotes this term over other terms that are not similarly weighted, and increases the likelihood that the selected term will be included in the top K information items.);
the machine-consumable prompt has a token count that fits within a context window token limit of the language model (Pathak, pa 0002, an application typically limits the size of the prompt that can be input to the language model. & pa 0073, the content unit amount-assessing component 608 determines, based on the assessed complexity level, a maximum number of content units to include in the prompt information 124 for the current dialogue turn);
input the machine-consumable prompt of the topic into the language model that is distinct from the unsupervised machine learning model (Pathak, pa 0116, The language model 1402 commences with the receipt of the model-input information, e.g., corresponding to the prompt information 124. The model-input information is expressed as a series of linguistic tokens 1406 & Fig. 16, pa 0129, In block 1608, the dialogue system submits the prompt information to the machine trained language model, and receives a response (e.g., the response 126) from the machine-trained language model based on the prompt information. & Fig. 1, compression component 138 (having the topic model shown in Fig. 9) and separate language model 106 & Fig. 9, compression-managing component 914 (of compression component 138) including topic-modeling component 910, which creates compressed source information); and
generate the topic label … the topic by executing the language model based on the input of the machine-consumable prompt (Pathak, Fig. 16, pa 0129, In block 1610, the dialogue system 104 generates output information (e.g., the output information 120) based on the response).
Pathak does not expressly disclose wherein the topic label comprises a name for the topic and the topic description comprises a description of the topic in a human-understandable format, or generating the topic label and topic description for each topic of the plurality of topics by executing the language model based on the prompt.
Wan teaches receive a set of documents from which to generate a topic label and a topic description for a topic (Wan, pa 0052, the input 204 includes the raw natural language text of multiple datasets (e.g., multiple conversations in a chat thread) and a summarization instruction, where each dataset is represented by (Xi)), wherein the topic label comprises a name for the topic and the topic description comprises a description of the topic in a human-understandable format (Wan, pa 0030, humans can more easily interpret and make sense of the results because the outputs of the model are in natural language (e.g., cluster descriptions, and cluster labels), as opposed to numerical representations.);
input the subset of topic documents into an information extraction model (Wan, pa 0052, the input 204 includes the raw natural language text of multiple datasets (e.g., multiple conversations in a chat thread) and a summarization instruction, where each dataset is represented by (Xi) & pa 0038-0039, A "natural language summary" as described herein refers to text summarization. Text summarization (or automatic summarization or NLP text summarization) is the process of breaking down text (e.g., several paragraphs) into smaller text (e.g., one sentence or paragraph). … This method extracts vital information while also preserving the meaning of the text. … the summarization may be, "I'm heading to the store to buy fruit," where "store" is a new word input into the new sentence (e.g., based on NLP semantic analysis and/or NER) and "I'm going" is removed from the original sentence. NER is an information extraction technique that identifies and classifies tokens/words or "entities" in natural language text into predefined categories);
execute the information extraction model to generate a plurality of snippets from the subset of topic documents for the topic (Wan, pa 0052, the language model encoder(s)/decoder(s) 206 generates a text summary - i.e., the "summary of each dataset" as illustrated in the output 208, which is represented by (fi));
generate a compressed representation of the set of documents … to include in a prompt (Wan, pa 0054, The input 304 includes multiple batches of dataset summaries (i.e., natural language summaries), an instruction of cluster descriptions, an instruction of cluster labels, and one or more constraint instructions. The multiple batches of dataset summaries represents a collection of document summaries generated from FIG. 2), wherein the compressed representation having a first number of tokens to be stored in a computer memory that is less than a second number of tokens in the plurality of topic terms (Wan, pa 0038, A "natural language summary" as described herein refers to text summarization. Text summarization ( or automatic summarization or NLP text summarization) is the process of breaking down text (e.g., several paragraphs) into smaller text (e.g., one sentence or paragraph).);
input the machine-consumable prompt of each topic into the language model that is distinct from the unsupervised machine learning model (Wan, pa 0053, The input 304 is fed to the language model encoder(s) and/or decoder(s) 306 (which may be the same model as 206 of FIG. 2), which then produces the output 308); and
generate the topic label and the topic description for the topic by executing the language model based on the input of the machine-consumable prompt (Wan, pa 0056, Continuing with FIG. 3, the output 308 includes the generated cluster description(s) and label(s) for each batch.).
It would have been obvious at the effective filing date of the invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Pathak with the teachings of Wan because the outputs of the model are in natural language (e.g., cluster descriptions and cluster labels), so humans can more easily interpret and make sense of the results (Wan, pa 0030).
With respect to claim 2, Pathak in view of Wan teaches the non-transitory computer-readable medium of claim 1, wherein the unsupervised machine learning model is a topic model (Pathak, pa 0097, The topic-modeling component 910 can likewise uses various rules-based logic and/or machine-trained models to extract topics associated with the source information 904, including Latent Dirichlet Allocation (LDA)), and wherein the language model is a Large Language Model (LLM) (Pathak, pa 0005, language model refers to a machine-trained model that is capable of processing language-based input information and, optionally, any other kind of input information (including video information, image information, audio information, etc.). As such, a language model can correspond to a multi-modal machine-trained model.).
With respect to claim 3, Pathak in view of Wan teaches the non-transitory computer-readable medium of claim 2, wherein the topic model is a Latent Dirichlet Allocation (LDA) clustering model (Pathak, pa 0097, The topic-modeling component 910 can likewise uses various rules-based logic and/or machine-trained models to extract topics associated with the source information 904, including Latent Dirichlet Allocation (LDA)).
With respect to claim 4, Pathak in view of Wan teaches the non-transitory computer-readable medium of claim 1, wherein to generate the compressed representation of the set of documents based on the plurality of snippets, the computer-readable instructions further cause the processor to: rank the plurality of snippets based on a frequency of occurrence of each snippet of the plurality of snippets in the set of topic documents, wherein the higher the frequency of occurrence, the higher the rank of the snippet; and select a predetermined number of highest ranked snippets of the plurality of snippets to obtain a subset of snippets (Pathak, pa 0098, the compression component 138 selects the K top-ranked terms. By favorably weighting a selected term, the compression component 138 promotes this term over other terms that are not similarly weighted, and increases the likelihood that the selected term will be included in the top K information items.).
With respect to claim 10, Pathak in view of Wan teaches the non-transitory computer-readable medium of claim 1, wherein the information extraction model is a rule-based model (Pathak, pa 0096, the NER-extracting component 908 uses any rules-based logic (e.g., any algorithm) and/or machine trained model to identify named entities associated with the source information 904.).
With respect to claim 11, Pathak in view of Wan teaches the non-transitory computer-readable medium of claim 1, wherein the information extraction model is a machine learning model trained for a specific domain or application or an extractive summarization model (Pathak, pa 0096, the NER-extracting component 908 uses any rules-based logic (e.g., any algorithm) and/or machine trained model to identify named entities associated with the source information 904. & Wan, pa 0039, Such predefined categories may be indicated in corresponding tags or labels, which can be used in summaries. Entities can be, for example, names of people, specific organizations, specific locations, specific times, specific quantities, specific monetary price values, specific percentages, specific pages, and the like. Likewise, the corresponding tags or labels can be specific people, organizations, location, time, price (or other invoice data)).
With respect to claim 12, Pathak in view of Wan teaches the non-transitory computer-readable medium of claim 1, wherein the information extraction model is a combination of a rule-based model and one of a machine learning model trained for a specific domain or application or an extractive summarization model (Pathak, pa 0096, the NER-extracting component 908 uses any rules-based logic (e.g., any algorithm) and/or machine trained model to identify named entities associated with the source information 904. & Wan, pa 0039, Such predefined categories may be indicated in corresponding tags or labels, which can be used in summaries. Entities can be, for example, names of people, specific organizations, specific locations, specific times, specific quantities, specific monetary price values, specific percentages, specific pages, and the like. Likewise, the corresponding tags or labels can be specific people, organizations, location, time, price (or other invoice data). NER and/or other NLP functionality can be used to understand and summarize natural language, such as tokenization (breaking text into words or phrases), stemming (reducing words to their base form), and part-of-speech tagging (identifying the grammatical role of words), semantic analysis (to derive meaning of a first word based on context/meaning of other words by the first word), and/or syntactic analysis (detecting the grammatical structure of a sentence or a sequence of words to determine its syntactic structure, or understand how words are organized in a sentence and how they relate to each other in terms of grammatical rules).).
With respect to claim 13, Pathak in view of Wan teaches the non-transitory computer-readable medium of claim 10, wherein to generate the title from the body of a topic document, the computer-readable instructions further cause the processor to:
input the plurality of snippets into a Large Language Model (LLM) (Wan, pa 0086, Per block 1002, some embodiments receive a dataset. In some embodiments, a dataset is a raw dataset that has not been summarized. In some embodiments, the dataset is a larger pool of smaller datasets or a smaller set of pool of larger datasets. In some embodiments, the dataset includes several summarized and/or batched datasets. In an illustrative example, in some embodiments, the dataset represents a summary (e.g., a LLM text summary) of a larger dataset.);
execute the LLM to generate a title for each of the subset of topic documents to obtain a plurality of summaries (Wan, pa 0090, Per block 1006, in response to the receiving of the natural language prompt, some embodiments generate, via a machine learning model (e.g., a LLM or other language model), at least one of: the category (e.g., a cluster or group), the description, and/or the label.);
generate the machine-consumable prompt from the plurality of summaries (Wan, pa 0090, For every other batch, of the two or more batches that excludes the first batch, particular embodiments revise (via an update prompt), the respective label of a first cluster or the description of the first cluster, as described, for example, in FIG. 8.).
It would have been obvious at the effective filing date of the invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Pathak with the teachings of Wan because this reduces the number of content units in the instances of prompt information submitted to the language model (Wan, pa 0042).
With respect to claim 14, Pathak in view of Wan teaches the non-transitory computer-readable medium of claim 1, wherein each snippet of the plurality of snippets includes a plurality of key words from the subset of topic documents, a plurality of key phrases from the subset of topic documents, or a combination of key words and key phrases from the subset of topic documents (Wan, pa 0038, the summarization may be, "I'm heading to the store to buy fruit," where "store" is a new word input into the new sentence (e.g., based on NLP semantic analysis and/or NER) and "I'm going" is removed from the original sentence.).
With respect to claim 15, Pathak in view of Wan teaches the non-transitory computer-readable medium of claim 14, wherein each snippet of the plurality of snippets further includes context around at least one of one or more of the key words or one or more of the key phrases (Wan, pa 0038, the summarization may be, "I'm heading to the store to buy fruit," where "store" is a new word input into the new sentence (e.g., based on NLP semantic analysis and/or NER) and "I'm going" is removed from the original sentence.).
With respect to claims 16-18 and 21-23, the limitations are essentially the same as claims 1-4 and 10-15, and are rejected for the same reasons.
With respect to claims 24-26 and 28-30, the limitations are essentially the same as claims 1-4 and 10-15, and are rejected for the same reasons.
Claim(s) 5-9, 19, 20, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Pathak in view of Wan, and further in view of Reza et al. (US 2023/0237277).
With respect to claim 5, Pathak in view of Wan teaches the non-transitory computer-readable medium of claim 4, as discussed above.
Reza teaches wherein to generate the compressed representation of the set of documents based on the plurality of snippets, the computer-readable instructions further cause the processor to concatenate the subset of snippets to generate a string for the topic (Reza, pa 0059, A domain adaptation algorithm may be used to train T5 to generate unique domain relevant features (DRFs; a set of keywords that characterize domain information) for each input. Then those DRFs can be concatenated with the input to form a template).
It would have been obvious at the effective filing date of the invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Pathak in view of Wan with the teachings of Reza because dynamic prompting, in which the prompts are appended to each set of input with an opinion and aspect, can be highly beneficial in developing a pre-trained model. This provides better in-context learning and captures the opinion context information, which can lead to effective semantic information modeling (Reza, pa 0034).
With respect to claim 6, Pathak in view of Wan and Reza teaches the non-transitory computer-readable medium of claim 5, wherein the prompt for the topic comprises the string for the topic, an output definition defining a format for the topic label and the topic description for the topic, and one or more constraints (Reza, Fig. 1 & pa 0039, The original input 105 is then concatenated with the respective generated prompting template 115 to create a prompting function 120. For example, the original input 105 and the prompting template 115 may be used as input for a concatenate function configured to join the two text strings into a single text string: the original input 105; the prompting template 115, such that the two text strings are now linked or associated with one another.).
With respect to claim 7, Pathak in view of Wan and Reza teaches the non-transitory computer-readable medium of claim 6, wherein the one or more constraints include a system role and a user role to provide a framework for how to generate the topic label and topic description for the topic (Reza, pa 0048, The training data may be acquired from the public domain or private domain. For example, a user such as a customer in the food and service industry may provide training data for fine-tuning a model to analyze sentiment in online food blog posts. & pa 0036, FIG. 1 is a block diagram illustrating the overall concept 100 of dynamic aspect based prompting and its influence on improving the confidence in a downstream task. As shown, original input 105 is obtained from a set of training data. The original input 105 is a text example such as “I like the food but not the service” from the set of training data. …The set of training data includes labels. The labels comprise: (i) text that relate to possible solutions for the given task to be learned by the model, and (ii) the specified solutions (e.g., a class identifier or ground truth for the text example). The labels may be provided by a user (e.g., a customer) and may be particular to a domain that the user intends to train the model within.).
With respect to claim 8, Pathak in view of Wan and Reza teaches the non-transitory computer-readable medium of claim 7, wherein the one or more constraints further include a summary of what to include in the topic description (Reza, pa 0036, The labels may be provided by a user (e.g., a customer) and may be particular to a domain that the user intends to train the model within. For example, the text labels may be words such as terrible, bland, flavorful, delicious, disgusting, sour, sweet, poison, enjoyable, spicy, etc. that relate to various semantic classes (e.g., positive, negative, neutral, or the like) to be predicted for each text example within the domain of food. In other words, the original input 105 may include text that relates to possible solutions for the given task (e.g., the food was good but the service was bad—with good and bad being text that relate to possible sentiment solutions or classes);).
With respect to claim 9, Pathak in view of Wan and Reza teaches the non-transitory computer-readable medium of claim 6, wherein the format comprises: <topic number>:<topic label>:<topic description> (Examiner note: the format of the topic label is an obvious matter of design choice that could be specified by the programmer.).
With respect to claims 19, 20, and 27, the limitations are essentially the same as claims 5-9, and are rejected for the same reasons.
Response to Arguments
35 U.S.C. 103 rejections
Applicant’s arguments appear to be directed to a newly amended limitation. Applicant’s amendment has rendered the previous rejection moot. Upon further consideration of the amendment, a new ground of rejection is made in view of Pathak et al. (US 2024/0394479).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Kelly et al. (US 2024/0289560) teaches generating a contextual classification for the one or more request text fields and identifying a refined document subset based on the contextual classification and generating, using a large language model, one or more generative text fields using a generative model prompt based on the prompt document subset and the one or more request text fields.
Ailem et al. (US 2026/0004135) teaches generating, using one or more large language models and the plurality of instructions, a plurality of annotated clusters, wherein an annotated cluster comprises a category annotation and a summary annotation.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRITTANY N ALLEN whose telephone number is (571)270-3566. The examiner can normally be reached M-F 9 am - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sherief Badawi, can be reached on 571-272-9782. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BRITTANY N ALLEN/ Primary Examiner, Art Unit 2169