Prosecution Insights
Last updated: April 19, 2026
Application No. 18/226,297

NATURAL LANGUAGE PROCESSING OF N-GRAM TEXTUAL DATA INDICATIVE OF TRENDING THEMES

Non-Final OA (§103)

Filed: Jul 26, 2023
Examiner: LERNER, MARTIN
Art Unit: 2658
Tech Center: 2600 — Communications
Assignee: The United States of America as represented by the Secretary of the Navy
OA Round: 3 (Non-Final)
Grant Probability: 78% (Favorable)
OA Rounds: 3-4
To Grant: 3y 1m
With Interview: 92%

Examiner Intelligence

Career Allow Rate: 78% (above average); 768 granted / 984 resolved; +16.0% vs TC avg
Interview Lift: +13.5% (moderate) for resolved cases with interview
Typical Timeline: 3y 1m avg prosecution; 23 currently pending
Career History: 1,007 total applications across all art units

Statute-Specific Performance

§101: 12.5% (-27.5% vs TC avg)
§103: 53.1% (+13.1% vs TC avg)
§102: 9.6% (-30.4% vs TC avg)
§112: 16.6% (-23.4% vs TC avg)
Tech Center averages are estimates. Based on career data from 984 resolved cases.

Office Action

§103
DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 to 2, 7 to 11, and 16 to 17 are rejected under 35 U.S.C. 103 as being unpatentable over Tiwari (U.S. Patent Publication 2019/0102374) in view of Consul et al. (U.S. Patent No. 9,165,056). 
Concerning independent claims 1 and 10, Tiwari discloses a method and system for predicting future trending topics, comprising: “reading data from at least one data file including a plurality of textual data entries” – text extractor 344 can receive through interface 342 a set of posts or content items (“a plurality of textual data entries”); text extractor 344 can extract text from the received set of posts by taking textual content from a post (¶[0032]: Figure 3); process 400 can extract text from a current post (¶[0042]: Figure 4: Step 410); “processing the plurality of textual data entries including one or more of removing word cases and punctuation, lemmatizing nouns, removing a first category of stop words, and removing a second category of stop words including thematic or data file specific stop words and phrases” – n-gram generator 348 can normalize the extracted text for each received post; n-gram generator 348 can remove from a cumulative set n-grams that contain certain specified stop words (“processing the plurality of textual data entries including one or more of . . . removing a first category of stop words . . 
.”); a set of stop words can be words that appear above a threshold frequency in a language, words determined to be offensive, or manually selected words, e.g., words determined to be unhelpful for determining a trending topic (“removing a second category of stop words including thematic or data file specific stop words and phrases”); stop words can include numbers (¶[0034] - ¶[0035]: Figure 3); process 400 can remove n-grams from a cumulative set of n-grams that include one or more stop words; a defined set of stop words can include any words that appear above a threshold frequency in a language, words determined to be offensive, or manually selected words, e.g., words determined to be unhelpful for determining a trending topic; stop words can include numbers (¶[0047]: Figure 4: Step 422); “applying natural language processing (NLP) to each of the processed plurality of textual data entries including: identifying word level n-grams for one or more selectable word level n-gram lengths” – text is extracted from each post, text is normalized and tokenized, and tokens are organized into n-grams; in various implementations, n-grams can be limited to an exact number of words, e.g., two (¶[0010]); n-gram generator 348 can normalize text, and organize the tokenized text into n-grams of a particular length (¶[0034]: Figure 3); n-grams can be tri-grams (¶[0035]); process 400 can organize tokens into n-grams of a specified length, e.g., one, two, or three words (“one or more selectable word level n-gram lengths”) (¶[0045]: Figure 4: Step 416); implicitly, n-gram processing and stop word removal are “natural language processing”; “identifying textual data entries of the plurality of textual data entries containing repeated n-gram instances of the identified word level n-grams” – trending topics prediction system can identify topics across billions of posts by extracting text from each post; trending topics prediction system can have each n-gram, extracted from an originating 
post, tagged and stored in a cumulative set of n-grams (¶[0010]); “determining respective numbers of the identified textual data entries of the plurality of textual data entries containing repeated n-gram instances of the identified word level n-grams in the at least one data file” – frequency computer 350 can receive a cumulative set of n-grams and compute a frequency score for each unique n-gram; a frequency score within a set for a ‘unique’ n-gram is an occurrence value for all n-grams within that set that have the same sequence of tokens; in a set of n-grams ‘here we go’, ‘we’re on our way’, ‘here we go’, and ‘here we go’, where an occurrence value is a total count, there are two unique n-grams, ‘here we go’ and ‘we’re on our way’, with an occurrence value of three for ‘here we go’ and an occurrence value of one for ‘we’re on our way’; frequency computer 350 can count for each unique n-gram a number of times that unique n-gram occurs (¶[0036]: Figure 3); “determining respective numbers of the identified textual data entries” is performed because a process 400 for predicting trending topics sets a first post in Step 410 and then determines if all of the posts from the set of posts are processed by a loop in Step 418 (¶[0040] - ¶[0046]: Figure 4); that is, process 400 provides a loop that maintains a count of the set of posts that iterates to consider every post, and the set of posts contain repeated n-gram instances (“determining respective numbers of the identified textual data entries of the plurality of textual data entries containing repeated n-gram instances”); “sorting the repeated n-gram instances based, in part, on the determined respective numbers of the identified textual data entries containing the repeated n-gram instances of the identified word level n-grams in the at least one data file occurring across the plurality of data entries to determine a most mentioned list of repeated n-gram instances for the at least one data file that is indicative of 
trending themes occurring across the textual data in the at least one data file” – n-grams can be sorted by their frequency score; n-grams with a frequency above a threshold, i.e., high frequency n-grams, can be passed to prediction engine 352; prediction engine 352 can receive the high frequency n-grams and compute a prediction value as an expectation of how much that n-gram will be trending in the future; n-grams can be sorted according to their computed prediction value; top scoring n-grams, e.g., n-grams with a prediction value above a threshold, can be determined to be likely trending in the future (¶[0038]: Figure 3); process 400 can compute a value for each n-gram selected representing a prediction for an amount the n-gram will be trending in the future (¶[0050]: Figure 4: Step 430); Figure 6B illustrates a predicted topic report 540 of bi-grams that includes a first column of ‘Most Discussed’ (“a most mentioned list”) (¶[0059]: Figure 6B). Concerning independent claims 1 and 10, Tiwari discloses all of the limitations with an exception of “outputting at least the most mentioned list of repeated n-gram instances in a format adapted for being appended to the at least one data file.” Concerning independent claims 1 and 10, Consul et al. teaches generation and use of an email frequent word list for generating a mailbox specific frequent word list and a universal frequent word list. (Abstract) A mailbox specific frequent word list 104 includes a list of frequent words found in a user’s mailbox and a corresponding frequency associated with each of the words. The list of frequent words may be sorted in order of frequency so that the most frequent words may be shown at the top of the mailbox specific frequent word list 104, and the remaining words may be shown in descending order of frequency. (Column 2, Lines 58 to 65: Figure 1) A mailbox specific frequent word list 104 can be formatted in Extensible Markup Language (‘XML’). 
An example of an XML data structure for an entry in the mailbox specific frequent word list 104 is <TopNWord=“____” Frequency= “_____”></TopNWord>. Other forms for representing entries in mailbox specific frequent word list 104 may be contemplated by those skilled in the art. (Column 3, Lines 4 to 18: Figure 1) API 112 generates universal frequent word list 118 by counting the number of document identifiers associated with each of the words in search data 206. If the word ‘apple’ is included in five emails, and the word ‘bear’ is included in three emails, ‘apple’ has a frequency of five and ‘bear’ has a frequency of three. Search API 112 may filter universal frequent word list 118 for only words contained in emails associated with a specific mailbox. Email server 100 maintains a mapping for each mailbox, and its corresponding emails. The mailbox specific frequent word list 104 may be formatted in XML or other suitable representation. The mailbox specific word list 104 may be stored as a folder associated item (‘FAI’) and may be represented by a data structure specifying a particular mailbox, which is identified by a mailbox identifier. An exemplary XML representation of the mailbox specific frequent word list 104 is denoted ‘TopNWords’ along with a mailbox identifier ‘mailGuid’. An exemplary XML representation of this data structure is denoted ‘WordFrequency’ and includes a ‘Word’ and its associated ‘Frequency’. (Column 5, Line 24 to Column 5, Line 37: Figure 2) Here, a mailbox specific frequent word list 104 that is stored in an XML format is “outputting at least the most mentioned list . . . in a format adapted for being appended to the at least one data file.” That is, XML or other format is “a format adapted for being appended to the at least one data file.” Broadly, Applicant’s limitation of “adapted for being appended” does not actually require that the most mentioned list is, in fact, appended, only that it is a format “adapted for being appended”. 
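As an aside on the XML formatting Consul et al. describes, a most-mentioned list can be serialized to an XML fragment and simply appended to an existing file. The sketch below is illustrative only: the `TopNWord`/`Frequency` names follow the entry shape quoted above, but the `Word` attribute, the wrapping element, and the file name are assumptions, not Consul's actual schema.

```python
# Illustrative sketch only: serializing a most-mentioned word list to XML and
# appending it to a data file. The TopNWord/Frequency names follow the entry
# shape quoted from Consul et al.; the Word attribute, the TopNWords wrapper,
# and the file name are hypothetical.

def to_xml(word_freqs):
    """word_freqs: (word, frequency) pairs, already sorted by descending frequency."""
    entries = [
        f'<TopNWord Word="{word}" Frequency="{freq}"></TopNWord>'
        for word, freq in word_freqs
    ]
    return "<TopNWords>" + "".join(entries) + "</TopNWords>"

fragment = to_xml([("apple", 5), ("bear", 3)])

# A plain-text fragment like this can simply be appended to an existing file:
with open("data.txt", "a") as fh:  # hypothetical file name
    fh.write(fragment + "\n")
```

Because the serialized list is plain markup, appending it leaves the entries already in the file undisturbed, which is one reading of "a format adapted for being appended."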
However, Consul et al. provides a mapping between a particular mailbox and a mailbox specific frequent word list 104, which is equivalent to ‘appending’ a mailbox specific frequent word list 104 to emails of the mailbox. An objective is to infer information about a user so that a user’s mailbox may be a valuable source of relevant information about the user for application programs that can utilize or benefit from this information. (Column 1, Lines 16 to 27) It would have been obvious to one having ordinary skill in the art to output a most mentioned list of repeated words in a format adapted for being appended to a data file as taught by Consul et al. to predict trending topics from n-grams of social media posts in Tiwari for a purpose of utilizing valuable information about a user from a user’s mailbox for application programs that can benefit from this information. Concerning claims 2 and 11, Tiwari discloses that identification of n-grams likely to be trending in the future can be provided, e.g., through interface 342 (¶[0038]: Figure 3); process 400 can select a top five n-grams (¶[0051]: Figure 4); n-grams selected can be surfaced to users in a variety of ways (¶[0052]: Figure 4); Figure 6B illustrates a predicted topic report 540 (“an output analysis data file”) that includes a first column of ‘Most Discussed’ (“the results including at least the most mentioned list”) (¶[0059]: Figure 6B). 
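The Tiwari-style pipeline the rejection walks through for claims 1 and 10 (extract text, normalize, tokenize, form n-grams of a selectable length, and drop n-grams containing stop words) can be sketched in a few lines. This is a minimal illustration of the technique, not Tiwari's implementation; the stop-word set and sample posts are arbitrary examples.

```python
import re

STOP_WORDS = {"the", "a", "an"}  # example set; Tiwari's stop-word sets are configurable

def make_ngrams(text, n):
    """Normalize (lowercase, strip punctuation), tokenize, and form n-grams of length n."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def filter_stop_ngrams(ngrams, stop_words=STOP_WORDS):
    """Remove n-grams that contain any stop word."""
    return [g for g in ngrams if not set(g.split()) & stop_words]

# Accumulate bi-grams from each post into a cumulative set, as the rejection
# characterizes Tiwari doing across a set of posts.
posts = ["Here we go!", "Big game tonight"]
cumulative = []
for post in posts:
    cumulative.extend(make_ngrams(post, 2))
print(filter_stop_ngrams(cumulative))
# → ['here we', 'we go', 'big game', 'game tonight']
```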
Concerning claims 7 and 16, Tiwari discloses that n-grams can be sorted by their frequency score (¶[0036]: Figure 3); n-grams are sorted according to their computed prediction value, and top scoring n-grams, e.g., n-grams with a prediction value above a threshold, can be determined as likely to be trending; identifications of these top-scoring n-grams can be provided, e.g., through interface 342 (“an ascending order list starting from a largest number of repeated n-gram instances”) (¶[0038]: Figure 3); process 400 can determine a frequency value for each unique n-gram; the frequency value can be a total count of the occurrences of the n-gram (¶[0048]: Figure 4: Step 424); process 400 can sort the n-grams based on corresponding prediction values, and select a top five n-grams (¶[0051]: Figure 4: Step 432); Figure 6B illustrates a predicted topic report 540 that includes a first column of ‘Most Discussed’ (¶[0059]: Figure 6B); broadly, a list of n-grams sorted by frequency is ‘ascending’ from the bottom to the top. Concerning claims 8 and 17, Tiwari discloses that n-grams are sorted by their frequency score, and n-grams with a frequency score above a threshold can be passed to prediction engine 352 (“having counts above a predetermined number”) (¶[0036]: Figure 3); n-grams are sorted according to their computed prediction value, and top scoring n-grams, e.g., n-grams with a prediction value above a threshold, can be determined as likely to be trending; identifications of these top-scoring n-grams can be provided, e.g., through interface 342 (¶[0038]: Figure 3); the frequency value can be a total count of the occurrences of the n-gram within the group (¶[0048]: Figure 4: Step 424); process 400 can select a top five n-grams (¶[0051]: Figure 4). 
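The count-sort-select step attributed to Tiwari above (compute an occurrence value for each unique n-gram, sort by frequency, keep a top five) reduces to a short sketch. This illustrates only the general technique; Tiwari's prediction engine additionally computes a forward-looking prediction value, which is not modeled here.

```python
from collections import Counter

def most_mentioned(ngrams, top_n=5):
    """Count each unique n-gram and return the top_n, largest count first."""
    return Counter(ngrams).most_common(top_n)

# The occurrence example from Tiwari's paragraph [0036]:
sample = ["here we go"] * 3 + ["we're on our way"]
print(most_mentioned(sample))
# → [('here we go', 3), ("we're on our way", 1)]
```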
Concerning claim 9, Tiwari discloses n-gram generator 348 can remove from a cumulative set n-grams that contain certain specified stop words; a set of stop words can be words that appear above a threshold frequency in a language, words determined to be offensive, or manually selected words, e.g., words determined to be unhelpful for determining a trending topic; stop words can include numbers (¶[0034] - ¶[0035]: Figure 3); process 400 can remove n-grams from a cumulative set of n-grams that include one or more stop words; a defined set of stop words can include any words that appear above a threshold frequency in a language, e.g., the, a, she, etc., words determined to be offensive, or manually selected words, e.g., words determined to be unhelpful for determining a trending topic; stop words can include numbers (¶[0047]: Figure 4: Step 422); here, stop words that appear above a threshold frequency in a language, e.g., the, a, she, etc., are “basic stop words”; manually selected stop words can be construed as “stop words derived from a library”, i.e., a user defines which words are to be included in the ‘library’. Claims 3 to 5 and 12 to 14 are rejected under 35 U.S.C. 103 as being unpatentable over Tiwari (U.S. Patent Publication 2019/0102374) in view of Consul et al. (U.S. Patent No. 9,165,056) as applied to claims 1 to 2 and 11 to 12 above, and further in view of Reynolds et al. (U.S. Patent Publication 2018/0210936). Concerning claims 3 and 12, Tiwari discloses “the output analysis file further includes information concerning a . . . column pertaining to the results of the application of NLP to the plurality of textual data entries” as illustrated in Figure 6B. Here, Tiwari discloses a predicted topic report 650 (“the output analysis data file”) has a column of ‘Most Discussed’ that is “a column pertaining to the results of the application of NLP to the plurality of textual data entries”. 
(¶[0059]: Figure 6B) That is, a column of ‘Most Discussed’ is what results from sorting n-grams in natural language processing. However, Tiwari does not clearly disclose that a column of predicted topic report 650 includes a data “narrative” column. A ‘Most Discussed’ column in Figure 6B only appears to present a list of most frequent n-grams, but not narrative data about the n-grams. Concerning claims 3 and 12, Reynolds et al. teaches an interactive interface for presenting insight calculations that summarize data attributes. (Abstract) Specifically, Reynolds et al. teaches that trending datasets 2130 may be disposed in a portion of a user interface 2102 that presents user interface elements including text information 2152 describing trending dataset 2150. A description may include a purpose of a dataset, a source of a dataset, and a field of applicability. (¶[0170]: Figures 20 to 21) Here, Figure 20 includes a column of recent discussion 2024 with “narrative” descriptions of a trending dataset, e.g., ‘Hate Crime Laws and Statistics’, ‘TED Talks Complete List’, etc. An objective is to present summarization of dataset attributes to facilitate discovery, formation, and analysis of interrelated collaborative datasets. (¶[0002]) It would have been obvious to one having ordinary skill in the art to provide a narrative description as taught by Reynolds et al. of a column pertaining to results of natural language processing in Tiwari for a purpose of facilitating discovery, formation, and analysis of interrelated collaborative datasets. Concerning claims 4 and 13, Tiwari discloses “a plurality of textual entries”, but omits “a separate cell within the at least one data file”. Here, Tiwari is directed to receiving textual entries as posts, but discloses a variety of sources of textual information. 
(¶[0013]) Generally, it is known that electronic spreadsheets include textual entries in ‘cells’, so it would be merely a matter of applying Tiwari to electronic spreadsheets. Specifically, Reynolds et al. teaches a data file 601a may be received in a variety of formats, and dataset analyzer 630 may be configured to analyze data file 601a to detect and resolve if a cell contains useful data including a string in a column of a tabular data file. (¶[0090] - ¶[0091]) Reynolds et al., then, teaches that tabular data could be presented within cells. An objective is to present summarization of dataset attributes to facilitate discovery, formation, and analysis of interrelated collaborative datasets. (¶[0002]) It would have been obvious to one having ordinary skill in the art to perform natural language processing on n-grams of Tiwari for tabular data in cells as taught by Reynolds et al. for a purpose of facilitating discovery, formation, and analysis of interrelated collaborative datasets. Concerning claims 5 and 14, Reynolds et al. teaches that a data file 601a may be received in a variety of formats, and dataset analyzer 630 may be configured to analyze data file 601a to detect and resolve if a cell contains useful data including a string in a column of a tabular data file (“wherein each of the separate cells are part of a same data narrative column within the at least one data file”). (¶[0090] - ¶[0091]) Claims 6 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Tiwari (U.S. Patent Publication 2019/0102374) in view of Consul et al. (U.S. Patent No. 9,165,056) as applied to claims 1 to 2 and 11 to 12 above, and further in view of Udupa et al. (U.S. Patent Publication 2013/0151533). 
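As context for the cell-based limitations above, the analysis Reynolds describes (scan a tabular data file and decide whether a cell contains useful string data) can be approximated with a short sketch. The blank-versus-numeric-versus-string test below is an illustrative assumption, not the behavior of Reynolds's dataset analyzer 630.

```python
import csv
import io

def string_cells(tabular_text):
    """Yield (row, col, value) for cells holding non-empty, non-numeric strings."""
    reader = csv.reader(io.StringIO(tabular_text))
    for r, row in enumerate(reader):
        for c, cell in enumerate(row):
            cell = cell.strip()
            # Treat anything that is neither blank nor a plain number as useful text.
            if cell and not cell.replace(".", "", 1).isdigit():
                yield (r, c, cell)

table = "topic,count\nhere we go,3\n,\n"
print(list(string_cells(table)))
# → [(0, 0, 'topic'), (0, 1, 'count'), (1, 0, 'here we go')]
```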
Tiwari discloses “sorting the repeated n-gram instances based, in part, on the determined numbers of the identified textual data entries containing the repeated n-gram instances of the identified word level n-grams in the at least one data file occurring across the plurality of textual data entries”, but omits “searching for nouns and adjectives near each identified n-gram in a textual data entry of the plurality of textual data entries pertaining to the identified n-gram.” However, Udupa et al. teaches extracting n-grams from a document corpus, where a size of n-grams extracted from each document can range between one and five. A well-formedness score can be computed for each n-gram from documents, wherein the well-formedness score is indicative of parts of speech in the analyzed n-gram as well as the arrangement of parts of speech in the n-gram. One or more natural language processing algorithms can be employed in connection with computing the well-formedness score, or a series of rules can be analyzed with respect to an n-gram to compute the well-formedness score. Specifically, an n-gram that begins with a noun, a verb, or a participle and ends with a noun or a participle will be provided with a relatively high well-formedness score, while an n-gram that begins or ends with an adjective or adverb will be provided with a relatively low well-formedness score. Each n-gram that has a well-formedness score above a predefined threshold can be retained as a candidate key phrase in a list of candidate key phrases. An objective is to identify key phrases in documents. (¶[0007] - ¶[0008]) It would have been obvious to one having ordinary skill in the art to analyze n-grams to determine nouns and adjectives near each identified n-gram in a textual data entry as taught by Udupa et al. to predict future trending topics from n-grams in Tiwari for a purpose of identifying key phrases in documents. 
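The part-of-speech rule Udupa et al. describes (a relatively high well-formedness score for an n-gram that begins with a noun, verb, or participle and ends with a noun or participle; a relatively low score when it begins or ends with an adjective or adverb) can be sketched with illustrative tag sets. The tag names and concrete score values here are assumptions; Udupa's actual scoring may use different representations.

```python
# Illustrative sketch of Udupa's well-formedness rule; the tag names and the
# concrete score values are assumptions, not Udupa's actual algorithm.
GOOD_START = {"NOUN", "VERB", "PARTICIPLE"}
GOOD_END = {"NOUN", "PARTICIPLE"}
BAD_EDGE = {"ADJ", "ADV"}

def well_formedness(pos_tags):
    """pos_tags: part-of-speech tags for the n-gram's tokens, in order."""
    first, last = pos_tags[0], pos_tags[-1]
    if first in BAD_EDGE or last in BAD_EDGE:
        return 0.1  # begins or ends with an adjective/adverb: relatively low
    if first in GOOD_START and last in GOOD_END:
        return 0.9  # noun/verb/participle start, noun/participle end: relatively high
    return 0.5      # everything else: middling

print(well_formedness(["NOUN", "VERB", "NOUN"]))  # → 0.9
print(well_formedness(["ADJ", "NOUN"]))           # → 0.1
```

Only n-grams whose score clears a predefined threshold would then be retained as candidate key phrases.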
Response to Arguments

Applicant's arguments filed 17 September 2025 have been fully considered but they are not persuasive. Applicant amends independent claims 1 and 10 to include clarifying limitations of identifying textual data entries “of the plurality of textual data entries”, changing “counting respective number of the identified textual data entries” to “determining respective numbers of the identified textual data entries of the plurality of textual data entries” and changing “sorting the repeated n-grams instances based on the numbers of repeated n-gram instances across the plurality of data entries” to “sorting the repeated n-grams instances based, in part, on the determined respective numbers of the identified textual data entries containing the repeated n-gram instances of the identified word level n-grams in the at least one data file occurring across the plurality of textual data entries”. Then Applicant presents arguments traversing the prior rejection of the independent claims as being obvious under 35 U.S.C. §103 over Tiwari (U.S. Patent Publication 2019/0102374) in view of Narth et al. (U.S. Patent Publication 2021/0141790). Applicant argues that Tiwari does not appear to actually be looking for and first counting repeated n-gram instances for each data entry, e.g., n-gram instances for each narrative in the narrative information, and then further counting the number of repeated n-gram instances across the plurality of data entries, e.g., the aggregate data entries in the data file or dataset. Applicant cites ¶[0034] of Tiwari as disclosing storing ‘resulting n-gram . . . in a cumulative set for all the posts, with each n-gram associated with the region category, vertical category, and data information from the post that originated the n-gram’ after organizing the tokenized text into n-grams. 
Applicant maintains that this does not involve a determination of the repetition of n-grams in specific data entries or narrative regardless of categorization in regions, vertical category, or dates. Applicant argues that Tiwari does not appear to teach actually counting of those entries having repeated n-grams across the plurality of data entries comprising narrative information, but rather counting within a larger organization of subsets of categorized information. Applicant concludes that Tiwari, given a broadest reasonable interpretation, is determining count frequency in each data entry class/vertical category, but does not determine counts across the data entries/classes/vertical categories as set forth by the independent claims. Applicant states that the rejection posits a number of occurrences of an n-gram is counted in a cumulative set of posts and a number of occurrences of these n-grams are counted and accumulated for each of the plurality of individual posts. However, Applicant argues that merely from knowing only a mere total count of n-grams in the accumulated or cumulative set of posts one could not know or recreate the number of posts that had repeated n-gram instances within each post. Applicant maintains that the claim language requires an identification of data entries or posts having repeated n-grams. Additionally, Applicant submits that these features are not taught by Narth et al. Applicant’s amendment overcomes the rejection for indefiniteness under 35 U.S.C. §112(b). New grounds of rejection are set forth as directed to independent claims 1 and 10 being obvious under 35 U.S.C. §103 over Tiwari (U.S. Patent Publication 2019/0102374) in view of Consul et al. (U.S. Patent No. 9,165,056). Here, Consul et al. is being substituted as a secondary reference for Narth et al., and the rejection no longer relies upon this latter reference. The rejection of certain dependent claims continues to rely upon Reynolds et al. (U.S. 
Patent Publication 2018/0210936) and Udupa et al. (U.S. Patent Publication 2013/0151533). Consul et al. is applied to better address the claim limitations of “outputting at least one most mentioned list of repeated n-gram instances in a format adapted for being appended to the at least one data file.” Consul et al. teaches determining a frequent word list from emails as a mailbox specific frequent word list 104. The mailbox specific frequent word list 104 is formatted in Extensible Markup Language (XML), or some equivalently known format, which is “outputting at least one most mentioned list . . . in a format adapted for being appended to the at least one file”. Notably, Applicant’s claim language does not actually state that the most mentioned list is, in fact, appended to the data file, only that it is in a format “adapted to be appended”. Broadly, a mailbox specific frequent word list 104 that is formatted in XML is “in a format adapted for being appended to the at least one file”. However, Consul et al. teaches that the XML-formatted file of frequent word list 104 is mapped to and stored in a folder with a mailbox, so this can be construed as ‘appending’ frequent word list 104 to a set of files comprising emails of a user’s mailbox. Consul et al.’s most frequent word list is analogous to Applicant’s “most mentioned list of repeated n-grams”. That is, Consul et al.’s list of most frequent words is analogous to a list of most frequent n-grams. 
Applicant’s argument is not persuasive because, even if one is to assume arguendo that a list of most frequent n-grams is only compiled cumulatively for a given category or vertical, e.g., a given geographic region or a given date value, in Tiwari, this reference still discloses the limitations directed to “determining respective numbers of the identified textual data entries containing the repeated n-gram instances” and “sorting the repeated n-gram instances based, in part, on the determined respective numbers of the identified textual data entries containing the repeated n-gram instances of the identified word level n-grams”. That is, even if n-grams are only counted and sorted in one given category or vertical, and even if the counted n-grams are not added up across all of the categories or verticals, Tiwari still generates a cumulative count of n-grams and sorts those n-grams for at least one category or at least one vertical. Applicant’s claim language does not provide any express distinguishing limitation over performing this procedure multiple times individually for multiple categories or verticals as is being performed by Tiwari. Here, Tiwari at ¶[0036] states:

Frequency computer 350 can receive the cumulative set of n-grams, and their associated data, and compute a frequency score for each unique n-gram. A frequency score within a set for a “unique” n-gram is an occurrence value for all n-grams within that set that have the same sequence of tokens. For example, in the set of n-grams “here we go,” “we're on our way,” “here we go,” and “here we go,” where the occurrence value is a total count, there are two unique n-grams: “here we go” with an occurrence value of three and “we're on our way” with an occurrence value of one. In some implementations, each n-gram can be grouped under a particular category defined by its region classification, vertical classification, or both. 
Frequency computer 350 can count, for each unique n-gram, the number of times that unique n-gram occurs total, or occurs within the n-gram's classification group. In some implementations, frequency computer 350 can provide the counts as the frequency score, or can compute the frequency score by dividing the counts by a total, which can be the total of the n-grams within that category or can be the total number of n-grams in the cumulative set. In some implementations, the n-grams can be sorted, or sorted within each category, by their frequency score. N-grams with a frequency above a threshold (i.e. “high frequency n-grams”) can be passed to prediction engine 352. (emphasis added)

Similarly, Tiwari at ¶[0048] states:

At block 424, process 400 can sort the n-grams into groups by the n-gram's associated region classification, vertical classification, or both. At block 426, process 400 can determine, within each group, a frequency value for each unique n-gram. The frequency value can be a total count of the occurrences of the n-gram within the group, or a ratio of this count to either the total number of n-grams or to the n-grams within that group. (emphasis added)

Moreover, Consul et al. is similar in this way to Tiwari. Here, Consul et al. generates a universal frequent word list 118 based on a count of a number of document identifiers associated with each of a plurality of words from search data 206. Then Consul et al. generates a mailbox specific frequent word list 104 by filtering universal frequent word list 118 for only those words contained in emails associated with a specific mailbox. Mailbox specific frequent word list 104 includes a list of frequent words found in the user’s mailbox and a corresponding frequency associated with each of the words. 
The list of frequent words may be sorted in order of frequency so that the most frequent words are shown at the top of the mailbox specific frequent word list 104 and the remaining words may be shown in descending order of frequency. A mailbox specific frequent word list 104 may include a high frequency of baby-related words, e.g., ‘crib’, ‘diapers’, and ‘stroller’. Specifically, Consul et al. teaches at Column 5, Lines 24 to 36: The process flow 200 proceeds to 208, where the search API 112 receives the search data 206 in response to performing the index scan. Once the search API 112 receives the search data 206, the process 200 proceeds to 210, where the search API 112 generates the universal frequent word list 118 based on the search data 206. In one embodiment, the API 112 generates the universal frequent word list 118 by counting the number of document identifiers associated with each of the words in the search data 206. For example, in the example shown above, the word “apple” is included in five emails, while the word “bear” is included in three emails. As such, “apple” has a frequency of five, and “bear” has a frequency of three. (emphasis added) Moreover, Consul et al. teaches at Column 7, Lines 6 to 47: In one embodiment, the universal frequent word list 118 includes a mapping of the words to a frequency associated with each of the words across multiple mailboxes. The frequency may be determined by counting the number of email identifiers corresponding to each of the words. Upon generating the universal frequent word list 118, the routine 300 proceeds to operation 312. . . . If the universal frequent word list 118 is current, then the routine 300 proceeds to operation 312, where the search API 112 generates the mailbox specific frequent word list 104 based on the universal frequent word list 118. 
In one embodiment, the search API 112 filters the words and corresponding frequencies from the universal frequent word list 118 that are associated with only one mailbox. The filtered words and corresponding frequencies then form the mailbox specific frequent word list 104, which may be sorted according to the frequencies. (emphasis added)

Accordingly, Tiwari and Consul et al. disclose and teach the limitation of "determining respective numbers of the identified textual data entries of the plurality of textual data entries containing the repeated n-gram instances of the identified word level n-grams in the at least one data file". Tiwari does this with n-grams and Consul et al. does this with words. Determining a frequency of an n-gram or a word requires counting a number of those n-grams or words.

Both Tiwari and Consul et al. may be understood to generate their most-mentioned lists based on only a subset of the total amount of data, i.e., categories or verticals in Tiwari and a mailbox specific to a given user in Consul et al. Nevertheless, Tiwari and Consul et al. perform the procedure set forth by Applicant's claims a multiplicity of times, at least once for each category or vertical of Tiwari and at least once for each personal mailbox of Consul et al. Additionally, Consul et al. generates a universal frequent word list by counting and sorting the most frequently used words based on search data that is not limited to a particular category or a particular personal email box. Consequently, Tiwari and Consul et al. render obvious Applicant's limitation of "determining respective numbers of identified textual data entries of the plurality of textual data entries containing repeated n-gram instances of the identified word level n-grams in the at least one data file." The data files are social media posts in Tiwari and emails and search data in Consul et al.

Applicant's arguments are not persuasive.
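The frequency-counting procedures the Office Action attributes to the prior art can be sketched in Python. This is an illustrative reconstruction, not code disclosed in either reference: the function names, the sample data, and the threshold parameter are assumptions. The first function mirrors Tiwari's description (sort n-grams into classification groups, then compute a per-group frequency value as a raw count or a ratio, keeping only "high frequency" entries); the second mirrors Consul's filtering of a universal frequent word list down to a sorted mailbox-specific list.

```python
from collections import Counter, defaultdict

def group_frequencies(ngrams_with_groups, threshold=0.0, as_ratio=True):
    """Tiwari-style sketch (hypothetical): sort n-grams into groups, then
    score each unique n-gram within its group by raw count or by the ratio
    of its count to the group total, keeping scores above the threshold."""
    groups = defaultdict(list)
    for ngram, group in ngrams_with_groups:
        groups[group].append(ngram)
    result = {}
    for group, ngrams in groups.items():
        counts = Counter(ngrams)          # occurrences per unique n-gram
        total = len(ngrams)               # group total, for the ratio form
        scores = {g: (c / total if as_ratio else c) for g, c in counts.items()}
        # retain only "high frequency" n-grams, most frequent first
        result[group] = sorted(
            ((g, s) for g, s in scores.items() if s > threshold),
            key=lambda kv: kv[1], reverse=True)
    return result

def mailbox_frequent_words(universal_list, mailbox_words):
    """Consul-style sketch (hypothetical): filter a universal word->frequency
    mapping to the words present in one mailbox, sorted by frequency."""
    return sorted(
        ((w, f) for w, f in universal_list.items() if w in mailbox_words),
        key=lambda kv: kv[1], reverse=True)

# Illustrative data only:
posts = [("new crib", "baby"), ("new crib", "baby"),
         ("stroller deal", "baby"), ("rate hike", "finance")]
by_group = group_frequencies(posts)
mailbox = mailbox_frequent_words({"apple": 5, "bear": 3, "crib": 4},
                                 {"crib", "apple"})
```

In both sketches, the determination of a frequency reduces to counting occurrences and then optionally normalizing and sorting, which is the point the examiner draws from the combined references.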
New grounds of rejection are set forth as directed to the independent claims. This Office Action is NON-FINAL.

Conclusion

The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure. Boudreau et al. and Nicholas et al. disclose related prior art.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER, whose telephone number is (571) 272-7608. The examiner can normally be reached Monday-Thursday, 8:30 AM-6:00 PM. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/MARTIN LERNER/
Primary Examiner, Art Unit 2658
December 15, 2025

Prosecution Timeline

Jul 26, 2023
Application Filed
Aug 08, 2024
Non-Final Rejection — §103
Jan 30, 2025
Examiner Interview Summary
Jan 30, 2025
Applicant Interview (Telephonic)
Feb 13, 2025
Response Filed
Mar 11, 2025
Final Rejection — §103
Sep 17, 2025
Request for Continued Examination
Sep 19, 2025
Response after Non-Final Action
Dec 15, 2025
Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596880
DETERMINING CAUSALITY BETWEEN FACTORS FOR TARGET OBJECT BY ANALYZING TEXT
2y 5m to grant Granted Apr 07, 2026
Patent 12586592
METHODS AND APPARATUS FOR GENERATING AUDIO FINGERPRINTS FOR CALLS USING POWER SPECTRAL DENSITY VALUES
2y 5m to grant Granted Mar 24, 2026
Patent 12585680
CONTEXTUAL TITLES BASED ON TEMPORAL PROXIMITY AND SHARED TOPICS OF RELATED COMMUNICATION ITEMS WITH SENSITIVITY POLICY
2y 5m to grant Granted Mar 24, 2026
Patent 12579987
METHODS FOR FREQUENCY DOMAIN PACKET LOSS CONCEALMENT AND RELATED DECODER
2y 5m to grant Granted Mar 17, 2026
Patent 12579973
Natural Language Processing With Contextual Data Associated With Content Displayed By a Computing Device
2y 5m to grant Granted Mar 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
78%
Grant Probability
92%
With Interview (+13.5%)
3y 1m
Median Time to Grant
High
PTA Risk
Based on 984 resolved cases by this examiner. Grant probability derived from career allow rate.
