Last updated: May 29, 2026

Application No. 17/548,651

AUTOMATICALLY ASSIGN TERM TO TEXT DOCUMENTS

Non-Final OA §103

Filed

Dec 13, 2021

Examiner

PASHA, ATHAR N

Art Unit

2657

Tech Center

2600 — Communications

Assignee

International Business Machines Corporation

OA Round

3 (Non-Final)

Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 90% grant rate with a +16.4% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.

Based on 156 resolved cases, 2023–2026

Examiner Intelligence

PASHA, ATHAR N

Grants 90% — above average

Career Allowance Rate

140 granted / 156 resolved

+27.7% vs TC avg
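
As a quick check of the figures above, the reported 90% career allowance rate is consistent with 140 grants out of 156 resolved cases. The sketch below only redoes that division; the variable names are illustrative and not taken from the source.

```python
# Counts as reported in the examiner profile above.
granted = 140
resolved = 156

# Allowance rate as a fraction of resolved cases.
allowance_rate = granted / resolved

# Prints 89.7%, which rounds to the 90% shown on the page.
print(f"{allowance_rate:.1%}")
```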

Strong +16% interview lift

Interview Lift: +16.4% (without vs. with an interview, among resolved cases)

Typical timeline

2y 6m

Avg Prosecution

13 currently pending

Career history

174

Total Applications

across all art units

Statute-Specific Performance

§101

4.3%

-35.7% vs TC avg

§103

89.8%

+49.8% vs TC avg

§102

3.0%

-37.0% vs TC avg

§112

1.3%

-38.7% vs TC avg

Tech Center averages are estimates. Based on career data from 156 resolved cases.

Office Action

§103

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 1/14/25 has been entered.


Response to Arguments
Applicant’s amendments filed on 1/5/26 have been entered. In view of the amendments, the rejections of the claims are maintained and provided in the response below.
With this Final Office Action, claims 1-16 and 18-21 stand rejected.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 4, 7, 12, 15, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi (US 9483559 B2) in further view of Nagesh (US 20210294979 A1) and Conrad (US 20120078950 A1).
With respect to claims 1/12/20, Gollapudi teaches
(claim 1) A computer-implemented method comprising 
(claim 12) A computer program product comprising
 (claim 12) one or more computer readable storage media, and program instructions stored on the one or more computer readable storage media to perform operations comprising (Col11ll62-Col4ll10 Computer storage media include volatile and non-volatile, and removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Memory 504, removable storage 508, and non-removable storage 510 are all examples of computer storage media. Computer storage media include, but are not limited to, RAM, ROM, electrically erasable program read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing system 500. Any such computer storage media may be part of computing system 500.) 
(claim 20) a processor set; one or more computer readable storage media, and program instructions stored on the one or more computer readable storage media to cause  the processor set to perform operations comprising (Col11ll62-Col4ll10 Computer storage media include volatile and non-volatile, and removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Memory 504, removable storage 508, and non-removable storage 510 are all examples of computer storage media. Computer storage media include, but are not limited to, RAM, ROM, electrically erasable program read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing system 500. Any such computer storage media may be part of computing system 500.)
receiving, by one or more processors, an unstructured text document (Gollapudi¶ Col4ll15-24 The user may provide the query [unstructured text document] 112 “fashion boots” to the search engine 150. The search engine 150 and/or the provider 160 may then attempt to translate the terms of the query 112 into attribute values pairs that correspond to a category associated with the query 112.);
extracting, by one or more processors, at least one unrecognized token from the unstructured text document (Gollapudi¶ Col4ll32-40 Thus, for example, for a query 112 related to the category of shoes, the free token “fashion” [unrecognized token] may be replaced with the attribute values [BRAND: GUCCI] or [BRAND: PRADA], ¶ Col5 ll1-7 As discussed further with respect to FIG. 2, the reformulation engine 140 may generate the reformulated query 115 using reformulation data 145. The reformulation data 145, for each category, may identify which free tokens are modifiers for the category. Which free tokens are modifiers may be determined by the reformulation engine 140 based on the browse trails of the search history data 153);
identifying, by one or more processors, at least one structured data element in a predefined set of data sources, wherein the at least one structured data element is related to the at least one extracted unrecognized token from the unstructured text document (Gollapudi¶ Col4ll41-56 Thus, for example, for a query 112 related to the category of shoes, the free token “fashion” [unrecognized token] may be replaced with the attribute values [BRAND: GUCCI] [structured data element] or [BRAND: PRADA], ¶ Col5 ll1-7 As discussed further with respect to FIG. 2, the reformulation engine 140 may generate the reformulated query 115 using reformulation data 145. The reformulation data 145, for each category, may identify which free tokens are modifiers for the category. Which free tokens are modifiers may be determined by the reformulation engine 140 based on the browse trails of the search history data 153); 
relating, by one or more processors, a label associated with the identified at least one structured data element to the unstructured text document (Gollapudi¶ Col4ll41-56 Thus, for example, for a query 112 related to the category of shoes, the free token “fashion” [unrecognized token] may be replaced with the attribute values [BRAND: GUCCI] [structured data element] or [BRAND: PRADA], ¶ Col5 ll1-7 As discussed further with respect to FIG. 2, the reformulation engine 140 may generate the reformulated query 115 using reformulation data 145. The reformulation data 145, for each category, may identify which free tokens are modifiers for the category. Which free tokens are modifiers may be determined by the reformulation engine 140 based on the browse trails of the search history data 153.);
Gollapudi does not explicitly disclose, however Nagesh teaches training, by one or more processors, a machine-learning based application using the unstructured text document and the label ([0023] For instance, consider that during a training phase of generating an embedding model, there will almost certainly be tokens (words or phrases) that are missing from (or are “unobserved” or “unrecognized” in) the training corpus. Thereafter, when the trained embedding model is used to process new input text (so as to generate semantic vectors for the new input text), the trained embedding model may receive missing or unobserved tokens as input.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi to include training of Nagesh in order to update and optimize models (Nagesh [0027]).
Neither Gollapudi nor Nagesh explicitly discloses, however Conrad teaches identifying the at least one structured data element comprises searching for at least one data element in the predefined set of data sources, the at least one data element comprising metadata of at least one extracted natural language token (Conrad ¶ [0031] For example, the values "three million" or "3,000,000" may be replaced by the parameter token "AMOUNT" in the sentence, and the metadata for the token may include the value of 3,000,000. Similarly, the "$" or "dollars" is replaced by CURRENCY in the sentence, and the metadata for the token includes the value of $US. Reporting periods, such as "Q4 2009" or "full year 2009" are replaced by PERIOD. In another implementation, the metadata may store a token to lookup the value in a data structure (e.g., table). For example, the date "Sep. 30, 2009," may be replaced by the parameter variable "DATE" in the sentence, and the metadata for the token may be an index such as "253", where "253" is used to lookup the date value for "DATE" in a table.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh to include metadata of Conrad in order to extract information with a high degree of accuracy and substantially faster speed (Conrad, [0043]).
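
For orientation, the claim 1 steps discussed above (extract unrecognized tokens, identify related structured data elements in a predefined set of data sources, relate a label to the document, and use the result for training) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation and not the method of any cited reference: the vocabulary `KNOWN_WORDS`, the `DATA_SOURCES` list, and all function names are hypothetical.

```python
import re

# Hypothetical vocabulary of recognized natural-language words (assumption).
KNOWN_WORDS = {"invoice", "for", "order", "shipped", "to", "customer"}

# Hypothetical "predefined set of data sources": each structured data element
# carries a label and a set of known values (assumption, not from the cited art).
DATA_SOURCES = [
    {"label": "ORDER_ID", "values": {"ord-1001", "ord-1002"}},
    {"label": "CUSTOMER_ID", "values": {"c42", "c77"}},
]


def extract_unrecognized_tokens(text):
    """Return lowercase tokens that are not in the known vocabulary."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text.lower())
    return [t for t in tokens if t not in KNOWN_WORDS]


def identify_data_elements(tokens, sources):
    """Return data elements whose values contain any of the given tokens."""
    return [s for s in sources if any(t in s["values"] for t in tokens)]


def label_document(text, sources=DATA_SOURCES):
    """Relate the labels of matching data elements to the document."""
    tokens = extract_unrecognized_tokens(text)
    return [s["label"] for s in identify_data_elements(tokens, sources)]


document = "Invoice for order ORD-1001 shipped to customer C42"
labels = label_document(document)

# (document, label) pairs could then serve as training examples for a model.
training_examples = [(document, label) for label in labels]
print(labels)
```

Here a plain set-membership lookup stands in for whatever search the data sources actually support; a real system would likely query a catalog or database instead.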

With respect to claims 7/18, Gollapudi teaches (i) generating, by one or more processors, a structured data element comprising extracted non-natural-language tokens as values, (ii) determining, by one or more processors, domain characteristics for the generated data element, and (iii) searching, by one or more processors, in a predefined set of data sources, for the structured data elements that share the same domain characteristics (Gollapudi¶ Col1ll4142-60 In an implementation, search history data such as browse trails are collected over time. The browse trails, including associated queries and domains, are processed to identify free tokens of the queries that are also modifiers.).

With respect to claims 4 and 15, Conrad further teaches wherein identifying the at least one structured data element further comprises searching for a value of at least one extracted non-natural-language token in the predefined set of data sources (Gollapudi¶ Col4ll15-24 The user may provide the query 112 “fashion boots” [non-natural language token] to the search engine 150. The search engine 150 and/or the provider 160 may then attempt to translate the terms of the query 112 into attribute values pairs that correspond to a category associated with the query 112.).


Claims 2/13 are rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh and Conrad in further view of Simske (US 7908279 B1).
With respect to claims 2/13, Gollapudi, Nagesh and Conrad do not explicitly disclose, however Simske teaches wherein extracting the at least one unrecognized token from the unstructured text document further comprises: determining, by one or more processors, natural language elements and non-natural-language elements (Simske¶ [0017] If the document is originally electronic or the zoning analysis and OCR tools do not prepare the document adequately, other software tools may be used to prepare the document for keyword analysis, i.e., software tools are needed to separate words and non-words and record document layout information. The words and all other information related to each word are stored in arrays generated by software.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad to include non-natural-language elements of Simske in order to increase speed and efficiency for accurate determination of recognizable terms.

Claims 3/14 are rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh, Conrad and Simske in further view of Tomobe (US 20190340670 A1).
With respect to claims 3/14, Gollapudi, Nagesh, Conrad and Simske do not explicitly disclose, however Tomobe teaches grouping, by one or more processors, non-natural-language tokens into groups of tokens with similar characteristics (Tomobe¶[0041] Note that, although clustering is actually performed of the customer IDs and the product IDs, the clustering may be described as “perform clustering of the customers”, and “perform clustering of the products”, for convenience. Similarly, a customer ID cluster (a cluster of the customer IDs) and a product ID cluster (a cluster of the product IDs) may be referred to as a customer cluster and a product cluster, respectively, for convenience.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad in view of non-natural-language elements of Simske to include the grouping of Tomobe in order to simplify classification for non-natural tokens.
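
One way to picture the claim 3/14 limitation (grouping non-natural-language tokens with similar characteristics) is to bucket tokens by a coarse character-type signature. The sketch below is only an illustration under that assumption; the choice of signature and the name `group_by_signature` are hypothetical and are drawn neither from the application nor from Tomobe, which describes clustering of customer and product IDs.

```python
import re
from collections import defaultdict


def signature(token):
    """Coarse signature: letters become 'A', digits become '9', rest unchanged."""
    token = re.sub(r"[A-Za-z]", "A", token)
    return re.sub(r"[0-9]", "9", token)


def group_by_signature(tokens):
    """Group tokens that share the same character-type signature."""
    groups = defaultdict(list)
    for token in tokens:
        groups[signature(token)].append(token)
    return dict(groups)


# Hypothetical non-natural-language tokens extracted from a document.
tokens = ["ord-1001", "ord-1002", "c42", "c77", "2021-12-13"]
groups = group_by_signature(tokens)
print(groups)
```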

Claims 5/16 are rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh and Conrad in further view of Evermann (US 20140365209 A1).
With respect to claims 5/16, Gollapudi, Nagesh and Conrad do not explicitly disclose, however Evermann teaches further comprising: determining, by one or more processors, a matching score value based on a number of the at least one unrecognized tokens and recognized tokens extracted from the unstructured text document that have been found in the data element and a specificity of the extracted tokens ([0128] Returning to FIG. 4, the natural language processor determines that the first candidate text string "show times for our go" corresponds to a "movie" domain [specificity], for example, because the words "show times" match words that are particularly salient to the "movie" domain. However, because it cannot determine the movie for which the user is requesting show times (e.g., because "our go" does not match any known movies), the intent deduction confidence score is relatively low (40%). On the other hand, the second candidate text string "show times for Argo" also corresponds to a movie domain, but has a high intent deduction confidence score (99%), for example, because the natural language processor recognized the word "Argo" as a known movie name.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad to include specificity of Evermann in order to properly tag unstructured data.


Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh, Conrad and Evermann in further view of Quadracci (US 20130006610 A1).
With respect to claim 6, Gollapudi, Nagesh, Conrad and Evermann do not explicitly disclose, however Quadracci teaches further comprising: selecting, by one or more processors, the data element having a highest score value as the label for the unstructured text document ([0057] To process unstructured text and/or partially structured text from main data source 902, text processing tool 906 queries an associative memory application and/or applies at least one source regular expression pattern to the unstructured text and/or partially structured text. For example, in one embodiment, text processing tool 906 processes the unstructured text and/or partially structured text by querying the associative memory application with a segment of unstructured text and/or partially structured text, calculating a similarity score, and determining whether to tag the segment of unstructured text and/or partially structured text based on the similarity score.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad in view of specificity of Evermann to include scoring of Quadracci in order to effectively organize unstructured data ([0003] Quadracci).



Claims 8/19 are rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh and Conrad in further view of Gupta (US 20210383453 A1).
With respect to claims 8/19, Gollapudi, Nagesh and Conrad do not explicitly disclose, however Gupta teaches outputting, by one or more processors, the related label as a label suggestion for the unstructured text document (Gupta¶ [0060] Suggested catalog labels 410 [suggested label] are associated with respective groups of non-normalized item descriptors [unstructured text] assigned to the corresponding suggested catalog label by enterprise data management system 410. In FIG. 4 GUI 400 includes a suggested catalog label 410 for “Butter Chicken” that is currently selected (as indicated by the grey highlight) and non-normalized item descriptor group 420 displayed based on the selection. As described above in relation to FIG. 3, enterprise data management system 130 may assign groups of non-normalized item descriptors 420 to the suggested catalog label “butter chicken” by inputting each of the non-normalized item descriptors into a machine learning pipeline including one or more models that classify the non-normalized item descriptors as one or more enterprise catalog items);
and receiving, by one or more processors, a confirmation signal confirming the label suggestion as the confirmed label for the unstructured text document (Gupta ¶[0063] In this case, weights may be adjusted to indicate a more positive correlation between nodes and sub-nodes if a user approves [confirmation signal] of a suggested label for a non-normalized item descriptor group, or alternatively weights may be adjusted to indicate a less positive correlation or a more negative correlation if a user rejects a suggested label for a non-normalized object descriptor group. As an example of generating a dataset, enterprise data management system 130 may generate a training data set using suggested catalog labels and corresponding groups of non-normalized item descriptors that were approved by the user).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad to include the label suggestion of Gupta in order to improve efficiency and reduce computation cost for determining candidate item descriptors ([0001], Gupta).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh and Conrad in further view of Ashtiani (US 10380093 B2).
With respect to claim 9, Gollapudi, Nagesh and Conrad do not explicitly disclose, however Ashtiani teaches a database table, a data dictionary and a data catalog, a structured file in a file system, a no Structured Query Language (SQL) database, and a graph database (Ashtiani¶ claim 12. The processor-implemented method of claim 1, wherein the multiple predetermined data sources comprise a database table, a set of survey forms, a set of customer reports, and a set of social media postings.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad to include the grouping of Ashtiani in order to reduce latency in data retrieval time (Col12ll16-21, Ashtiani).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh, Conrad, Evermann and Quadracci in further view of Cohen (US 20140372248 A1).
With respect to claim 10, Gollapudi, Nagesh, Conrad, Evermann and Quadracci do not explicitly disclose, however Cohen teaches wherein the selected label is further ranked based on context extracted from the unstructured text document (Claim 1: weighting the influence of the unstructured text on the ranking based on a reputation of the identity of the consumer that submitted the unstructured text; and providing, for display on a document associated with the first product or the second product, a list that indicates the ranking of at least some of the plurality of products, wherein the list indicates the ranking of at least the first product, the second product, and the third product, wherein each of the above are performed by one or more processing devices.).
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad in view of specificity of Evermann in view of scoring of Quadracci to include ranking of Cohen in order to improve accuracy over simple scoring ([0080] Cohen).


Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh, Conrad, Evermann and Quadracci in further view of Brill (US 20050234904 A1).
With respect to claim 11, Gollapudi, Nagesh, Conrad, Evermann and Quadracci do not explicitly disclose, however Brill teaches further comprising: sorting, by one or more processors, the data elements by a search score associated with each of the data elements and keeping only the data elements with a search score value above a search score threshold value ([0114] The server device can sort each search result based on their modified scores, and select one or more of the corresponding action datasets based on a defined threshold.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad in view of specificity of Evermann in view of scoring of Quadracci to include threshold of Brill in order to efficiently and effectively rank ([0011], Brill).
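
The claim 11 limitation (sort data elements by search score, keep only those above a threshold) together with the claim 6 selection of the highest-scoring element can be illustrated with a few lines of code. This is a hedged sketch: the scores, the threshold value, and the function name `filter_and_rank` are invented for illustration and do not come from the application or from Brill or Quadracci.

```python
def filter_and_rank(scored_elements, threshold):
    """Sort (element, score) pairs by descending score; keep scores above threshold."""
    ranked = sorted(scored_elements, key=lambda pair: pair[1], reverse=True)
    return [pair for pair in ranked if pair[1] > threshold]


# Hypothetical search scores for three candidate data elements.
candidates = [("CUSTOMER_ID", 0.35), ("ORDER_ID", 0.92), ("SKU", 0.61)]

kept = filter_and_rank(candidates, threshold=0.5)

# The highest-scoring surviving element would be selected as the label.
best_label = kept[0][0] if kept else None
print(kept, best_label)
```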



Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh and Conrad in further view of Carlgren (US 20060229865 A1).
With respect to claim 21, Gollapudi teaches wherein the at least one structured data element is related to the at least one extracted unrecognized token from the unstructured text document based on [[matching an arrangement of alphanumeric character types and punctuation format]].
None of Gollapudi, Nagesh and Conrad explicitly discloses, however Carlgren teaches matching an arrangement of alphanumeric character types and punctuation format ([0073] The last accepting state defines the boundary of the first token (letter g) and the type of the token: state 302 is for alphabetic sequences, state 303 alphanumeric sequences, state 304 integer numbers, and state 305 floating point numbers. Unmatched characters, such as punctuation, are separated by state 306.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad to include arrangement of Carlgren in order to make accurate segmentation of text (Carlgren [0075]).
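
To make the claim 21 language concrete, "matching an arrangement of alphanumeric character types and punctuation format" could be read as comparing the shape of a token with the shape of known values of a data element. The sketch below is one possible reading, offered as an assumption; `token_shape`, `match_by_shape`, and the example values are hypothetical, and this is not the finite-state tokenizer that Carlgren describes.

```python
def token_shape(token):
    """Map letters to 'A' and digits to '9'; keep punctuation unchanged."""
    shape = []
    for ch in token:
        if ch.isalpha():
            shape.append("A")
        elif ch.isdigit():
            shape.append("9")
        else:
            shape.append(ch)
    return "".join(shape)


def match_by_shape(token, examples):
    """Return labels whose example value has the same shape as the token."""
    target = token_shape(token)
    return [label for label, value in examples.items() if token_shape(value) == target]


# Hypothetical structured data elements, each with one example value.
ELEMENT_EXAMPLES = {"ORDER_ID": "ORD-1001", "PHONE": "555-0100"}

print(token_shape("ORD-1001"))
print(match_by_shape("ABC-2345", ELEMENT_EXAMPLES))
```

Under this reading, an unseen token such as `ABC-2345` relates to `ORDER_ID` because both follow the letter-letter-letter, hyphen, four-digit arrangement, even though the value itself never appears in the data source.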

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA whose telephone number is (408)918-7675. The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/ATHAR N PASHA/Primary Examiner, Art Unit 2657

Prosecution Timeline

Jan 05, 2026

Response after Non-Final Action

Jan 14, 2026

Request for Continued Examination

Jan 21, 2026

Response after Non-Final Action

Jan 28, 2026

Non-Final Rejection mailed — §103

Apr 06, 2026

Interview Requested

Apr 13, 2026

Applicant Interview (Telephonic)

Apr 14, 2026

Examiner Interview Summary

Apr 20, 2026

Response Filed

Precedent Cases

Applications granted by this same examiner with similar technology

18/573,622

Patent 12639516

CLASSIFICATION AND AUGMENTATION OF UNSTRUCTURED DATA FOR AUTOFILL

2y 5m to grant Granted May 26, 2026

17/749,578

Patent 12632445

Applied Artificial Intelligence Technology for Natural Language Generation Using a Story Graph and Configurable Structurer Code

4y 0m to grant Granted May 19, 2026

18/264,595

Patent 12614040

SIMULTANEOUS TRANSLATION DEVICE AND COMPUTER PROGRAM

2y 8m to grant Granted Apr 28, 2026

18/747,081

Patent 12608556

INTENTION RECOGNITION METHOD, DEVICE, ELECTRONIC DEVICE AND STORAGE MEDIUM BASED ON LARGE MODEL

1y 10m to grant Granted Apr 21, 2026

18/747,499

Patent 12608557

CHINESE DIALOGUE SYSTEM FOR COGNITIVELY IMPAIRED ADULTS BASED ON COGNITIVE STIMULATION THERAPY PRINCIPLES

1y 10m to grant Granted Apr 21, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

3-4

Expected OA Rounds

90%

Grant Probability

99%

With Interview (+16.4%)

2y 6m (~0m remaining)

Median Time to Grant

High

PTA Risk

Based on 156 resolved cases by this examiner. Grant probability derived from career allowance rate.