Prosecution Insights
Last updated: April 19, 2026
Application No. 17/548,651

AUTOMATICALLY ASSIGN TERM TO TEXT DOCUMENTS

Non-Final OA §103
Filed
Dec 13, 2021
Examiner
PASHA, ATHAR N
Art Unit
2657
Tech Center
2600 — Communications
Assignee
International Business Machines Corporation
OA Round
3 (Non-Final)
90%
Grant Probability
Favorable
3-4
OA Rounds
2y 8m
To Grant
99%
With Interview

Examiner Intelligence

Grants 90% — above average
90%
Career Allow Rate
138 granted / 154 resolved
+27.6% vs TC avg
Strong +17% interview lift
+17.0%
Interview Lift
resolved cases with interview
Typical timeline
2y 8m
Avg Prosecution
18 currently pending
Career history
172
Total Applications
across all art units

Statute-Specific Performance

§101
21.9%
-18.1% vs TC avg
§103
49.4%
+9.4% vs TC avg
§102
16.9%
-23.1% vs TC avg
§112
5.2%
-34.8% vs TC avg
Black line = Tech Center average estimate • Based on career data from 154 resolved cases

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 1/14/25 has been entered.

Response to Arguments

Applicant’s amendments filed on 1/5/26 have been entered. In view of the amendments, the rejections for claims are maintained and provided in the response below. With this Final Office Action, claims 1-16, 18-21 stand rejected.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C.
103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 4, 7, 12, 15, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi (US 9483559 B2) in further view of Nagesh (US 20210294979 A1) and Conrad (US 20120078950 A1).

With respect to claims 1/12/20, Gollapudi teaches:

(claim 1) A computer-implemented method comprising

(claim 12) A computer program product comprising one or more computer readable storage media, and program instructions stored on the one or more computer readable storage media to perform operations comprising (Col11ll62-Col4ll10 Computer storage media include volatile and non-volatile, and removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Memory 504, removable storage 508, and non-removable storage 510 are all examples of computer storage media.
Computer storage media include, but are not limited to, RAM, ROM, electrically erasable program read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing system 500. Any such computer storage media may be part of computing system 500.)

(claim 20) a processor set; one or more computer readable storage media, and program instructions stored on the one or more computer readable storage media to cause the processor set to perform operations comprising (Col11ll62-Col4ll10 Computer storage media include volatile and non-volatile, and removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Memory 504, removable storage 508, and non-removable storage 510 are all examples of computer storage media. Computer storage media include, but are not limited to, RAM, ROM, electrically erasable program read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing system 500. Any such computer storage media may be part of computing system 500.)

receiving, one or more processors, an unstructured text document (Gollapudi¶ Col4ll15-24 The user may provide the query [unstructured text document] 112 “fashion boots” to the search engine 150.
The search engine 150 and/or the provider 160 may then attempt to translate the terms of the query 112 into attribute values pairs that correspond to a category associated with the query 112.);

extracting, by one or more processors, at least one unrecognized token from the unstructured text document (Gollapudi¶ Col4ll32-40 Thus, for example, for a query 112 related to the category of shoes, the free token “fashion” [unrecognized token] may be replaced with the attribute values [BRAND: GUCCI] or [BRAND: PRADA], ¶ Col5 ll1-7 As discussed further with respect to FIG. 2, the reformulation engine 140 may generate the reformulated query 115 using reformulation data 145. The reformulation data 145, for each category, may identify which free tokens are modifiers for the category. Which free tokens are modifiers may be determined by the reformulation engine 140 based on the browse trails of the search history data 153);

identifying, by one or more processors, at least one structured data element in a predefined set of data sources, wherein the at least one structured data element is related to the at least one extracted unrecognized token from the unstructured text document (Gollapudi¶ Col4ll41-56 Thus, for example, for a query 112 related to the category of shoes, the free token “fashion” [unrecognized token] may be replaced with the attribute values [BRAND: GUCCI] [structured data element] or [BRAND: PRADA], ¶ Col5 ll1-7 As discussed further with respect to FIG. 2, the reformulation engine 140 may generate the reformulated query 115 using reformulation data 145. The reformulation data 145, for each category, may identify which free tokens are modifiers for the category.
Which free tokens are modifiers may be determined by the reformulation engine 140 based on the browse trails of the search history data 153);

relating, by one or more processors, a label associated with the identified at least one structured data element to the unstructured text document (Gollapudi¶ Col4ll41-56 Thus, for example, for a query 112 related to the category of shoes, the free token “fashion” [unrecognized token] may be replaced with the attribute values [BRAND: GUCCI] [structured data element] or [BRAND: PRADA], ¶ Col5 ll1-7 As discussed further with respect to FIG. 2, the reformulation engine 140 may generate the reformulated query 115 using reformulation data 145. The reformulation data 145, for each category, may identify which free tokens are modifiers for the category. Which free tokens are modifiers may be determined by the reformulation engine 140 based on the browse trails of the search history data 153.);

Gollapudi does not explicitly disclose, however Nagesh teaches training, by one or more processors, a machine-learning based application using the unstructured text document and the label ([0023] For instance, consider that during a training phase of generating an embedding model, there will almost certainly be tokens (words or phrases) that are missing from (or are “unobserved” or “unrecognized” in) the training corpus. Thereafter, when the trained embedding model is used to process new input text (so as to generate semantic vectors for the new input text), the trained embedding model may receive missing or unobserved tokens as input.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi to include training of Nagesh in order to update and optimize models (Nagesh [0027]).

None of Gollapudi and Nagesh explicitly disclose, however Conrad teaches identifying the at least one structured data element comprises searching for at least one data element in the predefined set of data sources, the at least one data element comprising metadata of at least one extracted natural language token (Conrad ¶ [0031] For example, the values "three million" or "3,000,000" may be replaced by the parameter token "AMOUNT" in the sentence, and the metadata for the token may include the value of 3,000,000. Similarly, the "$" or "dollars" is replaced by CURRENCY in the sentence, and the metadata for the token includes the value of $US. Reporting periods, such as "Q4 2009" or "full year 2009" are replaced by PERIOD. In another implementation, the metadata may store a token to lookup the value in a data structure (e.g., table). For example, the date "Sep. 30, 2009," may be replaced by the parameter variable "DATE" in the sentence, and the metadata for the token may be an index such as "253", where "253" is used to lookup the date value for "DATE" in a table.)

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh to include metadata of Conrad in order to extract information with a high degree of accuracy and substantially faster speed (Conrad, [0043]).
With respect to claims 7/18, Gollapudi teaches (i) generating, by one or more processors, a structured data element comprising extracted non-natural-language tokens as values, (ii) determining, by one or more processors, domain characteristics for the generated data element, and (iii) searching, by one or more processors, in a predefined set of data sources, for the structured data elements that share the same domain characteristics (Gollapudi¶ Col1ll4142-60 In an implementation, search history data such as browse trails are collected over time. The browse trails, including associated queries and domains, are processed to identify free tokens of the queries that are also modifiers.);

With respect to claims 4 and 15, Conrad further teaches wherein identifying the at least one structured data element further comprises searching for a value of at least one extracted non-natural-language token in the predefined set of data sources (Gollapudi¶ Col4ll15-24 The user may provide the query 112 “fashion boots” [non-natural language token] to the search engine 150. The search engine 150 and/or the provider 160 may then attempt to translate the terms of the query 112 into attribute values pairs that correspond to a category associated with the query 112.);

Claims 2/13 are rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh and Conrad in further view of Simske (US 7908279 B1).
With respect to claims 2/13, Gollapudi, Nagesh and Conrad do not explicitly disclose, however Simske teaches wherein extracting the at least one unrecognized token from the unstructured text document further comprises: determining, by one or more processors, natural language elements and non-natural-language elements (Simske¶ [0017] If the document is originally electronic or the zoning analysis and OCR tools do not prepare the document adequately, other software tools may be used to prepare the document for keyword analysis, i.e., software tools are needed to separate words and non-words and record document layout information. The words and all other information related to each word are stored in arrays generated by software.)

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad to include non-natural-language elements of Simske in order to increase speed and efficiency for accurate determination of recognizable terms.

Claims 3/14 are rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh, Conrad and Simske in further view of Tomobe (US 20190340670 A1).

With respect to claims 3/14, Gollapudi, Nagesh, Conrad and Simske do not explicitly disclose, however Tomobe teaches grouping, by one or more processors, non-natural-language tokens into groups of tokens with similar characteristics (Tomobe¶[0041] Note that, although clustering is actually performed of the customer IDs and the product IDs, the clustering may be described as “perform clustering of the customers”, and “perform clustering of the products”, for convenience.
Similarly, a customer ID cluster (a cluster of the customer IDs) and a product ID cluster (a cluster of the product IDs) may be referred to as a customer cluster and a product cluster, respectively, for convenience.);

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad in view of non-natural-language elements of Simske to include the grouping of Tomobe in order to simplify classification for non-natural tokens.

Claims 5/16 are rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh and Conrad in further view of Evermann (US 20140365209 A1).

With respect to claim 5/15 Gollapudi, Nagesh and Conrad do not explicitly disclose, however Evermann teaches further comprising: determining, by one or more processors, a matching score value based on a number of the at least one unrecognized tokens and recognized tokens extracted from the unstructured text document that have been found in the data element and a specificity of the extracted tokens ([0128] Returning to FIG. 4, the natural language processor determines that the first candidate text string "show times for our go" corresponds to a "movie" domain [specificity], for example, because the words "show times" match words that are particularly salient to the "movie" domain. However, because it cannot determine the movie for which the user is requesting show times (e.g., because "our go" does not match any known movies), the intent deduction confidence score is relatively low (40%). On the other hand, the second candidate text string "show times for Argo" also corresponds to a movie domain, but has a high intent deduction confidence score (99%), for example, because the natural language processor recognized the word "Argo" as a known movie name.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad to include specificity of Evermann in order to properly tag unstructured data.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh, Conrad and Evermann in further view of Quadracci (US 20130006610 A1).

With respect to claim 6 Gollapudi, Nagesh and Conrad do not explicitly disclose however Evermann teaches further comprising: selecting, by one or more processors, the data element having a highest score value as the label for the unstructured text document ([0057] To process unstructured text and/or partially structured text from main data source 902, text processing tool 906 queries an associative memory application and/or applies at least one source regular expression pattern to the unstructured text and/or partially structured text. For example, in one embodiment, text processing tool 906 processes the unstructured text and/or partially structured text by querying the associative memory application with a segment of unstructured text and/or partially structured text, calculating a similarity score, and determining whether to tag the segment of unstructured text and/or partially structured text based on the similarity score.)

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad in view of specificity of Evermann to include scoring of Quadracci in order to effectively organize unstructured data ([0003] Quadracci).

Claims 8/19 are rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh and Conrad in further view of Gupta (US 20210383453 A1).
With respect to claims 8/19, Gollapudi, Nagesh and Conrad do not explicitly disclose, however Gupta teaches outputting, by one or more processors, the related label as a label suggestion for the unstructured text document (Gupta¶ [0060] Suggested catalog labels 410 [suggested label] are associated with respective groups of non-normalized item descriptors [unstructured text] assigned to the corresponding suggested catalog label by enterprise data management system 410. In FIG. 4 GUI 400 includes a suggested catalog label 410 for “Butter Chicken” that is currently selected (as indicated by the grey highlight) and non-normalized item descriptor group 420 displayed based on the selection. As described above in relation to FIG. 3, enterprise data management system 130 may assign groups of non-normalized item descriptors 420 to the suggested catalog label “butter chicken” by inputting each of the non-normalized item descriptors into a machine learning pipeline including one or more models that classify the non-normalized item descriptors as one or more enterprise catalog items);

and receiving, by one or more processors, a confirmation signal confirming the label suggestion as the confirmed label for the unstructured text document (Gupta ¶[0063] In this case, weights may be adjusted to indicate a more positive correlation between nodes and sub-nodes if a user approves [confirmation signal] of a suggested label for a non-normalized item descriptor group, or alternatively weights may be adjusted to indicate a less positive correlation or a more negative correlation if a user rejects a suggested label for a non-normalized object descriptor group.
As an example of generating a dataset, enterprise data management system 130 may generate a training data set using suggested catalog labels and corresponding groups of non-normalized item descriptors that were approved by the user);

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad to include the label suggestion of Gupta in order to improve efficiency and reduce computation cost for determining candidate item descriptors ([0001], Gupta);

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh and Conrad in further view of Ashtiani (US 10380093 B2).

With respect to claim 9, Gollapudi, Nagesh and Conrad do not explicitly disclose, however Ashtiani teaches a database table, a data dictionary and a data catalog, a structured file in a file system, a no Structured Query Language (SQL) database, and a graph database (Ashtiani¶ claim 12. The processor-implemented method of claim 1, wherein the multiple predetermined data sources comprise a database table, a set of survey forms, a set of customer reports, and a set of social media postings.)

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad to include the grouping of Ashtiani in order to reduce latency in data retrieval time (Col12ll16-21, Ashtiani);

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh, Conrad, Evermann and Quadracci in further view of Cohen (US 20140372248 A1).
With respect to claim 10, Gollapudi, Nagesh, Conrad, Evermann and Quadracci do not explicitly disclose, however Cohen teaches wherein the selected label is further ranked based on context extracted from the unstructured text document (Claim 1: weighting the influence of the unstructured text on the ranking based on a reputation of the identity of the consumer that submitted the unstructured text; and providing, for display on a document associated with the first product or the second product, a list that indicates the ranking of at least some of the plurality of products, wherein the list indicates the ranking of at least the first product, the second product, and the third product, wherein each of the above are performed by one or more processing devices.).

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad in view of specificity of Evermann in view of scoring of Quadracci to include ranking of Cohen in order to improve accuracy over simple scoring ([0080] Cohen).

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh, Conrad, Evermann and Quadracci in further view of Brill (US 20050234904 A1).

With respect to claim 11, Gollapudi, Nagesh, Conrad, Evermann and Quadracci do not explicitly disclose, however Brill teaches further comprising: sorting, by one or more processors, the data elements by a search score associated with each of the data elements and keeping only the data elements with a search score value above a search score threshold value ([0114] The server device can sort each search result based on their modified scores, and select one or more of the corresponding action datasets based on a defined threshold.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad in view of specificity of Evermann in view of scoring of Quadracci to include threshold of Brill in order to efficiently and effectively rank ([0011], Brill).

Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Gollapudi, Nagesh and Conrad in further view of Carlgren (US 20060229865 A1).

With respect to claim 21, Gollapudi teaches wherein the at least one structured data element is related to the at least one extracted unrecognized token from the unstructured text document based on [[matching an arrangement of alphanumeric character types and punctuation format]]. None of Gollapudi, Nagesh and Conrad explicitly disclose, however Carlgren teaches matching an arrangement of alphanumeric character types and punctuation format ([0073] The last accepting state defines the boundary of the first token (letter g) and the type of the token: state 302 is for alphabetic sequences, state 303 alphanumeric sequences, state 304 integer numbers, and state 305 floating point numbers. Unmatched characters, such as punctuation, are separated by state 306.)

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the invention to modify labeling of Gollapudi in view of training of Nagesh in view of metadata of Conrad to include arrangement of Carlgren in order to make accurate segmentation of text (Carlgren [0075]).

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ATHAR N PASHA whose telephone number is (408)918-7675. The examiner can normally be reached Monday-Thursday and alternate Fridays, 7:30-4:30 PT. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool.
To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached on (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ATHAR N PASHA/
Primary Examiner, Art Unit 2657
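
As a reading aid for the scoring limitations recited in the rejections (selecting the highest-scoring data element in claim 6, sorting and thresholding by search score in claim 11) and the character-pattern matching recited in claim 21, here is a minimal Python sketch. It is not taken from the application or from any cited reference. The `Candidate` class, the function names, the single `score` field standing in for both the matching score and the search score, and the letter/digit shape encoding are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical structured data element found for a document."""
    label: str
    score: float  # assumption: one score stands in for both claimed scores


def rank_candidates(candidates, threshold):
    """Sort by score, highest first, and keep only scores above the threshold."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    return [c for c in ranked if c.score > threshold]


def select_label(candidates, threshold):
    """Return the label of the best surviving candidate, or None if none survive."""
    kept = rank_candidates(candidates, threshold)
    return kept[0].label if kept else None


def char_shape(token):
    """Encode a token's arrangement of character types.

    Letters become 'A', digits become '9', and punctuation is kept as-is,
    so tokens with the same format produce the same shape string.
    """
    return "".join(
        "A" if ch.isalpha() else "9" if ch.isdigit() else ch for ch in token
    )


if __name__ == "__main__":
    found = [
        Candidate("invoice_id", 0.42),
        Candidate("customer_id", 0.91),
        Candidate("zip_code", 0.10),
    ]
    print(select_label(found, threshold=0.3))
    print(char_shape("AB-1234") == char_shape("XY-9876"))
```

Under these assumptions, two tokens are treated as format-compatible when their shape strings are equal, and the label comes from whichever candidate remains at the top of the thresholded list.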

Prosecution Timeline

Dec 13, 2021
Application Filed
Oct 10, 2023
Response after Non-Final Action
Feb 22, 2025
Non-Final Rejection — §103
May 15, 2025
Interview Requested
May 22, 2025
Applicant Interview (Telephonic)
Aug 27, 2025
Response Filed
Oct 28, 2025
Examiner Interview Summary
Oct 30, 2025
Final Rejection — §103
Jan 05, 2026
Response after Non-Final Action
Jan 14, 2026
Request for Continued Examination
Jan 21, 2026
Response after Non-Final Action
Jan 24, 2026
Non-Final Rejection — §103
Apr 06, 2026
Interview Requested
Apr 13, 2026
Applicant Interview (Telephonic)
Apr 14, 2026
Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596882
COMPLIANCE DETECTION USING NATURAL LANGUAGE PROCESSING
2y 5m to grant Granted Apr 07, 2026
Patent 12586563
Method, System and Apparatus for Understanding and Generating Human Conversational Cues
2y 5m to grant Granted Mar 24, 2026
Patent 12579173
SYSTEMS AND METHODS FOR DYNAMICALLY PROVIDING INTELLIGENT RESPONSES
2y 5m to grant Granted Mar 17, 2026
Patent 12566921
GAZETTEER INTEGRATION FOR NEURAL NAMED ENTITY RECOGNITION
2y 5m to grant Granted Mar 03, 2026
Patent 12547844
INTELLIGENT MODEL SELECTION SYSTEM FOR STYLE-SPECIFIC DIGITAL CONTENT GENERATION
2y 5m to grant Granted Feb 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
90%
Grant Probability
99%
With Interview (+17.0%)
2y 8m
Median Time to Grant
High
PTA Risk
Based on 154 resolved cases by this examiner. Grant probability derived from career allow rate.
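
The 90% figure is consistent with the counts shown in the examiner's career data (138 granted out of 154 resolved). The sketch below reproduces that arithmetic under two assumptions the page does not state: that the allow rate is simply granted divided by resolved, and that "+27.6% vs TC avg" is a difference in percentage points. The function names are hypothetical.

```python
def allow_rate(granted: int, resolved: int) -> float:
    """Fraction of resolved cases that ended in a grant."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return granted / resolved


def implied_tc_average(rate: float, delta_points: float) -> float:
    """Tech Center average in percent, assuming delta is in percentage points."""
    return rate * 100 - delta_points


if __name__ == "__main__":
    rate = allow_rate(138, 154)
    print(f"Career allow rate: {rate:.1%}")
    print(f"Implied TC average: {implied_tc_average(rate, 27.6):.1f}%")
```

With these inputs the rate comes out to about 89.6%, which rounds to the displayed 90%, and the implied Tech Center average is roughly 62%.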
