Last updated: May 29, 2026
Application No. 18/501,982
ZERO-SHOT FORM ENTITY QUERY FRAMEWORK

Non-Final OA §103
Filed: Nov 03, 2023
Priority: Nov 07, 2022 (provisional 63/382,593)
Examiner: BARNES JR, CARL E
Art Unit: 2178
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Google LLC
OA Round: 1 (Non-Final)
This examiner grants 32% of cases after interview (+24.2% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 205 resolved cases, 2023–2026
Examiner Intelligence

BARNES JR, CARL E
Career Allowance Rate: 32% (66 granted / 205 resolved), -22.8% vs TC avg
Interview Lift: +24.2% (resolved cases with interview)
Avg Prosecution: 3y 10m (typical timeline); 23 currently pending
Total Applications: 238 (career history, across all art units)
Statute-Specific Performance

§101: 0.2% (-39.8% vs TC avg)
§103: 96.7% (+56.7% vs TC avg)
§102: 2.3% (-37.7% vs TC avg)
§112: 0.4% (-39.6% vs TC avg)
TC avg = Tech Center average estimate. Based on career data from 205 resolved cases.
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 08/08/2024 was filed.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1-2, 9-12 and 19-22 are rejected under 35 U.S.C. 103 as being unpatentable over Meng (US 20230022845 A1, Filed Date: Jul. 13, 2021) in view of RODRIGUEZ (US 20210357409 A1, Filed Date: May 18, 2020).
Regarding independent claim 1, Meng teaches: A computer-implemented method executed by data processing hardware that causes the data processing hardware to perform operations comprising: (Meng − [0130] With reference to FIG. 15, computing device 1500 includes bus 10 that directly or indirectly couples the following devices: memory 12, one or more processors 14, one or more presentation components 16, input/output (I/O) ports 18, input/output components 20, and illustrative power supply 22.)
obtaining a document comprising a series of textual fields, (Meng − [0109] FIG. 10 is a schematic diagram of a system 1000 illustrating document extraction, according to some embodiments. The consumer application inbox 1015 receives one or more user uploaded documents 1003 via a mobile device 1005 (based on a user taking a picture of a document), a scanner 1007, accounting software APIs 1009 (e.g., using a web application or app), email 1011, and/or any other suitable method 1013 (e.g., via a smartwatch, sensor, etc.).) 
the series of textual fields comprising a plurality of entities, each entity of the plurality of entities representing information associated with a predefined category; (Meng − [0002] a document (e.g., an invoice) correspond to. For example, some embodiments employ Question Answering systems to predict that a particular number value corresponds to a date, a billing amount, a name of business entity, an invoice number, or the like. [0035] A “document” as described herein refers to entire object or set of pages that are associated with or belong to a particular event (e.g., a work duty job or series of tasks) or entity (e.g., a company). [0054] Named Entity Recognition (NER). NER is an information extraction technique that identifies and classifies elements or “entities” in natural language text into predefined categories. Such predefined categories may be indicated in corresponding tags or labels. Entities can be, for example, names of people, specific organizations, specific locations, specific times, specific quantities, specific monetary price values, specific percentages, specific pages, and the like. Likewise, the corresponding tags or labels can be specific people, organizations, location, time, price (or other invoice data) and the like.) Examiner Note: an invoice document contains a series of entities that are extracted using NER, which identifies and classifies them into predefined categories.
generating, using the document, a series of tokens representing the series of textual fields; (Meng – [0039-0040] In some embodiments, the object recognition component 104 includes an Object Character Recognition (OCR) component that is configured to detect natural language characters and convert such characters into a machine-readable format (e.g., so that it can be processed via a machine learning model). [0052] In some embodiments, the pre-training component 108 uses NLP by tokenizing text (e.g., blocks) on pages into their constituent words, numbers, symbols, and some or each of the words are tagged with a part-of-speech (POS) identifier. “Tokenization” or parsing in various embodiments corresponds to a computer-implemented process that segments the content into words, sentences, symbols, character sequence, and/or other elements of the content. [0094] Each word is represented as a token.) Examiner Note: OCR produces text tokens.
generating an entity prompt comprising the series of tokens and one of the plurality of entities; (Meng – [0025-0026] In Question Answering tasks, models receive a question regarding text content (e.g., “what date is the invoice amount due?”), and mark or tag the beginning and end of the answer (e.g., underline the value “$13,500”) in a document. [0062] the context, question, and/or answer pair generator 112 builds context question pairs. [0064] For example, the inference component 114 can take, as input, the context-question pairs generated by the context, question and then predict answers to the particular questions via the answer generator 114-1.) Examiner Note: the inference component corresponds to the entity prompt, as it takes inputs (queries) and builds context question pairs.
generating a schema prompt comprising a schema associated with the document; (Meng – [0039] [0112] the document image and place the extracted characters in another format, such as JSON. At step 3, the OCR engine returns a JSON containing the words position related information of each word. The JSON output also contains larger semantic structures (e.g., phrases, paragraphs, blocks) as well as smaller segments, such as letters and break types (e.g., spaces, tabs, etc.).) Examiner Note: the JSON output describing the structure of the document is a schema prompt.
determining, using an entity extraction model, a location of the one of the plurality of entities among the series of tokens; ()
and extracting, from the document, the one of the plurality of entities using the location of the one of the plurality of entities. (Meng – [0045] The coordinate module 106-2 is generally responsible for sorting each token in each block based on the coordinates of each token within a corresponding document. A “token” as described herein refers to an individual element of a document, such as a word, number, sign, symbol, and/or the like. For example, the coordinate module 106-2 can sort the tokens in each block based on the X (left/right) and Y (top/bottom) coordinates of each token (each token can be represented as [‘word,’ xmin, xmax, ymin, ymax]) to make sure the tokens in the same line in the block will appear together as the order in the document. [0105] FIG. 8 illustrates the prediction 826 under the “due date” field and an arrow 828 pointing to the location of the prediction within the invoice 801. Similar functionality is performed for the predictions 820, 814, 808, and 806, via the arrows 822, 816, 810, and 804 respectively, which point to the answers 824, 818, 812, and 802 respectively.)
Meng does not explicitly teach: generating a model query
However, RODRIGUEZ teaches: generating an entity prompt comprising the series of tokens and one of the plurality of entities; (RODRIGUEZ – [0032] accessing the data includes the scenario where the user speaks or types one or more queries. For example, the user could say “my open accounts for this month in California” and the database system translates these natural language utterances of the user into one or more queries applied to the database to get the results of the user queries. [0077] FIG. 4 is a diagram of an example of a conceptual query according to some embodiments. In this example, user query 402 is received by a natural language search system) 
generating a model query comprising the entity prompt and the schema prompt; (RODRIGUEZ – [0046] Each table generally contains one or more data categories logically arranged as columns or fields in a viewable schema. [0077] natural language search system translates the concepts into SQL statements 416 that can be applied to the database to generate results of the query. [0081] Fig. 4, SQL generator 512 reads the tagged entity list, resolves one or more tags from the tagged entity list (such as account ID), and generates SQL statements 514 representing user query 502) Examiner Note: SQL generator (model) generate the text to sql statement using schema and entity prompts.
determining, using an entity extraction model and the model query, a location of the one of the plurality of entities among the series of tokens; (RODRIGUEZ – [0077] For example, a company 406 field for “Acme” may be resolved to a specific account ID for Acme (e.g., account ID 0000012345) 414. [0078] Fig. 5 SQL statements 514 are input to database management system 516 to query database 518. Query results 520 are then returned. Thus, application of the spoken user query 502 to database 518 produces query results 520. [0079] In an embodiment, natural language search system 530 includes multiple components. Preprocessor 504 accepts user query 504, parses the query, and applies preprocessing logic that includes: tokenization, For example, preprocessor 504 may translate a query such as “(my) high value! marcus steele cases in san francisco . . . ” into a translated query such as “my high value marcus steele cases in San Francisco.”)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng and RODRIGUEZ, as each invention relates to a document extraction system. Adding the teaching of RODRIGUEZ provides Meng with a SQL generator. One of ordinary skill in the art would have been motivated to improve, and reduce the time spent on, time-consuming tasks such as generating large numbers of NER model queries.
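For readers mapping the limitations of claim 1 against the cited passages, the following is a minimal sketch of the data flow the claim recites: tokens, an entity prompt, a schema prompt, a model query combining the two, a location among the tokens, and an extraction at that location. It is not the applicant's implementation and is not taken from Meng or RODRIGUEZ. The names `ModelQuery`, `build_model_query`, `toy_locate`, and `extract` are hypothetical, the prompt formats are assumptions, and `toy_locate` is a simple label-matching stand-in for the entity extraction model that ignores the schema prompt.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of token indices


@dataclass(frozen=True)
class ModelQuery:
    """Hypothetical container pairing an entity prompt with a schema prompt."""
    entity_prompt: str
    schema_prompt: str


def build_model_query(tokens: List[str], entity: str, schema: List[str]) -> ModelQuery:
    # Entity prompt: the one entity being queried plus the series of tokens.
    entity_prompt = f"{entity} || {' '.join(tokens)}"
    # Schema prompt: the schema associated with the document (assumed here
    # to be a flat list of entity names).
    schema_prompt = " | ".join(schema)
    return ModelQuery(entity_prompt, schema_prompt)


def _norm(token: str) -> str:
    return token.lower().rstrip(":")


def toy_locate(query: ModelQuery, tokens: List[str]) -> Optional[Span]:
    """Stand-in for the entity extraction model (NOT a trained model).

    Finds the entity's label words among the tokens and returns the span of
    the single token that follows the label, or None if nothing matches.
    The schema prompt is carried in the query but unused by this stand-in.
    """
    entity = query.entity_prompt.split(" || ", 1)[0]
    label = [_norm(word) for word in entity.split()]
    normalized = [_norm(token) for token in tokens]
    for i in range(len(tokens) - len(label)):
        if normalized[i:i + len(label)] == label:
            return (i + len(label), i + len(label) + 1)
    return None


def extract(tokens: List[str], entity: str, schema: List[str]) -> Optional[str]:
    query = build_model_query(tokens, entity, schema)
    span = toy_locate(query, tokens)
    if span is None:
        return None
    start, end = span
    return " ".join(tokens[start:end])


tokens = ["Invoice", "No:", "12345", "Due", "Date:", "2023-11-03", "Total:", "$13,500"]
schema = ["invoice no", "due date", "total"]
print(extract(tokens, "due date", schema))  # 2023-11-03
```

The sketch keeps the location step separate from the extraction step, mirroring the claim's distinction between determining a location and extracting the entity using that location.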
Regarding dependent claim 2, depends on claim 1, Meng teaches: wherein the operations further comprise, prior to determining the location of the one of the plurality of entities among the series of tokens: pre-training the entity extraction model using generalized training samples; (Meng – [0114] the context is derived from user-identified documents. As ground truth data) and after pre-training the entity extraction model, fine-tuning the entity extraction model using a plurality of training documents. (Meng – [0016] FIG. 12 is a flow diagram of an example process for fine-tuning a machine learning model. Fine-tuning takes a model that has already been trained (e.g., via the pre-training component 108) for a particular task and then fine-tunes or tweaks it to make it perform a second similar task. For example, a deep learning network that has been trained to understand natural language and context can be fine-tuned by training using a Question Answer system on invoice documents, which is described in more detail below.)
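The ordering recited in claim 2 (pre-train first, then fine-tune the already-trained model) can be illustrated with a deliberately tiny example. This is a sketch under stated assumptions only: it uses a two-parameter linear model and plain gradient descent rather than any entity extraction model, and the data sets, the `train` and `mse` helpers, and the learning rate are all hypothetical.

```python
from typing import List, Tuple

Params = Tuple[float, float]  # (weight, bias) of the toy model y = w * x + b


def mse(params: Params, xs: List[float], ys: List[float]) -> float:
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(params: Params, xs: List[float], ys: List[float],
          steps: int, lr: float = 0.05) -> Params:
    """Plain gradient descent on mean squared error, starting from `params`."""
    w, b = params
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)


# Stage 1: pre-train from scratch on a generic data set (here y = 2x).
generic_x = [0.0, 1.0, 2.0, 3.0]
generic_y = [2.0 * x for x in generic_x]
pretrained = train((0.0, 0.0), generic_x, generic_y, steps=200)

# Stage 2: fine-tune, starting from the pre-trained parameters, on a small
# task-specific data set (here y = 2x + 1).
task_x = [0.0, 1.0, 2.0]
task_y = [2.0 * x + 1.0 for x in task_x]
fine_tuned = train(pretrained, task_x, task_y, steps=50)

# Fine-tuning should lower the error on the task data relative to the
# pre-trained parameters alone.
print(mse(pretrained, task_x, task_y) > mse(fine_tuned, task_x, task_y))
```

The only point of the sketch is that the second call to `train` starts from `pretrained` rather than from fresh parameters, which is the relationship between the two steps of claim 2.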
Regarding dependent claim 9, depends on claim 1, Meng teaches: wherein generating the series of tokens representing the series of textual fields comprises determining the series of tokens using an optical character recognition (OCR) model (Meng – [0039-0040] In some embodiments, the object recognition component 104 includes an Object Character Recognition (OCR) component that is configured to detect natural language characters and convert such characters into a machine-readable format (e.g., so that it can be processed via a machine learning model). [0052] In some embodiments, the pre-training component 108 uses NLP by tokenizing text (e.g., blocks) on pages into their constituent words, numbers, symbols, and some or each of the words are tagged with a part-of-speech (POS) identifier. “Tokenization” or parsing in various embodiments corresponds to a computer-implemented process that segments the content into words, sentences, symbols, character sequence, and/or other elements of the content. [0094] Each word is represented as a token.) Examiner Note: OCR produces text tokens.
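The tokenization described in the cited Meng passage (segmenting content into words, numbers, and symbols) can be sketched as follows. The example assumes the text has already been produced by an OCR step, which is not shown, and the regular expression is an illustrative choice rather than anything disclosed in Meng.

```python
import re
from typing import List

# One token per run of word characters, or per single non-space symbol.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split already-recognized text into word, number, and symbol tokens."""
    return _TOKEN_RE.findall(text)


print(tokenize("Total due: $13,500"))
# ['Total', 'due', ':', '$', '13', ',', '500']
```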
Regarding dependent claim 10, depends on claim 1, Meng teaches: where the operations further comprise, determining, using the location of the one of the plurality of entities, a value associated with the one of the plurality of entities. (Meng – [0045] The coordinate module 106-2 is generally responsible for sorting each token in each block based on the coordinates of each token within a corresponding document. A “token” as described herein refers to an individual element of a document, such as a word, number, sign, symbol, and/or the like. For example, the coordinate module 106-2 can sort the tokens in each block based on the X (left/right) and Y (top/bottom) coordinates of each token (each token can be represented as [‘word,’ xmin, xmax, ymin, ymax]) to make sure the tokens in the same line in the block will appear together as the order in the document. [0105] FIG. 8 illustrates the prediction 826 under the “due date” field and an arrow 828 pointing to the location of the prediction within the invoice 801. Similar functionality is performed for the predictions 820, 814, 808, and 806, via the arrows 822, 816, 810, and 804 respectively, which point to the answers 824, 818, 812, and 802 respectively.)
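The coordinate-based ordering quoted from Meng [0045] represents each token as ['word', xmin, xmax, ymin, ymax] and sorts by X and Y so that tokens on the same line appear together. A minimal sketch of one way to do that is shown below. The line-grouping tolerance `line_tol` and the function name `reading_order` are assumptions introduced for illustration; Meng's actual grouping rule is not stated in the quoted passage.

```python
from typing import List, Tuple

# (word, xmin, xmax, ymin, ymax), following the layout quoted from Meng [0045].
Token = Tuple[str, float, float, float, float]


def reading_order(tokens: List[Token], line_tol: float = 5.0) -> List[Token]:
    """Order tokens top-to-bottom, then left-to-right within each line.

    Tokens whose ymin is within `line_tol` of the first token of the current
    line are treated as being on that line (an assumed heuristic).
    """
    by_top = sorted(tokens, key=lambda t: t[3])  # sort by ymin
    lines: List[List[Token]] = []
    for tok in by_top:
        if lines and abs(tok[3] - lines[-1][0][3]) <= line_tol:
            lines[-1].append(tok)
        else:
            lines.append([tok])
    # Within each line, sort by xmin so words read left to right.
    return [tok for line in lines for tok in sorted(line, key=lambda t: t[1])]


scrambled = [
    ("Date:", 60, 100, 11, 20),
    ("$13,500", 70, 130, 41, 50),
    ("Due", 10, 50, 10, 20),
    ("Total:", 10, 60, 40, 50),
]
print([t[0] for t in reading_order(scrambled)])
# ['Due', 'Date:', 'Total:', '$13,500']
```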
Regarding independent claim 11, the claim is directed to a system. Claim 11 has similar/same technical features/limitations as claim 1. Claim 11 is rejected under the same rationale.
Regarding dependent claim 12, depends on claim 11, Meng teaches: wherein the operations further comprise, prior to determining the location of the one of the plurality of entities among the series of tokens: pre-training the entity extraction model using generalized training samples; (Meng – [0114] the context is derived from user-identified documents. As ground truth data) and after pre-training the entity extraction model, fine-tuning the entity extraction model using a plurality of training documents. (Meng – [0016] FIG. 12 is a flow diagram of an example process for fine-tuning a machine learning model. Fine-tuning takes a model that has already been trained (e.g., via the pre-training component 108) for a particular task and then fine-tunes or tweaks it to make it perform a second similar task. For example, a deep learning network that has been trained to understand natural language and context can be fine-tuned by training using a Question Answer system on invoice documents, which is described in more detail below.)
Regarding dependent claim 19, depends on claim 11, Meng teaches: wherein generating the series of tokens representing the series of textual fields comprises determining the series of tokens using an optical character recognition (OCR) model (Meng – [0039-0040] In some embodiments, the object recognition component 104 includes an Object Character Recognition (OCR) component that is configured to detect natural language characters and convert such characters into a machine-readable format (e.g., so that it can be processed via a machine learning model). [0052] In some embodiments, the pre-training component 108 uses NLP by tokenizing text (e.g., blocks) on pages into their constituent words, numbers, symbols, and some or each of the words are tagged with a part-of-speech (POS) identifier. “Tokenization” or parsing in various embodiments corresponds to a computer-implemented process that segments the content into words, sentences, symbols, character sequence, and/or other elements of the content. [0094] Each word is represented as a token.) Examiner Note: OCR produces text tokens.
Regarding dependent claim 20, depends on claim 11, Meng teaches: where the operations further comprise, determining, using the location of the one of the plurality of entities, a value associated with the one of the plurality of entities. (Meng – [0045] The coordinate module 106-2 is generally responsible for sorting each token in each block based on the coordinates of each token within a corresponding document. A “token” as described herein refers to an individual element of a document, such as a word, number, sign, symbol, and/or the like. For example, the coordinate module 106-2 can sort the tokens in each block based on the X (left/right) and Y (top/bottom) coordinates of each token (each token can be represented as [‘word,’ xmin, xmax, ymin, ymax]) to make sure the tokens in the same line in the block will appear together as the order in the document. [0105] FIG. 8 illustrates the prediction 826 under the “due date” field and an arrow 828 pointing to the location of the prediction within the invoice 801. Similar functionality is performed for the predictions 820, 814, 808, and 806, via the arrows 822, 816, 810, and 804 respectively, which point to the answers 824, 818, 812, and 802 respectively.)
Regarding independent claim 21, is directed to a user device comprising: (Meng − Fig. 10, [0134] I/O ports 18 allow computing device 800;)
a display; data processing hardware in communication with the display; (Meng – [0134] displays on the computing device 1500.) and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising: (Meng − [0130] With reference to FIG. 15, computing device 1500 includes bus 10 that directly or indirectly couples the following devices: memory 12, one or more processors 14, one or more presentation components 16, input/output (I/O) ports 18, input/output components 20, and illustrative power supply 22.) Claim 21 has similar/same technical features/limitations as claim 1. Claim 21 is rejected under the same rationale.
Regarding dependent claim 22, depends on claim 21, Meng teaches: wherein the operations further comprise, prior to determining the location of the one of the plurality of entities among the series of tokens: pre-training the entity extraction model using generalized training samples; (Meng – [0114] the context is derived from user-identified documents. As ground truth data) and after pre-training the entity extraction model, fine-tuning the entity extraction model using a plurality of training documents. (Meng – [0016] FIG. 12 is a flow diagram of an example process for fine-tuning a machine learning model. Fine-tuning takes a model that has already been trained (e.g., via the pre-training component 108) for a particular task and then fine-tunes or tweaks it to make it perform a second similar task. For example, a deep learning network that has been trained to understand natural language and context can be fine-tuned by training using a Question Answer system on invoice documents, which is described in more detail below.)

Claim(s) 3-6, 13-16, and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over Meng and RODRIGUEZ as applied to claims 2, 12, and 22 above, and further in view of Bangalore (US 8566102 B1, Filed Date: Nov. 6, 2002).
Regarding dependent claim 3, depends on claim 2, Meng does not explicitly teach: generalized training samples comprise data from public websites
However, Bangalore teaches: wherein the generalized training samples comprise data from public websites. (Bangalore – [Col. 4 ll. 35-37] using the prior knowledge contained in a company web-site or elsewhere, to deploy a spoken dialog service.) Examiner Note: “elsewhere” is a public web-site.
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Bangalore, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to improve labor-intensive manual tasks when generating a dialog service.
Regarding dependent claim 4, depends on claim 3, Meng does not explicitly teach: generalized training samples comprise data from public websites
However, Bangalore teaches: wherein each respective generalized training sample comprises: a respective training entity prompt associated with a respective public website; and a respective training schema prompt associated with the respective public website. (Bangalore – [Col. 4 ll. 35-37] using the prior knowledge contained in a company web-site or elsewhere, to deploy a spoken dialog service.) Examiner Note: “elsewhere” is a public web-site.
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Bangalore, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to improve labor-intensive manual tasks when generating a dialog service.
Regarding dependent claim 5, depends on claim 4, Meng does not explicitly teach: an HTML tag of the respective public website
However, Bangalore teaches: wherein: each respective training entity prompt comprises an HTML tag of the respective public website; and each respective training schema prompt comprises a domain of the respective public website. (Bangalore – [Col. 7 ll. 35-38, 45-46] Web documents enclose all texts in a hierarchy of tags that determine the appearance, attributes, functionalities, importance, degrees and mutual relationship of text within the web-page. a web-page is represented with 7 features: (1) structure_code, (2) tag, (3) parent_tag, (4) text, (5) color, (6) size, and (7) link.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Bangalore, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to improve labor-intensive manual tasks when generating a dialog service.
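The Bangalore passage cited for claim 5 describes representing web-page text with seven features: structure_code, tag, parent_tag, text, color, size, and link. A minimal sketch of collecting some of those features with Python's standard `html.parser` follows. It covers only tag, parent_tag, text, and link; structure_code, color, and size are omitted because the quoted passage does not define how they are computed. The `Segment` and `SegmentParser` names are hypothetical and this is not Bangalore's implementation.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


@dataclass
class Segment:
    tag: Optional[str]         # innermost enclosing tag of the text
    parent_tag: Optional[str]  # tag enclosing that tag, if any
    text: str
    link: Optional[str]        # href when the text sits directly inside <a>


class SegmentParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self._stack: List[Tuple[str, Dict[str, Optional[str]]]] = []
        self.segments: List[Segment] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        tag, attrs = self._stack[-1] if self._stack else (None, {})
        parent = self._stack[-2][0] if len(self._stack) >= 2 else None
        link = attrs.get("href") if tag == "a" else None
        self.segments.append(Segment(tag, parent, text, link))


page = '<html><body><h1>Billing</h1><p>Call <a href="/help">support</a></p></body></html>'
parser = SegmentParser()
parser.feed(page)
parser.close()
for segment in parser.segments:
    print(segment)
```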
Regarding dependent claim 6, depends on claim 3, Meng does not explicitly teach: wherein the operations further comprise: extracting, from the public websites, entity data and schema data; generating, from the entity data, each respective training entity prompt; and generating, from the schema data, each respective training schema prompt.
However, Bangalore teaches: extracting, from the public websites, entity data and schema data; generating, from the entity data, each respective training entity prompt; and generating, from the schema data, each respective training schema prompt. (Bangalore – [Col. 7 ll. 35-38, 45-46] The first step may comprise extracting three consequent text segments T1, T2, T3 from a web-page.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Bangalore, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to improve labor-intensive manual tasks when generating a dialog service.
Regarding dependent claim 13, depends on claim 12, Meng does not explicitly teach: generalized training samples comprise data from public websites
However, Bangalore teaches: wherein the generalized training samples comprise data from public websites. (Bangalore – [Col. 4 ll. 35-37] using the prior knowledge contained in a company web-site or elsewhere, to deploy a spoken dialog service.) Examiner Note: “elsewhere” is a public web-site.
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Bangalore, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to improve labor-intensive manual tasks when generating a dialog service.
Regarding dependent claim 14, depends on claim 13, Meng does not explicitly teach: generalized training samples comprise data from public websites
However, Bangalore teaches: wherein each respective generalized training sample comprises: a respective training entity prompt associated with a respective public website; and a respective training schema prompt associated with the respective public website. (Bangalore – [Col. 4 ll. 35-37] using the prior knowledge contained in a company web-site or elsewhere, to deploy a spoken dialog service.) Examiner Note: “elsewhere” is a public web-site.
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Bangalore, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to improve labor-intensive manual tasks when generating a dialog service.
Regarding dependent claim 15, depends on claim 14, Meng does not explicitly teach: an HTML tag of the respective public website
However, Bangalore teaches: wherein: each respective training entity prompt comprises an HTML tag of the respective public website; and each respective training schema prompt comprises a domain of the respective public website. (Bangalore – [Col. 7 ll. 35-38, 45-46] Web documents enclose all texts in a hierarchy of tags that determine the appearance, attributes, functionalities, importance, degrees and mutual relationship of text within the web-page. a web-page is represented with 7 features: (1) structure_code, (2) tag, (3) parent_tag, (4) text, (5) color, (6) size, and (7) link.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Bangalore, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to improve labor-intensive manual tasks when generating a dialog service.
Regarding dependent claim 16, depends on claim 13, Meng does not explicitly teach: wherein the operations further comprise: extracting, from the public websites, entity data and schema data; generating, from the entity data, each respective training entity prompt; and generating, from the schema data, each respective training schema prompt.
However, Bangalore teaches: extracting, from the public websites, entity data and schema data; generating, from the entity data, each respective training entity prompt; and generating, from the schema data, each respective training schema prompt. (Bangalore – [Col. 7 ll. 35-38, 45-46] The first step may comprise extracting three consequent text segments T1, T2, T3 from a web-page.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Bangalore, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to improve labor-intensive manual tasks when generating a dialog service.
Regarding dependent claim 23, depends on claim 22, Meng does not explicitly teach: generalized training samples comprise data from public websites
However, Bangalore teaches: wherein the generalized training samples comprise data from public websites. (Bangalore – [Col. 4 ll. 35-37] using the prior knowledge contained in a company web-site or elsewhere, to deploy a spoken dialog service.) Examiner Note: “elsewhere” is a public web-site.
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Bangalore, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to improve labor-intensive manual tasks when generating a dialog service.
Regarding dependent claim 24, depends on claim 23, Meng does not explicitly teach: generalized training samples comprise data from public websites
However, Bangalore teaches: wherein each respective generalized training sample comprises: a respective training entity prompt associated with a respective public website; and a respective training schema prompt associated with the respective public website. (Bangalore – [Col. 4 ll. 35-37] using the prior knowledge contained in a company web-site or elsewhere, to deploy a spoken dialog service.) Examiner Note: “elsewhere” is a public web-site.
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Bangalore, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to improve labor-intensive manual tasks when generating a dialog service.

Claim(s) 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Meng and RODRIGUEZ as applied to claims 2 and 12 above, and further in view of Chen (US 11514321 B1, Filed Date: Jun. 12, 2020).
Regarding dependent claim 7, depends on claim 2, Meng teaches: training data, but does not explicitly teach: training samples are not human annotated; training documents are human annotated.
However, Chen teaches: wherein: the generalized training samples are not human annotated; and the plurality of training documents are human annotated. (Chen − [Col. 7 ll. 35-39] provide a data source 140; Labels for at least a subset of the candidate entity pairs may be obtained in various embodiments, e.g., from human annotators and/or automated annotators.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Chen, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to improve labor-intensive manual tasks when generating a dialog service.
Regarding dependent claim 17, depends on claim 12, Meng teaches: training data, but does not explicitly teach: training samples are not human annotated; training documents are human annotated.
However, Chen teaches: wherein: the generalized training samples are not human annotated; and the plurality of training documents are human annotated. (Chen − [Col. 7 ll. 35-39] provide a data source 140; Labels for at least a subset of the candidate entity pairs may be obtained in various embodiments, e.g., from human annotators and/or automated annotators.)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Chen, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to reduce the labor-intensive manual work involved in generating a dialog service.

Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Meng and RODRIGUEZ as applied to claims 1 and 11 above, and further in view of Johnson (US 20210082425 A1, Filed Date: Aug. 3, 2020).
Regarding dependent claim 8, which depends on claim 1, Meng teaches a machine learning model but does not explicitly teach a zero-shot machine learning model.
However, Johnson teaches: wherein the entity extraction model comprises a zero-shot machine learning model. (Johnson − [0071] The dialog system uses zero-shot learning techniques to train the machine learning model(s).)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Johnson, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to improve the efficiency of training and applying machine learning models for dialog processing tasks.
Regarding dependent claim 18, which depends on claim 11, Meng teaches a machine learning model but does not explicitly teach a zero-shot machine learning model.
However, Johnson teaches: wherein the entity extraction model comprises a zero-shot machine learning model. (Johnson − [0071] The dialog system uses zero-shot learning techniques to train the machine learning model(s).)
Accordingly, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have combined the teachings of Meng, RODRIGUEZ and Johnson, as each invention relates to a document extraction system. One of ordinary skill in the art would have been motivated to improve the efficiency of training and applying machine learning models for dialog processing tasks.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Wanjun Zhong, Improving Task Generalization via Unified Schema Prompt.
Srinivasan Iyer, Learning a Neural Semantic Parser from User Feedback.
Zifeng Wang, Learning to Prompt for Continual Learning.
Tianyu Gao, Making Pre-trained Language Models better Few-Shot.
Geewook Kim, OCR-free document understanding transformer.
Zifeng Wang, QueryForm, A Simple Zero-shot Form Query Framework.
Naihao Deng, Recent Advances in Text-to-SQL.
CRISTESCU, US 20210012102 A1, Automatic Data Extraction from Document Image.
TORRES, US 20210034700 A1, Automated Bounding Box Detection and Text Segmentation.
Tata, US 20210374395 A1, Information Extraction From Form-Like Documents.
Avadhani, US 20230267274 A1, Mapping Entities in Unstructured Text Document via Entity Correction and Entity Resolution.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CARL E BARNES JR whose telephone number is (571)270-3395. The examiner can normally be reached Monday-Friday 9am-6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong, can be reached at (571) 272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/CARL E BARNES JR/Examiner, Art Unit 2178
/STEPHEN S HONG/Supervisory Patent Examiner, Art Unit 2178
Prosecution Timeline

Nov 03, 2023
Application Filed
Jan 13, 2026
Non-Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/898,903
Patent 12639806
MEDICAL SYSTEM, INFORMATION PROCESSING METHOD, AND COMPUTER-READABLE MEDIUM
3y 9m to grant Granted May 26, 2026
17/289,673
Patent 12614280
SYSTEM FOR ESTIMATING PRIMARY OPEN-ANGLE GLAUCOMA LIKELIHOOD
5y 0m to grant Granted Apr 28, 2026
17/953,132
Patent 12584932
SLIDE IMAGING APPARATUS AND A METHOD FOR IMAGING A SLIDE
3y 6m to grant Granted Mar 24, 2026
16/871,512
Patent 12541640
COMPUTING DEVICE FOR MULTIPLE CELL LINKING
5y 8m to grant Granted Feb 03, 2026
16/262,443
Patent 12536464
SYSTEM FOR CONSTRUCTING EFFECTIVE MACHINE-LEARNING PIPELINES WITH OPTIMIZED OUTCOMES
6y 12m to grant Granted Jan 27, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 32%
With Interview (+24.2%): 56%
Median Time to Grant: 3y 10m (~1y 4m remaining)
PTA Risk: Low
Based on 205 resolved cases by this examiner. Grant probability derived from career allowance rate.