Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The following action is in response to the communication(s) received on 09/30/2025 after a final rejection.
Claims 1, 11, and 20 have been amended.
Claims 1, 6, 11, 16, and 20 are pending.
Claims 1, 11, and 20 are independent claims.
Response to Arguments
Applicant’s arguments filed 09/30/2025 have been fully considered, but are not fully persuasive.
The rejections under 35 USC § 112 are withdrawn in view of the amendments to the claims.
With respect to the rejection under 35 USC § 103:
Applicant asserts that the “shared output layer” in the present invention is different from Chen’s joint layer and further asserts Chen does not teach the synchronous training of the intent pre-training task and the slot pre-training task, i.e., the training is done for one or the other, but not for both. Examiner respectfully disagrees, as Chen [p.3 1st col 2nd ¶] clearly teaches that both the intent prediction and the slot prediction are trained using eq. (3), thus teaching the synchronous training and the “shared output layer” of the present invention.
Applicant’s argument regarding Pitler not teaching the above is unpersuasive in view of the reasons provided above.
The independent claims 11 and 20 are rejected for the same reasons provided above.
The dependent claims are rejected at least by virtue of dependency to their respective parent claims.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 6, 11, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al., “BERT for Joint Intent Classification and Slot Filling” (hereinafter Chen) in view of Pitler et al., “Using Word-Sense Disambiguation Methods to Classify Web Queries by Intent” (hereinafter Pitler).
Regarding Claim 1, Chen teaches:
A computer-implemented method for training a dialogue understanding model of natural language processing, comprising: (Chen [4.2 Training Details] We use English uncased BERT-Base model, which has 12 layers, 768 hidden states, and 12 heads. BERT is pre-trained on BooksCorpus (800M words) and English Wikipedia (2,500M words)… [p.3 r. footnote] https://github.com/google-research/bert) (Note: the GitHub site gives directions to train and run the model on a computer, thus making this a computer-implemented method.)
obtaining a query input by a user into a search engine in the form of natural language, (Chen [p.1 left last ¶] Table 1 shows an example of intent classification and slot filling for user query “Find me a movie by Steven Spielberg”.
[Image: media_image1.png])
and performing joint training for a dialogue understanding pre-training task and a general pre-training task by using the dialogue understanding training data, to obtain a dialogue understanding model,
(Chen [3.2 Joint Intent Classification and Slot Filling] BERT can be easily extended to a joint intent classification and slot filling model. Based on the hidden state of the first special token ([CLS]), denoted h1, the intent is predicted as: (1)
[Image: media_image2.png]
For slot filling, we feed the final hidden states of other tokens h2, . . . , hT into a softmax layer to classify over the slot filling labels. To make this procedure compatible with the WordPiece tokenization, we feed each tokenized input word into a WordPiece tokenizer and use the hidden state corresponding to the first sub-token as input to the softmax classifier. (2)
[Image: media_image3.png]
where hn is the hidden state corresponding to the first sub-token of word xn. To jointly model intent classification and slot filling, the objective is formulated as: (3)
[Image: media_image4.png]
The learning objective is to maximize the conditional probability p(yi, ys|x). The model is finetuned end-to-end via minimizing the cross-entropy loss.)
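The joint formulation quoted above can be sketched in a few lines. This is a minimal illustration, not Chen's implementation. Because equations (1) to (3) appear only as images in this action, the sketch assumes that equation (1) applies a softmax classifier to h1, that equation (2) applies a second softmax classifier to each remaining hidden state, and that equation (3) factorizes p(yi, ys|x) as the intent probability times the per-token slot probabilities. All weights, dimensions, and function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, bias, h):
    """Compute W h + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, h)) + b
            for row, b in zip(weights, bias)]

def joint_probability(hidden, W_i, b_i, W_s, b_s, intent, slots):
    """p(intent, slots | x) under the assumed factorization.

    hidden[0] plays the role of h1 (the [CLS] state); hidden[1:] are the
    states of the remaining tokens (h2 ... hT).
    """
    p_intent = softmax(linear(W_i, b_i, hidden[0]))[intent]
    p_slots = 1.0
    for h_n, slot in zip(hidden[1:], slots):
        p_slots *= softmax(linear(W_s, b_s, h_n))[slot]
    return p_intent * p_slots

# Toy example: 2-dimensional hidden states, 2 intents, 3 slot labels.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_i, b_i = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W_s, b_s = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0]
p = joint_probability(hidden, W_i, b_i, W_s, b_s, intent=0, slots=[1, 0])
```

Both classifiers read from the same hidden states, so under this reading a single forward pass yields the intent prediction and every slot prediction together.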
wherein the dialogue understanding model includes: an input layer, comprising a part-of-speech vector layer, and a named entity vector layer; (Chen
[Image: media_image5.png]
) (Note: the bottom layer corresponds to the input layer; the top layer corresponds to the part-of-speech vector layer; the square nodes convert the parts of speech in the lowest layer to entities and thus correspond to the input entity vector layer and the part-of-speech vector layer)
a general pre-training layer, and an output layer, (Chen
[Image: media_image5.png]
)
(Note: middle (circular) layers = general pre-training layer, top layer = output layer),
the dialogue understanding training data includes: corpus data, and tag data corresponding to the corpus data, (Chen [4.1 Data] The ATIS dataset… is widely used in NLU research, which includes audio recordings of people making flight reservations… The training, development and test sets contain 4,478, 500 and 893 utterances, respectively. There are 120 slot labels and 21 intent types for the training set.) (Note: slot labels and intent types are the tag data corresponding to the corpus data)
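As an illustration of corpus data paired with tag data, the sketch below stores one labeled utterance. It reuses the example query cited from Chen's Table 1 and the slot names (genre, directed_by) mentioned elsewhere in this action. The class name, the intent label string, and the BIO encoding of the slot labels are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LabeledUtterance:
    tokens: List[str]   # corpus data: the words of the utterance
    intent: str         # tag data: one intent label per utterance
    slots: List[str]    # tag data: one slot label per token

    def __post_init__(self):
        # Slot labels are token-aligned, so the lengths must match.
        if len(self.tokens) != len(self.slots):
            raise ValueError("each token needs exactly one slot label")

example = LabeledUtterance(
    tokens="Find me a movie by Steven Spielberg".split(),
    intent="find_movie",  # hypothetical label name
    # Assumed BIO encoding of the genre and directed_by slots.
    slots=["O", "O", "O", "B-genre", "O", "B-directed_by", "I-directed_by"],
)
```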
and the step of performing joint training for a dialogue understanding pre-training task and a general pre-training task by using the dialogue understanding training data, to obtain a dialogue understanding model comprises: converting the corpus data into input vectors by using the input layer; processing the input vectors by using the general pre-training layer to obtain hidden layer output vectors; (Chen
[Image: media_image5.png]
)
(Square nodes = converting input layers, layer above square nodes = processing input vectors)
processing the hidden layer output vectors by using the output layer to obtain prediction data; (Chen [3.2 Joint Intent Classification and Slot Filling] BERT can be easily extended to a joint intent classification and slot filling model. Based on the hidden state of the first special token ([CLS]), denoted h1, the intent is predicted as: (1) For slot filling, we feed the final hidden states of other tokens h2, . . . , hT into a softmax layer to classify over the slot filling labels.) (Note: Softmax layer = hidden layer output vectors)
calculating a loss function of the dialogue understanding pre-training task and a loss function of the general pre-training task according to the prediction data and corresponding tag data; (Chen [3.2 Joint Intent Classification and Slot Filling] To jointly model intent classification and slot filling, the objective is formulated as:
[Image: media_image4.png]
The learning objective is to maximize the conditional probability p(yi, ys|x))
(Note: i = intent, s = slot)
calculating a total loss function according to the loss function of the dialogue understanding pre-training task and the loss function of the general pre-training task; (Chen [3.2 Joint Intent Classification and Slot Filling] To jointly model intent classification and slot filling, the objective is formulated as:
[Image: media_image4.png]
The learning objective is to maximize the conditional probability p(yi, ys|x). The model is finetuned end-to-end via minimizing the cross-entropy loss.) (Note: end-to-end finetuning via cross-entropy loss is interpreted as calculating the total loss function of the model)
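Maximizing a conditional probability and minimizing a cross-entropy loss are two views of the same objective. The sketch below assumes, as an interpretation of the objective shown only as an image here, that the joint probability is a product of an intent term and per-token slot terms; its negative logarithm is then a sum of an intent loss and slot losses. The probabilities and function names are hypothetical.

```python
import math

def cross_entropy(probs, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(probs[target])

def total_loss(intent_probs, intent_target, slot_probs, slot_targets):
    """Sum of the intent loss and the per-token slot losses."""
    intent_loss = cross_entropy(intent_probs, intent_target)
    slot_loss = sum(cross_entropy(p, t)
                    for p, t in zip(slot_probs, slot_targets))
    return intent_loss + slot_loss

# Made-up model outputs for one utterance with two tokens.
intent_probs = [0.9, 0.1]
slot_probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]
loss = total_loss(intent_probs, 0, slot_probs, [0, 1])

# The same quantity computed from the joint probability of the targets.
joint = 0.9 * 0.8 * 0.7
```

Here `loss` equals `-math.log(joint)` up to floating-point rounding, which is why a single end-to-end cross-entropy can serve as a combined loss over both tasks.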
completing the training of the dialogue understanding model if the total loss function satisfies a preset convergence condition, (Chen [4.2 Training Details] For fine-tuning, all hyper-parameters are tuned on the development set. The maximum length is 50. The batch size is 128. Adam is used for optimization with an initial learning rate of 5e-5. The dropout probability is 0.1. The maximum number of epochs is selected from [1, 5, 10, 20, 30, 40].) (Note: maximum number of epochs is a parameter which sets the completion of the training of the dialogue understanding model)
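For reference, the sketch below shows a training loop that can stop in two ways: when the change in loss falls below a tolerance, or when a maximum number of epochs is reached. It is a generic illustration and is not taken from Chen or from the application. The loss sequence stands in for real training and is made up, and the function name and tolerance value are hypothetical.

```python
def epochs_until_stop(losses_per_epoch, max_epochs, tolerance=None):
    """Return the number of epochs run before training stops.

    losses_per_epoch stands in for the loss an actual training step
    would produce; it is hypothetical data, not a measured result.
    """
    previous = None
    for epoch, loss in enumerate(losses_per_epoch[:max_epochs], start=1):
        converged = (
            tolerance is not None
            and previous is not None
            and abs(previous - loss) < tolerance
        )
        if converged:
            return epoch
        previous = loss
    # No tolerance-based stop occurred: the epoch budget ended training.
    return min(max_epochs, len(losses_per_epoch))

simulated = [2.0, 1.0, 0.6, 0.59, 0.589]
epochs_fixed = epochs_until_stop(simulated, max_epochs=5)
epochs_early = epochs_until_stop(simulated, max_epochs=5, tolerance=0.05)
```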
wherein the dialogue understanding pre-training task includes: an intent pre-training task…; and a slot pre-training task in which the corpus data includes a second query and the tag data includes a corresponding hypernym of each character in the second query in a knowledge graph, (Chen
[Image: media_image6.png]
)
implementing a synchronous training of the intent pre-training task and the slot pre-training task using a shared output layer of the intent pre-training task and the slot pre-training task to achieve optimization of an intent classification and a slot labeling synchronously, (Chen
[Image: media_image5.png]
(top output nodes)
[p.3 1st col 2nd ¶]
[Image: media_image7.png]
)
(Note: the top output nodes correspond to the shared layer of the intent pre-training task and the slot pre-training task. Examiner interprets that the utilization of the shared output layer corresponds to achieving synchronous intent classification and slot labeling. Thus, the shared layer taught by Chen cited above corresponds to the synchronous training of the intent and slot pre-training tasks. In addition, the joint intent classification and slot filling objective, which maximizes the conditional probability, corresponds to achieving optimization.)
wherein in the synchronous training, an output data of the shared output layer includes: intent data corresponding to a first hidden layer output vector of the output layer which uses the [CLS] position for the intent classification; (Chen [p.2 2nd col 3rd ¶]
[Image: media_image8.png]
)
and slot data corresponding to a second hidden layer output vector of the output layer which is used for the slot labelling after being subjected to Conditional Random Field (CRF) processing. (Chen [p.3 1st col 3rd ¶] Slot label predictions are dependent on predictions for surrounding words. It has been shown that structured prediction models can improve the slot filling performance, such as conditional random fields (CRF). Zhou and Xu (2015) improves semantic role labeling by adding a CRF layer for a BiLSTM encoder. Here we investigate the efficacy of adding CRF for modeling slot label dependencies, on top of the joint BERT model.) (Note: adding a CRF layer for a BiLSTM encoder corresponds to processing the second hidden layer output vector)
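The quoted passage motivates a CRF by noting that slot label predictions depend on the predictions for surrounding words. The sketch below illustrates that dependency with Viterbi decoding, the standard way to find the best label sequence in a linear-chain CRF. The passage does not describe a decoding procedure, so the algorithm choice, the scores, and the label set are assumptions made for illustration.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label sequence.

    emissions[t][j]   : score of label j at position t
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            best_prev = max(range(n_labels),
                            key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]

# Labels: 0 = O, 1 = B-directed_by, 2 = I-directed_by (hypothetical scores).
emissions = [[0.0, 1.0, 0.0], [0.6, 0.0, 0.5]]
transitions = [[0.0, 0.0, -5.0],   # O -> I is penalized
               [0.0, 0.0, 1.0],    # B -> I is rewarded
               [0.0, 0.0, 0.5]]
independent = [max(range(3), key=lambda j: e[j]) for e in emissions]
structured = viterbi(emissions, transitions)
```

With these toy scores, per-token argmax labels the second word O, while the sequence-level decoder prefers B followed by I because the transition score rewards that pair.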
and wherein the output data of the shared output layer of the dialogue understanding model have different types of data in different stages of the dialogue understanding model, wherein: in the training stage of the dialogue understanding model, the output data is prediction data comprising intent prediction data or slot prediction data; in the application stage of the dialogue understanding model, the output data is a task processing result comprising an intent classification result or a slot labelling result. (Chen [p.3 left last ¶] The training, development and test sets contain 4,478, 500 and 893 utterances, respectively… [right 2nd ¶] For fine-tuning, all hyper-parameters are tuned on the development set.) (Note: the development set to fine-tune the model corresponds to the training stage with prediction output data comprising intent and slot prediction data; the test set to test the performance of the model corresponds to the application stage with the output data comprising intent and slot prediction result.)
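One common way the same output layer can yield different kinds of data in different stages is sketched below: during training the layer's probability distribution is kept so that it can be compared with the tag data, while in application only the single best label is returned. This is an illustrative reading, not a statement of how Chen or the application implements the stages. The label names and logits are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def training_output(logits):
    """Training stage: prediction data, here a probability distribution."""
    return softmax(logits)

def application_output(logits, label_names):
    """Application stage: a task result, here the single best label."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return label_names[best]

intent_names = ["find_movie", "book_flight"]  # hypothetical labels
logits = [2.0, 0.5]
prediction = training_output(logits)
result = application_output(logits, intent_names)
```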
Chen does not teach, but Pitler teaches:
obtaining at least one website link from the search engine as a search result; (Pitler [p1432 r. ¶5] The click graph was computed on a very large sample of logs computed well before the training period. There is an edge from a query q to a URL u if at least 10 users issued q and then clicked on u.) (Note: URL u corresponds to the at least one website link from the search engine as a search result)
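The edge rule quoted from Pitler can be sketched as follows. This is a minimal illustration that assumes the threshold counts distinct users and that each log record is a (user, query, URL) tuple. The record layout, function name, and sample data are hypothetical and are not drawn from Pitler's logs.

```python
from collections import defaultdict

def build_click_graph(click_log, min_users=10):
    """Return {query: set of URLs} keeping only well-supported edges.

    click_log is an iterable of (user_id, query, url) tuples; this
    record layout is hypothetical.
    """
    users_per_edge = defaultdict(set)
    for user_id, query, url in click_log:
        users_per_edge[(query, url)].add(user_id)
    graph = defaultdict(set)
    for (query, url), users in users_per_edge.items():
        # Keep the edge only if enough distinct users clicked it.
        if len(users) >= min_users:
            graph[query].add(url)
    return dict(graph)

# Made-up log: ten users click one URL, nine users click another.
log = [(f"user{i}", "ebay official", "https://www.ebay.com") for i in range(10)]
log += [(f"user{i}", "ebay official", "https://example.com") for i in range(9)]
graph = build_click_graph(log)
```

Only the first URL clears the ten-user threshold, so the query ends up with a single outgoing edge.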
in which the corpus data includes a first query and the tag data includes a name of a website clicked by the user and corresponding to the first query;
(Pitler [3 Using Click Logs as a Substitute for Annotation] Here we use the click logs as a large-scale source of intents. Logs from Microsoft’s Live Search are used for training and test purposes. Logs from May 2008 were used for training, and logs from June 2008 were used for testing. The logs distinguish four types of clicks: (a) search results, (b) ads, (c) spelling suggestions and (d) query suggestions. Some prototypical queries of each type are shown in Table 1. As mentioned above, clicks on ads are evidence for commercial intent; other types of clicks are evidence for other intents. The query, ebay official, is assumed to be commercial intent, because a large fraction of the clicks are on ads. In contrast, typos tend to have relatively more clicks on “did-you-mean” spelling suggestions.) (Note: the four types of clicks are the labeled intents corresponding to the first query, e.g., “ebay official”.)
[Image: media_image9.png]
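Pitler's use of click types as evidence of intent can be sketched as a per-query distribution over the four click types. The sketch below is illustrative only. The type names are shorthand for the four categories listed in the quoted passage, the counts are made up, and taking the most frequent type as the label is a simplifying assumption rather than Pitler's method.

```python
from collections import Counter

CLICK_TYPES = ("search_result", "ad", "spelling_suggestion", "query_suggestion")

def click_distribution(click_types):
    """Fraction of a query's clicks falling on each click type."""
    counts = Counter(click_types)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no clicks recorded for this query")
    return {t: counts[t] / total for t in CLICK_TYPES}

def dominant_click_type(click_types):
    """Most frequent click type, used here as a stand-in intent label."""
    dist = click_distribution(click_types)
    return max(dist, key=dist.get)

clicks = ["ad"] * 6 + ["search_result"] * 4  # made-up counts
dist = click_distribution(clicks)
label = dominant_click_type(clicks)
```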
obtaining dialogue understanding training data from the search result, wherein the dialogue understanding training data are obtained based on search engine data…, and the search engine data comprise a name of a website in a search result clicked by the user and corresponding to the query; (Pitler [p1432 r. ¶5] The click graph was computed on a very large sample of logs computed well before the training period. There is an edge from a query q to a URL u if at least 10 users issued q and then clicked on u.) (Note: the click graph comprising an edge connecting query q to a URL (website) corresponds to the search engine data. If each edge is formed when at least 10 users make the query and click, then there exists a website that was clicked by the user in this data point.)
obtaining dialogue understanding training data from the search result, wherein the dialogue understanding training data are obtained based on search engine data, and the search engine data comprise a name of a website in the search result clicked by the user and corresponding to the query.
Pitler and Chen are analogous because both are from the same field of endeavor of disambiguating a search user’s intent. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to analyze Pitler’s dataset, in particular, using the NLP model of Chen. The motivation to do so is that Chen provides good NLP performance (Chen [Conclusion] Experimental results show that our proposed joint BERT model outperforms BERT models modeling intent classification and slot filling separately, demonstrating the efficacy of exploiting the relationship between the two tasks), and Pitler’s research to analyze user intent from the queries benefits from more powerful NLP models (Pitler, [1.Classify Queries By Intent (CQI)] In this paper we consider the task of: given a class of queries, which types of answer… are likely to be clicked on? [4.2 Method 2: Using Click Graph Context to Generalize Beyond the Queries in the Training Set] To address the generalization concern, we propose a method inspired by Yarowsky (1994). Word sense disambiguation is a classic problem in natural language processing.)
Chen, via Chen/Pitler, further teaches:
and a knowledge graph in the form of a triplet showing a hypernym-hyponym relationship (Chen
[Image: media_image6.png]
) (Note: each slot (e.g. genre; directed_by) corresponds to a hypernym; the matching semantic frame value (e.g. movie; Steven Spielberg) corresponds to the hyponym. Pitler teaches search engine data required for training the model in Chen, and Chen’s method of obtaining the knowledge graph for the dialogue understanding training data is based on the search engine data.)
Regarding Claim 6, the Chen/Pitler combination of Claim 1 teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Chen further teaches:
the method further comprises: obtaining dialogue understanding training data of domains in at least one domain of dialogue understanding; performing fine-tuning for the dialogue understanding model by using the dialogue understanding training data of the domains, to obtain dialogue understanding models of the domains. (Chen [p 1-2] Pre-trained models can be fine-tuned on NLP tasks and have achieved significant improvement over training on task-specific annotated data. More recently, a pretraining technique, Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018), was proposed and has created state-of-the-art models for a wide variety of NLP tasks, including question answering (SQuAD v1.1), natural language inference, and others.) (Note: question-answering task is a domain of dialogue understanding; SQuAD v1.1 for example has training data of such domain)
Independent Claim 11 recites an electronic device, comprising: at least one processor; and a memory communicatively connected with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor (Chen [4.2 Training details] We use English uncased BERT-Base model, which has 12 layers, 768 hidden states, and 12 heads. BERT is pre-trained on BooksCorpus (800M words) … and English Wikipedia (2,500M words). https://github.com/google-research/bert) to perform precisely the methods of Claim 1. Thus, Claim 11 is rejected for reasons set forth in Claim 1. (Note: the GitHub site gives directions to train and run the model on a computer, which requires at least a processor, a memory, and instructions to run the code to train the model.)
Claim 16, dependent on Claim 11, recites the device configured to perform precisely the method of Claim 6, and thus is rejected for the reasons set forth for Claim 6.
Independent Claim 20 recites a non-transitory computer readable storage medium with computer instructions stored thereon (Chen [4.2 Training details] We use English uncased BERT-Base model, which has 12 layers, 768 hidden states, and 12 heads. BERT is pre-trained on BooksCorpus (800M words) … and English Wikipedia (2,500M words). https://github.com/google-research/bert) to perform precisely the methods of Claim 1, and thus it is rejected for reasons set forth in Claim 1. (Note: the GitHub site gives directions to train and run the model on a computer, which has at least a processor, a memory, and instructions to run the code to train the model.)
Conclusion
Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSEP HAN whose telephone number is (703)756-1346. The examiner can normally be reached Mon-Fri 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.H./Examiner, Art Unit 2122 /KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122