Last updated: May 29, 2026

Application No. 17/348,270

METHOD, APPARATUS, DEVICE AND STORAGE MEDIUM FOR TRAINING DIALOGUE UNDERSTANDING MODEL

Non-Final OA §103

Filed

Jun 15, 2021

Priority

Dec 18, 2020 — CN 202011503354.X

Examiner

HAN, JOSEP

Art Unit

2122

Tech Center

2100 — Computer Architecture & Software

Assignee

Beijing Baidu Netcom Science And Technology Co. Ltd.

OA Round

6 (Non-Final)

Interview Optional

Interview lift: +25.0%. An interview has already been conducted in this application's prosecution history. This examiner has a 38% grant rate with a +25.0% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.

Based on 16 resolved cases, 2023–2026

Examiner Intelligence

HAN, JOSEP

Grants only 38% of cases

Career Allowance Rate

6 granted / 16 resolved

-17.5% vs TC avg

Strong +25% interview lift

Without

With

+25.0%

Interview Lift

resolved cases with interview

Typical timeline

4y 2m

Avg Prosecution

14 currently pending

Career history

Total Applications

across all art units

Statute-Specific Performance

§101

6.8%

-33.2% vs TC avg

§103

81.1%

+41.1% vs TC avg

§102

9.9%

-30.1% vs TC avg

§112

0.8%

-39.2% vs TC avg

Tech Center averages are estimates. Based on career data from 16 resolved cases.

Office Action

§103

Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The following action is in response to the communication(s) received on 09/30/2025 after a final rejection.
Claims 1, 11, and 20 have been amended.
Claims 1, 6, 11, 16, and 20 are pending.
Claims 1, 11, and 20 are independent claims. 


Response to Arguments
Applicant’s arguments filed 09/30/2025 have been fully considered, but are not fully persuasive. 
The rejection under 35 USC § 112 is withdrawn in view of the amendments to the claims.

With respect to the rejection under 35 USC § 103:
Applicant asserts that the “shared output layer” of the present invention differs from Chen’s joint layer, and further asserts that Chen does not teach synchronous training of the intent pre-training task and the slot pre-training task, i.e., that Chen’s training is done for one task or the other, but not for both. Examiner respectfully disagrees: Chen [p.3 1st col 2nd ¶] clearly teaches that both the intent prediction training and the slot prediction training are performed using eq. (3), and thus teaches the synchronous training and the “shared output layer” of the present invention.
Applicant’s argument regarding Pitler not teaching the above is unpersuasive in view of the reasons provided above.
Independent claims 11 and 20 are rejected for the same reasons provided above.
The dependent claims are rejected at least by virtue of their dependency on their respective parent claims.
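For reference, the point in dispute can be illustrated with a minimal sketch. It assumes, as the cited passage describes, that eq. (3) is a single joint objective over the intent and slot predictions, and it further assumes that the objective factorizes into one intent term and one term per token slot. The factorization, the function names, and the toy logits are assumptions for illustration and are not taken from Chen's code. Under those assumptions, one total loss sums both cross-entropy terms, so a single optimization step on that total would update both tasks at once.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_nll(intent_logits, intent_label, slot_logits, slot_labels):
    """Negative log of an assumed joint probability
    p(y_i, y_s | x) = p(y_i | x) * prod_n p(y_s_n | x).

    intent_logits: scores for the [CLS] position (hypothetical input)
    slot_logits:   one list of scores per token (hypothetical input)
    Returns (intent_loss, slot_loss, total_loss).
    """
    intent_loss = -math.log(softmax(intent_logits)[intent_label])
    slot_loss = -sum(
        math.log(softmax(logits)[label])
        for logits, label in zip(slot_logits, slot_labels)
    )
    # A single scalar covers both tasks; minimizing it trains them together.
    return intent_loss, slot_loss, intent_loss + slot_loss


if __name__ == "__main__":
    # Toy example: 2 intent classes, 2 tokens with 2 slot labels each.
    print(joint_nll([0.0, 0.0], 0, [[0.0, 0.0], [0.0, 0.0]], [0, 1]))
```

Whether this summed form matches the claimed "synchronous training" with a "shared output layer" is the legal question at issue; the sketch only shows what a joint cross-entropy objective computes.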



Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1, 6, 11, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al., “BERT for Joint Intent Classification and Slot Filling” (hereinafter Chen) in view of Pitler et al., “Using Word-Sense Disambiguation Methods to Classify Web Queries by Intent” (hereinafter Pitler).

Regarding Claim 1, Chen teaches:
A computer-implemented method for training a dialogue understanding model of natural language processing, comprising: (Chen [4.2 Training Details] We use English uncased BERT-Base model, which has 12 layers, 768 hidden states, and 12 heads. BERT is pre-trained on BooksCorpus (800M words) and English Wikipedia (2,500M words)… [p.3 r. footnote] https://github.com/google-research/bert) (Note: the GitHub site gives directions to train and run the model on a computer, thus making this a computer-implemented method.)
obtaining a query input by a user into a search engine in the form of natural language, (Chen [p.1 left last ¶] Table 1 shows an example of intent classification and slot filling for user query “Find me a movie by Steven Spielberg”.
 
[Image: media_image1.png])
and performing joint training for a dialogue understanding pre-training task and a general pre-training task by using the dialogue understanding training data, to obtain a dialogue understanding model,
(Chen [3.2 Joint Intent Classification and Slot Filling] BERT can be easily extended to a joint intent classification and slot filling model. Based on the hidden state of the first special token ([CLS]), denoted h1, the intent is predicted as: (1) 
[Equation (1) image: media_image2.png]
For slot filling, we feed the final hidden states of other tokens h2, . . . , hT into a softmax layer to classify over the slot filling labels. To make this procedure compatible with the WordPiece tokenization, we feed each tokenized input word into a WordPiece tokenizer and use the hidden state corresponding to the first sub-token as input to the softmax classifier. (2) 
[Equation (2) image: media_image3.png]
 where hn is the hidden state corresponding to the first sub-token of word xn. To jointly model intent classification and slot filling, the objective is formulated as: (3) 
[Equation (3) image: media_image4.png]
The learning objective is to maximize the conditional probability p(yi, ys|x). The model is finetuned end-to-end via minimizing the cross-entropy loss.)
 wherein the dialogue understanding model includes: an input layer, comprising a part-of-speech vector layer, and a named entity vector layer; (Chen

[Image: media_image5.png]) (Note: the bottom layer corresponds to the input layer; the top layer corresponds to the part-of-speech vector layer; the square nodes convert the parts of speech in the lowest layer to entities and thus correspond to the input entity vector layer and the part-of-speech vector layer)
a general pre-training layer, and an output layer, (Chen 
 
[Image: media_image5.png])
(Note: middle (circular) layers = general pre-training layer, top layer = output layer),
the dialogue understanding training data includes: corpus data, and tag data corresponding to the corpus data, (Chen [4.1 Data] The ATIS dataset… is widely used in NLU research, which includes audio recordings of people making flight reservations… The training, development and test sets contain 4,478, 500 and 893 utterances, respectively. There are 120 slot labels and 21 intent types for the training set.) (Note: slot labels and intent types are the tag data corresponding to the corpus data)
and the step of performing joint training for a dialogue understanding pre-training task and a general pre-training task by using the dialogue understanding training data, to obtain a dialogue understanding model comprises: converting the corpus data into input vectors by using the input layer; processing the input vectors by using the general pre-training layer to obtain hidden layer output vectors; (Chen

[Image: media_image5.png])
(Square nodes = converting input layers, layer above square nodes = processing input vectors)
processing the hidden layer output vectors by using the output layer to obtain prediction data; (Chen [3.2 Joint Intent Classification and Slot Filling] BERT can be easily extended to a joint intent classification and slot filling model. Based on the hidden state of the first special token ([CLS]), denoted h1, the intent is predicted as: (1) For slot filling, we feed the final hidden states of other tokens h2, . . . , hT into a softmax layer to classify over the slot filling labels.) (Note: Softmax layer = hidden layer output vectors)
calculating a loss function of the dialogue understanding pre-training task and a loss function of the general pre-training task according to the prediction data and corresponding tag data; (Chen [3.2 Joint Intent Classification and Slot Filling] To jointly model intent classification and slot filling, the objective is formulated as:

[Equation (3) image: media_image4.png]

The learning objective is to maximize the conditional probability p(yi, ys|x))
(Note: i = intent, s = slot)
calculating a total loss function according to the loss function of the dialogue understanding pre-training task and the loss function of the general pre-training task; (Chen [3.2 Joint Intent Classification and Slot Filling] To jointly model intent classification and slot filling, the objective is formulated as:
 
[Equation (3) image: media_image4.png]

The learning objective is to maximize the conditional probability p(yi, ys|x). The model is finetuned end-to-end via minimizing the cross-entropy loss.) (Note: end-to-end finetuning via cross-entropy loss is interpreted as calculating the total loss function of the model)
completing the training of the dialogue understanding model if the total loss function satisfies a preset convergence condition, (Chen [4.2 Training Details] For fine-tuning, all hyper-parameters are tuned on the development set. The maximum length is 50. The batch size is 128. Adam is used for optimization with an initial learning rate of 5e-5. The dropout probability is 0.1. The maximum number of epochs is selected from [1, 5, 10, 20, 30, 40].) (Note: maximum number of epochs is a parameter which sets the completion of the training of the dialogue understanding model)
wherein the dialogue understanding pre-training task includes: an intent pre-training task…; and a slot pre-training task in which the corpus data includes a second query and the tag data includes a corresponding hypernym of each character in the second query in a knowledge graph, (Chen
	
[Image: media_image6.png])

implementing a synchronous training of the intent pre-training task and the slot pre-training task using a shared output layer of the intent pre-training task and the slot pre-training task to achieve optimization of an intent classification and a slot labeling synchronously, (Chen

[Image: media_image5.png] (top output nodes)
[p.3 1st col 2nd ¶]
[Image: media_image7.png])
(Note: the top output nodes correspond to the shared layer of the intent pre-training task and the slot pre-training task. Examiner interprets that the utilization of the shared output layer corresponds to achieving synchronous intent classification and slot labeling. Thus, the shared layer taught by Chen cited above corresponds to the synchronous training of the intent and slot pre-training tasks. In addition, the joint intent classification and slot filling objective of maximizing the conditional probability corresponds to achieving optimization.)
wherein in the synchronous training, an output data of the shared output layer includes: intent data corresponding to a first hidden layer output vector of the output layer which uses the [CLS] position for the intent classification; (Chen [p.2 2nd col 3rd ¶]
[Image: media_image8.png])
and slot data corresponding to a second hidden layer output vector of the output layer which is used for the slot labelling after being subjected to Conditional Random Field (CRF) processing. (Chen [p.3 1st col 3rd ¶] Slot label predictions are dependent on predictions for surrounding words. It has been shown that structured prediction models can improve the slot filling performance, such as conditional random fields (CRF). Zhou and Xu (2015) improves semantic role labeling by adding a CRF layer for a BiLSTM encoder. Here we investigate the efficacy of adding CRF for modeling slot label dependencies, on top of the joint BERT model.) (Note: adding a CRF layer for a BiLSTM encoder corresponds to processing the second hidden layer output vector)
and wherein the output data of the shared output layer of the dialogue understanding model have different types of data in different stages of the dialogue understanding model, wherein: in the training stage of the dialogue understanding model, the output data is prediction data comprising intent prediction data or slot prediction data; in the application stage of the dialogue understanding model, the output data is a task processing result comprising an intent classification result or a slot labelling result. (Chen [p.3 left last ¶] The training, development and test sets contain 4,478, 500 and 893 utterances, respectively… [right 2nd ¶] For fine-tuning, all hyper-parameters are tuned on the development set.) (Note: the development set to fine-tune the model corresponds to the training stage with prediction output data comprising intent and slot prediction data; the test set to test the performance of the model corresponds to the application stage with the output data comprising intent and slot prediction result.)

Chen does not teach, but Pitler teaches: 
obtaining at least one website link from the search engine as a search result; (Pitler [p1432 r. ¶5] The click graph was computed on a very large sample of logs computed well before the training period. There is an edge from a query q to a URL u if at least 10 users issued q and then clicked on u.) (Note: URL u corresponds to the at least one website link from the search engine as a search result)
in which the corpus data includes a first query and the tag data includes a name of a website clicked by the user and corresponding to the first query;
(Pitler [3 Using Click Logs as a Substitute for Annotation] Here we use the click logs as a large-scale source of intents. Logs from Microsoft’s Live Search are used for training and test purposes. Logs from May 2008 were used for training, and logs from June 2008 were used for testing. The logs distinguish four types of clicks: (a) search results, (b) ads, (c) spelling suggestions and (d) query suggestions. Some prototypical queries of each type are shown in Table 1. As mentioned above, clicks on ads are evidence for commercial intent; other types of clicks are evidence for other intents. The query, ebay official, is assumed to be commercial intent, because a large fraction of the clicks are on ads. In contrast, typos tend to have relatively more clicks on “did-you-mean” spelling suggestions.) (Note: the four types of clicks are the labeled intents corresponding to the first query, e.g., “ebay official”.)

[Image: media_image9.png]
obtaining dialogue understanding training data from the search result, wherein the dialogue understanding training data are obtained based on search engine data…, and the search engine data comprise a name of a website in a search result clicked by the user and corresponding to the query; (Pitler [p1432 r. ¶5] The click graph was computed on a very large sample of logs computed well before the training period. There is an edge from a query q to a URL u if at least 10 users issued q and then clicked on u.) (Note: the click graph comprising an edge connecting query q to a URL (website) corresponds to the search engine data. If each edge is formed when at least 10 users make the query and click, then there exists a website that was clicked by the user in this datapoint.)
 obtaining dialogue understanding training data from the search result, wherein the dialogue understanding training data are obtained based on search engine data, and the search engine data comprise a name of a website in the search result clicked by the user and corresponding to the query.)
Pitler and Chen are analogous because both are from the same field of endeavor of disambiguating a search user’s intent. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to analyze Pitler’s dataset, in particular, using the NLP model of Chen. The motivation to do so is that Chen provides good NLP performance (Chen [Conclusion] Experimental results show that our proposed joint BERT model outperforms BERT models modeling intent classification and slot filling separately, demonstrating the efficacy of exploiting the relationship between the two tasks), and Pitler’s research to analyze user intent from the queries benefits from more powerful NLP models (Pitler, [1.Classify Queries By Intent (CQI)] In this paper we consider the task of: given a class of queries, which types of answer… are likely to be clicked on? [4.2 Method 2: Using Click Graph Context to Generalize Beyond the Queries in the Training Set] To address the generalization concern, we propose a method inspired by Yarowsky (1994). Word sense disambiguation is a classic problem in natural language processing.)
Chen, via Chen/Pitler, further teaches:
and a knowledge graph in the form of a triplet showing a hypernym-hyponym relationship (Chen
            
[Image: media_image6.png]) (Note: each slot (e.g. genre; directed_by) corresponds to a hypernym; the matching semantic frame (e.g. movie; Steven Spielberg) corresponds to the hyponym. Pitler teaches search engine data required for training the model in Chen, and Chen’s method of obtaining the knowledge graph for the dialogue understanding training data is based on the search engine data.)
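The click-graph rule quoted from Pitler above (an edge from query q to URL u if at least 10 users issued q and then clicked on u) can be sketched as follows. This is a minimal illustration, not Pitler's implementation: the log schema of (user_id, query, clicked_url) tuples, the function name, and the reading of "at least 10 users" as 10 distinct user IDs are all assumptions.

```python
from collections import defaultdict

MIN_USERS = 10  # threshold quoted from Pitler: "at least 10 users"


def build_click_graph(click_log, min_users=MIN_USERS):
    """Build a query -> set-of-URLs mapping from a click log.

    click_log: iterable of (user_id, query, clicked_url) tuples.
               This schema is hypothetical, chosen for illustration.
    An edge (query, url) is kept only if at least `min_users`
    distinct users issued the query and then clicked the URL.
    """
    users_per_edge = defaultdict(set)
    for user_id, query, url in click_log:
        users_per_edge[(query, url)].add(user_id)

    graph = defaultdict(set)
    for (query, url), users in users_per_edge.items():
        if len(users) >= min_users:
            graph[query].add(url)
    return dict(graph)


if __name__ == "__main__":
    # Ten distinct users click the same (placeholder) URL for one query.
    log = [(f"u{i}", "ebay official", "https://example.com/a") for i in range(10)]
    print(build_click_graph(log))
```

In the rejection's mapping, each surviving (query, URL) pair is the kind of query-to-clicked-website data point that the examiner reads onto the claimed search engine data.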

Regarding Claim 6, the Chen/Pitler combination of Claim 1 teaches the method of Claim 1 (and thus the rejection of Claim 1 is incorporated). Chen further teaches:
the method further comprises: obtaining dialogue understanding training data of domains in at least one domain of dialogue understanding; performing fine-tuning for the dialogue understanding model by using the dialogue understanding training data of the domains, to obtain dialogue understanding models of the domains. (Chen [p 1-2] Pre-trained models can be fine-tuned on NLP tasks and have achieved significant improvement over training on task-specific annotated data. More recently, a pretraining technique, Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018), was proposed and has created state-of-the-art models for a wide variety of NLP tasks, including question answering (SQuAD v1.1), natural language inference, and others.) (Note: question-answering task is a domain of dialogue understanding; SQuAD v1.1 for example has training data of such domain)

Independent Claim 11 recites an electronic device, comprising: at least one processor; and a memory communicatively connected with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor (Chen [4.2 Training details] We use English uncased BERT-Base model, which has 12 layers, 768 hidden states, and 12 heads. BERT is pre-trained on BooksCorpus (800M words) … and English Wikipedia (2,500M words). https://github.com/google-research/bert) to perform precisely the methods of Claim 1. Thus, Claim 11 is rejected for reasons set forth in Claim 1. (Note: the GitHub site gives directions to train and run the model on a computer, which requires at least a processor, a memory, and instructions to run the code to train the model.)
Claim 16, dependent on Claim 11, recites the device configured to perform precisely the method of Claim 6, and thus is rejected for the reasons set forth for that claim.

Independent Claim 20 recites a non-transitory computer readable storage medium with computer instructions stored thereon (Chen [4.2 Training details] We use English uncased BERT-Base model, which has 12 layers, 768 hidden states, and 12 heads. BERT is pre-trained on BooksCorpus (800M words) … and English Wikipedia (2,500M words). https://github.com/google-research/bert) to perform precisely the methods of Claim 1, and thus it is rejected for reasons set forth in Claim 1. (Note: the GitHub site gives directions to train and run the model on a computer, which has at least a processor, a memory, and instructions to run the code to train the model.)


Conclusion
Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOSEP HAN whose telephone number is (703)756-1346. The examiner can normally be reached Mon-Fri 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/J.H./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122


Prosecution Timeline


Jun 10, 2025

Examiner Interview Summary

Jun 10, 2025

Applicant Interview (Telephonic)

Jun 18, 2025

Request for Continued Examination

Jun 23, 2025

Response after Non-Final Action

Jul 28, 2025

Non-Final Rejection mailed — §103

Sep 30, 2025

Response Filed

Nov 26, 2025

Final Rejection mailed — §103

Jan 02, 2026

Response after Non-Final Action

Precedent Cases

Applications granted by this same examiner with similar technology

17/717,547

Patent 12585965

INTERACTIVE MACHINE-LEARNING FRAMEWORK

3y 11m to grant Granted Mar 24, 2026

Study what changed to get past this examiner. Based on the 1 most recent grant.


Prosecution Projections

6-7

Expected OA Rounds

38%

Grant Probability

62%

With Interview (+25.0%)

4y 2m (~0m remaining)

Median Time to Grant

High

PTA Risk

Based on 16 resolved cases by this examiner. Grant probability derived from career allowance rate.
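As a quick arithmetic check of the figures above: the page says grant probability is derived from the career allowance rate. The sketch below assumes the interview lift is added in percentage points; that formula and the function name are assumptions, since the page does not state how the "With Interview" figure is computed. With 6 grants out of 16 resolved cases the rate is 37.5%, and adding 25.0 points gives 62.5%, which the page appears to display as 38% and 62%.

```python
def projections(granted, resolved, interview_lift_points):
    """Return the allowance rate and an assumed interview-adjusted rate.

    The additive treatment of the lift (in percentage points) is an
    assumption for illustration; the page does not give the formula.
    """
    allowance_rate = 100.0 * granted / resolved
    return {
        "allowance_rate": allowance_rate,
        "with_interview": allowance_rate + interview_lift_points,
    }


if __name__ == "__main__":
    print(projections(6, 16, 25.0))
```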