Prosecution Insights
Last updated: April 19, 2026
Application No. 18/500,969

DATA PROCESSING METHOD, APPARATUS, AND DEVICE

Final Rejection §103
Filed: Nov 02, 2023
Examiner: SIRJANI, FARIBA
Art Unit: 2659
Tech Center: 2600 — Communications
Assignee: Alipay (Hangzhou) Information Technology Co., Ltd.
OA Round: 2 (Final)

Grant Probability: 76% (Favorable)
OA Rounds: 3-4
To Grant: 2y 10m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 76% — above average (414 granted / 547 resolved; +13.7% vs TC avg)
Interview Lift: +31.0% — strong (allowance rate with vs. without interview, among resolved cases with interview)
Avg Prosecution: 2y 10m (typical timeline); 31 currently pending
Total Applications: 578 across all art units (career history)

Statute-Specific Performance

§101: 14.1% (-25.9% vs TC avg)
§103: 49.1% (+9.1% vs TC avg)
§102: 14.7% (-25.3% vs TC avg)
§112: 10.7% (-29.3% vs TC avg)

Tech Center averages are estimates. Based on career data from 547 resolved cases.

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first-inventor-to-file provisions of the AIA.

DETAILED ACTION

Claims 1-20 are pending. Claims 1, 8, and 15 are independent. This Application was published as U.S. 20240153500. Apparent priority: 3 Nov 2022 (China).

Applicant’s amendments and arguments have been considered but are either unpersuasive or moot in view of the new grounds of rejection, which, where presented, were necessitated by the amendments to the Claims. This action is Final.

Response to Amendments

The objection to the Specification is withdrawn in view of the amendments to the Specification. The objection to Claim 2 is withdrawn in view of the amendments to the Claim. The rejection of Claims 2-7, 9-14, and 16-20 under 35 U.S.C. 112 is withdrawn in view of the amendments to Claims 2, 9, and 16.

Response to Arguments

Applicant’s arguments are directed to the amended language and are moot in view of the modified grounds of rejection. Note that while three references are cited, Zhang is sufficient for teaching the Claim; Williams, a pre-NN-era reference, is maintained because it was cited previously; and Aly is added to expressly state what is implied in the NN of Zhang.

Applicant has not pointed to the location of support; the corresponding support for “dimension” was found in [0066] of the published Application, corresponding to Figure 5:

“[0066] In implementation, it is assumed that the target data is divided based on a character-based division method, and a word vector shown in FIG. 5 can be constructed based on the subdata obtained through division. The word vector can be constructed based on a predetermined dimension. 
For example, if the semantic character sequence includes 13 characters, and the predetermined dimension is 50, a size of the constructed word vector is 50x13.”

[FIG. 5 image omitted]

In Figure 5 the “target data” is divided into Chinese characters that are represented by their names, and the “Dimension” in the Claim pertains to the dimension of the word vector corresponding to a character. These are generally one-hot vectors, and they all have the same dimension, which depends on the total number of elements in the set they represent, such as the total number of characters or radicals, or a selected number of characters, in the Chinese language.

Claim 1 is amended as follows:

1. A data processing method, comprising:
obtaining to-be-detected target data, the target data including content data of a human-computer interaction;
obtaining a target probability that the target data corresponds to each candidate user intention;
dividing the target data to obtain a plurality of pieces of subdata, each piece of subdata of the plurality of pieces of subdata being a part of the target data and having a same dimension among the plurality of pieces of subdata;
obtaining, based on a gradient integration algorithm, a contribution of each piece of subdata to a correspondence level between the target data and each candidate user intention; and
determining a target user intention corresponding to the target data based on the target probability that the target data corresponds to each candidate user intention and the contribution of each piece of subdata to the correspondence level between the target data and each candidate user intention. 
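The fixed-dimension construction described in [0066] can be made concrete with a short sketch. This is only an illustration of the geometry (13 characters, predetermined dimension 50, yielding a 50x13 matrix); the embedding values and the helper name `build_word_vector` are our own assumptions, not anything defined by the Application.

```python
import numpy as np

# Hypothetical sketch of the "predetermined dimension" construction of
# FIG. 5: each character of the target data maps to a vector of fixed
# dimension (here 50), so a 13-character sequence yields a 50x13 matrix.
DIM = 50  # the predetermined dimension from [0066]

def build_word_vector(chars, vocab):
    """Stack one fixed-dimension vector per character, column-wise."""
    rng = np.random.default_rng(0)
    # Illustrative embedding table: one DIM-dimensional vector per entry.
    table = {c: rng.normal(size=DIM) for c in vocab}
    return np.stack([table[c] for c in chars], axis=1)  # (DIM, len(chars))

chars = list("ABCDEFGHIJKLM")  # stand-in for a 13-character sequence
matrix = build_word_vector(chars, set(chars))
assert matrix.shape == (50, 13)
```

Every column has the same dimension regardless of which character it represents, which is the property the amended limitation is read to require.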
As provided in the Interview Summary of 5 Feb 2026, the portion stating “each piece of subdata … being a part of the target data” is just a paraphrase of “dividing the target data to obtain a plurality of pieces of subdata.” The current amendments go further and add “and having a same dimension among the plurality of pieces of subdata,” which according to the supporting Specification refers to the dimension of the word vector comprising subdata when the subdata are characters:

“[0066] In implementation, it is assumed that the target data is divided based on a character-based division method, and a word vector shown in FIG. 5 can be constructed based on the subdata obtained through division. The word vector can be constructed based on a predetermined dimension. For example, if the semantic character sequence includes 13 characters, and the predetermined dimension is 50, a size of the constructed word vector is 50x13.”

Examiner had mapped the subdata of the Claim to the words of Williams. Applicant’s arguments regarding the use of the N-best list are not on point. Williams recognized several options (N-best) for each word in the input utterance. The Office action mapped the subdata to the series of words that were selected by Williams to represent the utterance, and not to N words for the same input sound. The different words of a recognized sentence were mapped to the subdata. This interpretation was consistent with the Specification of the instant Application:

“[0034] For example, the target data includes speech Q1 and input data A1 in FIG. 2. 
Speech Q1 and input data A1 can be divided based on the character-based division method to obtain a plurality of pieces of subdata, or speech Q1 and input data A1 can be divided based on the word-based division algorithm to obtain a plurality of pieces of subdata, or speech Q1 can be used as subdata 1 and input data A1 can be used as subdata 2, for example, speech Q1 and input data A1 are divided based on the phrase-based division method.”

Further, while the words of Williams would not be of the same size, the word “dimension” in the Specification of the instant Application (appearing only twice, in [0066] above) and, therefore, in the amended Claim, refers to the dimension of the vector that is used to represent the subdata. Based on the support in [0066], when the division is a character-based division such that the subunits are characters, then “[t]he word vector can be constructed based on a predetermined dimension.” This “word vector” as shown in Figure 5 is a matrix consisting of vectors (a1 … a50), (b1 … b50), … (m1 … m50), where each of these vectors corresponds to a character subdata.

The limitation as amended provides:

dividing the target data to obtain a plurality of pieces of subdata, each piece of subdata of the plurality of pieces of subdata being a part of the target data and having a same dimension among the plurality of pieces of subdata;

The “dimension” of the “subdata” refers to the “dimension” of the “vector” representing the “subdata,” which is referenced by the Specification. This would be the dimension 50 in Figure 5, which corresponds to the dimension of the vectors used to represent the characters. The Claim does not specify that the subdata are characters, words, or phrases. Each of these subunits can be represented by a vector. The secondary reference Zhang is directed to intent detection from input text (or recognized speech) and teaches the use of word/subunit vectors. Applicant has not addressed Zhang in the arguments. 
While the rejection is not changed, Zhang is applied to the remainder of Claim 1 below. Williams is a 2012 reference which was applied to show the breadth of the Claim. Note the following from the Published Application:

[0002] With rapid development of the Internet industry, network risks also increase accordingly. In a risk control scenario, before application service providers provide services for users, customer service employees can interact with the users to determine, based on feedback information of the users, whether there is a risk in a current service (such as a transfer service, a recharging service, or a withdrawal service). To reduce costs of manual participation, risk control can be performed through human-computer interaction.

[0003] For example, a real user intention corresponding to the feedback information of the user can be determined by using a pretrained intention recognition model, to perform risk control on the current service. However, due to various fraudulent activities and complex feedback information of the user, the pretrained intention recognition model is possibly unable to accurately recognize a real intention of the user, and a risk control effect is poor.

[0024] The target data may include data of a content of human-computer interaction, e.g., of a user, in a human-computer interaction process. The target data can include any type of data input by the user, such as voice data, image data, and text data. For example, as shown in FIG. 2, in a resource transfer service scenario, speech Q1 and speech Q2 can be output, and input data A1 of the user for speech Q1 and input data A2 of the user for speech Q2 can be received. In this case, the target data can include speech Q1, speech Q2, and the input data (for example, input data A1 and input data A2) of the user in the human-computer interaction process. The candidate user intention can be a user intention corresponding to a current scenario. 
For example, in the resource transfer service scenario, the candidate user intention can include a transfer intention and an information update intention.

[0033] In implementation, if the target data includes text data (or text data converted from voice data, video data, etc.), the target data can be divided based on a predetermined data division method to obtain a plurality of pieces of subdata. For example, the target data can be divided based on a character-based division method to obtain a plurality of pieces of subdata, or the target data can be divided based on a word-based division algorithm to obtain a plurality of pieces of subdata, or the target data can be divided based on a phase-based division method to obtain a plurality of pieces of subdata.

[image omitted]

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 8-10, and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Williams (U.S. 20120179467) in view of Zhang (U.S. 20230004798) and further in view of Aly (U.S. 20210117623).

Regarding Claim 1, Williams teaches:

1. A data processing method, comprising: [Williams, Figure 1 shows the hardware including Memory 130 and Processor 120 that the counterpart system and CRM Claims refer to.] 
obtaining to-be-detected target data, the target data including content data of a human-computer interaction; [Williams, Figures 2A/2B: “Receiving an utterance as part of a user dialog 202/210.” “[0026] … For simplicity, FIGS. 2A and 2B are discussed …. FIG. 2A illustrates a broad application of this principle. The system receives an utterance as part of a user dialog (202). One source of user dialogs is users interacting with an interactive voice response (IVR) system. ….”]

obtaining a target probability that the target data corresponds to each candidate user intention; [Williams, Figure 2A: “Generating an n-best list of recognition hypotheses … 204” and “selecting an underlying user intention … 206.” The members of the N-best list each have a confidence value/probability associated with them. See Figure 3B. Williams also associates user action with intention and generates a cumulative confidence/probability score of action/intention. “… Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention….” Abstract. “[0026] … At a high level, the method generates one or more N-best lists during speech recognition and uses one or more of the N-best lists to determine a hidden user intention. … Often user intentions are more complicated and detailed than the actual uttered words. 
When users converse with IVR systems, one objective of the system is to discern the hidden variable of the user's intent....”]

dividing the target data to obtain a plurality of pieces of subdata, each piece of subdata of the plurality of pieces of subdata being a part of the target data and having a same dimension among the plurality of pieces of subdata; [Williams, Figures 2A/2B: “Generating an n-best list of recognition hypotheses … 204” and “selecting an underlying user intention … 206.” The subdata are words, and Williams generates N-best lists corresponding to words of the input utterance. “[0026] … Typically the user's intent is strewn in bits and pieces across multiple N-best lists. The user's intent can be thought of as having multiple slots. For example, a user's intention for ordering a pizza can include "slots" of size, price, delivery time, toppings, type of crust, and so forth. Individual N-best lists can contain information relating to some, all, or none of the slots representing the user's intention.”]

obtaining, based on a gradient integration algorithm, a contribution of each piece of subdata to a correspondence level between the target data and each candidate user intention; and [Williams, Figure 2A, 206. Williams has to combine the information from several N-best lists in order to determine the intent. Figures 3A and 3B show that information from several lists is combined. “… Selecting an intention can further be based on confidence scores associated with recognition hypotheses in the generated N-best lists, and also on the probability of a user's action given their underlying intention. A belief or cumulative confidence score can be assigned to each inferred user intention….” Abstract. “[0027] The system generates an N-best list of recognition hypotheses for the user dialog (204). Each N-best list is some observation about what the user wants. 
These N-best lists can represent different views into the hidden user intention, but each is a partial noisy view into the user intention. …” “[0028] Then the system selects an underlying user intention based on a belief distribution across the generated N-best list and at least one contextually similar N-best list (206) and responds to the user based on the selected underlying user intention (208).” “[0029] In another example, the system generates N-best lists of first names and last names during a dialog. The system can compare the N-best lists to a names directory to determine which combination of N-best list entries is most likely to represent the user's intention. In one example, a first N-best list contains first names (such as Bryant, Bryan, Ryan) and a second N-best list contains last names (such as Milani, Mililani, Miani). These N-best lists are contextually similar because they each relate to different parts of the same name. Similarly, the system can generate pairs of N-best lists for city names and state names. The system can compare the pairs of N-best lists to a directory of city and state names to determine which combination is the most likely.”] (The “gradient integration algorithm” is not defined in the Claim. It is not a term of art with a known definition. And, there was no distinct definition found in the Specification. Please include the definition inside the Claim. The closest description that might have been intended as a definition was found in [0065]-[0067] of the published Application: “[0067] … For example, as shown in FIG. 5, a sum of contributions corresponding to d1 to d50 can be used as a contribution of a character “ding” to the correspondence between the target data and each candidate user intention.” But d1 to d50 are all elements of the vector corresponding to the single word “ding” and their contribution equals the contribution of “ding” alone.) 
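The aggregation described in [0067] — summing the per-element contributions d1 … d50 of a character's vector into one contribution for that character — can be sketched directly. The attribution matrix here is random, standing in for whatever the attribution method (e.g., an integrated-gradients style technique, which is one reading of "gradient integration algorithm" but is not defined in the Claim) would produce.

```python
import numpy as np

# Sketch of [0067]: each character is a 50-dimensional vector, and the
# character's contribution is the sum of its per-element contributions
# d1..d50. The attribution matrix is illustrative, not from the record.
attributions = np.random.default_rng(1).normal(size=(50, 13))  # (dim, chars)

per_character = attributions.sum(axis=0)  # one scalar per character
assert per_character.shape == (13,)
# The total contribution is preserved by the per-character aggregation.
assert np.isclose(per_character.sum(), attributions.sum())
```

As the Office Action observes, under this reading the summed elements d1 … d50 of the vector for "ding" simply equal the contribution of "ding" alone.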
determining a target user intention corresponding to the target data based on the target probability that the target data corresponds to each candidate user intention and the contribution of each piece of subdata to the correspondence level between the target data and each candidate user intention. [Williams, Figure 2A: “Responding to the user based on the selected underlying user intention 208.” “[0032] The system can calculate the confidence score by summing all possible hidden states and hidden user actions, such as in a hidden Markov model. The system can further assign a confidence score to each item in each of the plurality of N-best lists. The confidence score can reflect an automatic speech recognition (ASR) certainty in the recognized speech. For example, in an N-best list containing the words `Boston` and `Austin`, the system can assign a confidence score reflecting the probable accuracy of the recognition, such as `Austin:81%` and `Boston:43%`. The system can remove from consideration those items in each of the plurality of N-best lists with confidence scores below a threshold when selecting the item. One example of a threshold is culling a long N-best list down to a maximum of 4 entries. Another example of a threshold is culling those entries in an N-best list which have a confidence score under 30%. Yet another example of a threshold is to remove the bottom 2/3 of the N-best list. Other thresholds and combinations of thresholds are possible. The system can preserve confidence scores so they are operative over a whole dialog. For example, if the same or similar question is asked several times, the system can aggregate confidence scores from previously generated N-best lists. 
The system can calculate the probability of the user action by iterating over each user action to form a distribution over the user's real intentions at each dialog turn and by assimilating the distribution in each of the plurality of N-best lists.”] Williams is an early reference that does not represent its subunits/words with vectors. Zhang teaches: 1. A data processing method, comprising: [Zhang, Figures 5 and 6 showing the training and intent recognition block diagrams.] obtaining to-be-detected target data, the target data including content data of a human-computer interaction; [Zhang, Figures 3 and 4. S302: Input word segmentation results of the to-be-recognized text to an intent recognition model ….”] obtaining a target probability that the target data corresponds to each candidate user intention; [Zhang, Figure 4 shows a table on the left that includes scores/probability numbers associating each word of the input query/ “target data” with an intent: “[0051] … The first semantic vectors of the word segmentation results and the semantic vectors of the candidate intents are inputted to a first recognition layer, to obtain first intent results corresponding to the to-be-recognized text outputted by the first recognition layer are “NAVI” and “HIGHWAY”. In addition, the first recognition layer may further output scores between the word segmentation results in the to-be-recognized text and the candidate intents, for example, the score matrix on the left of FIG. 
4.”] dividing the target data to obtain a plurality of pieces of subdata, each piece of subdata of the plurality of pieces of subdata being a part of the target data and having a same dimension among the plurality of pieces of subdata; [Zhang teaches that its training, Figure 1, and therefore operation, Figures 3 and 4, is based on word vectors of segmented words: “[0027] Specifically, in this embodiment, in the neural network model constructed by performing S102, when outputting a first semantic vector of each segmented word in a training text according to word segmentation results of the training text inputted, the feature extraction layer may adopt the following optional implementation manner. For each training text, a word vector of each segmented word in the training text is obtained. For example, the word vector of each segmented word is obtained by performing embedding processing on the segmented word….” “[0045] … As shown in FIG. 3, an intent recognition method according to the present disclosure may specifically include the following steps.” “[0051] … FIG. 4 is a flowchart of intent recognition according to this embodiment. If a to-be-recognized text is “Open the navigation app and take the highway”, word segmentation results corresponding to the to-be-recognized text are “open”, “navigation app”, “take” and “highway”, and candidate intents include “NAVI”, “HIGHWAY” and “POI”, semantic vectors of the candidate intents are 11, 12 and 13 respectively. 
The word segmentation results corresponding to the to-be-recognized text are inputted to an intent recognition model, and a feature extraction layer in the intent recognition model passes a word vector of each word segmentation result through an encoder layer, an attention layer, a connection layer and a decoder layer to obtain a first semantic vector h1 corresponding to “open”, a first semantic vector h2 corresponding to “navigation app”, a first semantic vector h3 corresponding to “take” and a first semantic vector h4 corresponding to “highway”. Then, the first semantic vectors of the word segmentation results are inputted to a second recognition layer, to obtain second intent results corresponding to the word segmentation results outputted by the second recognition layer, which are “NAVI”, “NAVI”, “HIGHWAY” and “HIGHWAY”. The first semantic vectors of the word segmentation results and the semantic vectors of the candidate intents are inputted to a first recognition layer, to obtain first intent results corresponding to the to-be-recognized text outputted by the first recognition layer are “NAVI” and “HIGHWAY”. In addition, the first recognition layer may further output scores between the word segmentation results in the to-be-recognized text and the candidate intents, for example, the score matrix on the left of FIG. 4.”] obtaining, based on a gradient integration algorithm, a contribution of each piece of subdata to a correspondence level between the target data and each candidate user intention; and [Zhang, Figure 4 the score table is showing the contribution of each word/piece of subdata to the correspondence level between the input/target data and the intent domain/candidate intention. Each row of the table adds to 1 and thus each row includes the contribution of each word to the recognized intent/domain such as navigation, highway, or points of interest. 
The Claim and the Application do not define the “gradient integration algorithm” and it is not a term of art. Gradient descent is a regularly used algorithm in neural networks. Zhang uses an attention-based neural network as shown in Figure 4 and, while it does not mention the phrase “gradient descent,” it is very likely that it uses such an algorithm.]

determining a target user intention corresponding to the target data based on the target probability that the target data corresponds to each candidate user intention and the contribution of each piece of subdata to the correspondence level between the target data and each candidate user intention. [Zhang, Figure 3, S302 and Figure 4, the output of the second recognition layer as the second intent result. “[0050] … In this embodiment, when S302 is performed to obtain a second intent result according to an output result of the intent recognition model, the following optional implementation manner may be adopted: obtaining the second intent result of the to-be-recognized text according to the scores between the segmented words in the to-be-recognized text and the candidate intent outputted by the intent recognition model. For example, in this embodiment, a score matrix may be constructed according to the scores between the segmented words and the candidate intent, and the second intent result corresponding to each segmented word is obtained by conducting a search with a viterbi algorithm.”]

Williams and Zhang both pertain to determination of user intent from a user's natural language input (utterance or text), and it would have been obvious to modify the system of Williams to use the more modern neural network system of Zhang, which generates embedding vectors from the inputs for a neural-network type of processing and teaches the use of word vectors for the word subunits of Williams. 
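The mapping above reads Zhang's FIG. 4 score matrix as one row per segmented word and one column per candidate intent, with each row summing to 1. A minimal sketch of that reading, using Zhang's example words and intents but with raw scores and a softmax normalization that are our own assumptions:

```python
import numpy as np

# One row per segmented word, one column per candidate intent; rows are
# normalized to sum to 1 so each row reads as that word's contribution
# across intents. Raw scores are illustrative, not Zhang's actual values.
words = ["open", "navigation app", "take", "highway"]
intents = ["NAVI", "HIGHWAY", "POI"]
raw = np.array([[2.0, 0.1, 0.2],
                [3.0, 0.4, 0.5],
                [0.3, 1.8, 0.1],
                [0.2, 2.5, 0.3]])

scores = np.exp(raw) / np.exp(raw).sum(axis=1, keepdims=True)  # row softmax
assert np.allclose(scores.sum(axis=1), 1.0)
# Row-wise argmax recovers a per-word intent, as in Zhang's second
# recognition layer for this example.
per_word_intent = [intents[i] for i in scores.argmax(axis=1)]
assert per_word_intent == ["NAVI", "NAVI", "HIGHWAY", "HIGHWAY"]
```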
This combination falls under combining prior art elements according to known methods to yield predictable results or use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396. The embedding vectors of Zhang will have a same dimension, but another express reference is cited.

Aly teaches:

dividing the target data to obtain a plurality of pieces of subdata, each piece of subdata of the plurality of pieces of subdata being a part of the target data and having a same dimension among the plurality of pieces of subdata; [Aly divides the input into character-embeddings or word-embeddings. “[0107] FIG. 8 illustrates an example view of a vector space 800. In particular embodiments, an object or an n-gram may be represented in a d-dimensional vector space, where d denotes any suitable number of dimensions. Although the vector space 800 is illustrated as a three-dimensional space, this is for illustrative purposes only, as the vector space 800 may be of any suitable dimension….” The word vectors in Figure 8 are all of dimension=3.]

obtaining, based on a gradient integration algorithm, a contribution of each piece of subdata to a correspondence level between the target data and each candidate user intention; and [Aly: “[0114] … In particular embodiments, training an ANN may comprise modifying the weights associated with the connections between nodes of the ANN by optimizing an objective function. As an example and not by way of limitation, a training method may be used (e.g., the conjugate gradient method, the gradient descent method, the stochastic gradient descent) to backpropagate the sum-of-squares error measured as a distances between each vector representing a training object (e.g., using a cost function that minimizes the sum-of-squares error). 
…”]

Williams/Zhang and Aly both pertain to determination of user intent from a user's natural language input (utterance or text), and it would have been obvious to supplement the teachings of the combination, which impliedly include vectors of the same dimension and in all likelihood use the gradient descent method of optimization, with the express teachings of Aly, which uses CNNs. This combination falls under combining prior art elements according to known methods to yield predictable results or use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396.

Regarding Claim 2, Williams teaches several regimes of thresholding the confidence probability scores that suggest the upper and lower thresholds of this Claim: “[0032] … The system can remove from consideration those items in each of the plurality of N-best lists with confidence scores below a threshold when selecting the item. One example of a threshold is culling a long N-best list down to a maximum of 4 entries. Another example of a threshold is culling those entries in an N-best list which have a confidence score under 30%. Yet another example of a threshold is to remove the bottom 2/3 of the N-best list. Other thresholds and combinations of thresholds are possible….” Williams also teaches that the scores are probability scores. Williams does not vectorize the input data.

Zhang teaches:

2. The method according to claim 1, further comprising: before the obtaining the target probability that the target data corresponds to each candidate user intention, [The preamble is merely setting the process before reaching the result.] 
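The three culling regimes Williams [0032] describes — keep at most 4 entries, drop entries under 30% confidence, remove the bottom 2/3 — can be sketched as simple filters over an N-best list. The function names and the `(hypothesis, confidence)` tuple format are ours for illustration; the example entries echo Williams's Austin/Boston confidences.

```python
# Sketch of the N-best culling regimes in Williams [0032]; helper names
# and list format are illustrative assumptions.
def cull_top_k(nbest, k=4):
    """Keep at most k highest-confidence entries."""
    return sorted(nbest, key=lambda e: e[1], reverse=True)[:k]

def cull_below(nbest, threshold=0.30):
    """Drop entries whose confidence score is under the threshold."""
    return [e for e in nbest if e[1] >= threshold]

def cull_bottom_fraction(nbest, fraction=2/3):
    """Remove the bottom fraction of the list by confidence."""
    ranked = sorted(nbest, key=lambda e: e[1], reverse=True)
    keep = max(1, len(ranked) - int(len(ranked) * fraction))
    return ranked[:keep]

nbest = [("Austin", 0.81), ("Boston", 0.43), ("Aston", 0.22),
         ("Austen", 0.15), ("Houston", 0.05)]
assert cull_below(nbest) == [("Austin", 0.81), ("Boston", 0.43)]
assert len(cull_top_k(nbest)) == 4
assert cull_bottom_fraction(nbest) == [("Austin", 0.81), ("Boston", 0.43)]
```

Each filter acts as a lower bound on what survives into intent selection, which is how the Office Action reads them onto the claimed thresholds.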
determining a first vector corresponding to the target data, and determining, based on an intention recognition model and the first vector, a first probability that the target data corresponds to each first user intention; and [Zhang vectorizes the user input / “target data” into vectors/embeddings and uses a trained neural network intention recognition model to determine the intent of the user input / “target data.” Figure 3, “[0046] In S301, a to-be-recognized text is acquired.” “[0047] In S302, word segmentation results of the to-be-recognized text are inputted to an intent recognition model, and a first intent result and a second intent result of the to-be-recognized text are obtained according to an output result of the intent recognition model.” Figure 4 shows the intent recognition process and includes word segmentation which yields the “first vector corresponding to the target data” of the Claim: [0051] … The first semantic vectors of the word segmentation results …” and are shown in Figure 4 as h1, h2, h3, h4.] determining, as a candidate user intention, a first first user intention corresponding to a first probability that is greater than a first probability threshold and not greater than a second probability threshold, [Zhang, Figure 3, S302: “… obtain a first intent result …” Figure 4 showing the “first intent result as “NAVI HIGHWAY.” Figure 4 shows the probability scores in the matrix to the left. “[0050] … after word segmentation results of the to-be-recognized text are inputted to the intent recognition model, the intent recognition model outputs the first intent result and scores between segmented words in the to-be-recognized text and the candidate intent through the first recognition layer. 
…” “[0051] … The first semantic vectors of the word segmentation results and the semantic vectors of the candidate intents are inputted to a first recognition layer, to obtain first intent results corresponding to the to-be-recognized text outputted by the first recognition layer are “NAVI” and “HIGHWAY”. In addition, the first recognition layer may further output scores between the word segmentation results in the to-be-recognized text and the candidate intents, for example, the score matrix on the left of FIG. 4.” The score has to exceed a threshold: “[0031] … Then, the candidate intent whose score exceeds a preset threshold is selected as the first intent result of the training text.” The scores are probabilities because that is what a trained NN model generates. However, Zhang does not include the word “probability.”] wherein the determining the target user intention corresponding to the target data based on the target probability that the target data corresponds to each candidate user intention and the contribution of each piece of subdata to the correspondence between the target data and each candidate user intention includes: determining, as a second user intention, a second first user intention corresponding to a first probability that is greater than a third probability threshold; and [Zhang, Figure 3, S302, “… obtain … a second intent result … according to an output result of the intent recognition model.” “[0043] … Then, the candidate intent whose score exceeds a preset threshold is selected as the second intent result corresponding to the segmented word.” The preset threshold teaches the “third threshold” of the Claim. ] (This limitation is interpreted to ask for a second user intention with a probability that is higher than a third threshold noting that the Claim does not establish any type of relationship between the third threshold and the other two (first and second)). 
determining the target user intention corresponding to the target data based on the second user intention, the target probability that the target data corresponds to each candidate user intention, and the contribution of each piece of subdata to the correspondence between the target data and each candidate user intention. [Zhang, the second user intention of both Zhang and the Claim is based on segmented words / “each piece of subdata.” Figure 4, “second intent result” is shown at the top right. “[0050] … In this embodiment, when S302 is performed to obtain a second intent result according to an output result of the intent recognition model, the following optional implementation manner may be adopted: obtaining the second intent result of the to-be-recognized text according to the scores between the segmented words in the to-be-recognized text and the candidate intent outputted by the intent recognition model. For example, in this embodiment, a score matrix may be constructed according to the scores between the segmented words and the candidate intent, and the second intent result corresponding to each segmented word is obtained by conducting a search with a viterbi algorithm.” “[0051] … Then, the first semantic vectors of the word segmentation results are inputted to a second recognition layer, to obtain second intent results corresponding to the word segmentation results outputted by the second recognition layer, which are “NAVI”, “NAVI”, “HIGHWAY” and “HIGHWAY”….”] The score has to exceed a threshold, but Zhang does not teach an upper limit threshold and a lower limit threshold. Additionally, the scores are probabilities because that is what a trained NN model generates. However, Zhang does not include the word “probability.” These features are taught or suggested by Williams as set forth above. 
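Zhang's cited mechanism (a score matrix between segmented words and candidate intents, searched with the Viterbi algorithm to produce a per-word second intent result) can be sketched as follows. The scores, the labels, and the stay_bonus smoothing term are invented for illustration; this is an editor's sketch, not Zhang's actual implementation.

```python
# Hypothetical sketch of per-word intent decoding over a word-by-intent
# score matrix using the Viterbi algorithm, in the manner Zhang describes.

def viterbi(score_matrix, labels, stay_bonus=0.1):
    """score_matrix[t][j] is the score of word t for intent labels[j].
    A small bonus for repeating the previous label smooths the sequence."""
    n_words, n_labels = len(score_matrix), len(labels)
    best = [list(score_matrix[0])]  # best[t][j]: best path score ending in label j
    back = []                       # backpointers for path recovery
    for t in range(1, n_words):
        row, ptr = [], []
        for j in range(n_labels):
            prev = [best[t - 1][k] + (stay_bonus if k == j else 0.0)
                    for k in range(n_labels)]
            k_best = max(range(n_labels), key=lambda k: prev[k])
            row.append(prev[k_best] + score_matrix[t][j])
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # Trace back the highest-scoring label sequence.
    j = max(range(n_labels), key=lambda k: best[-1][k])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return [labels[j] for j in reversed(path)]

labels = ["NAVI", "HIGHWAY"]
scores = [[0.9, 0.2], [0.8, 0.3], [0.1, 0.9], [0.2, 0.8]]
print(viterbi(scores, labels))  # prints ['NAVI', 'NAVI', 'HIGHWAY', 'HIGHWAY']
```

This mirrors Figure 4's per-word output ("NAVI", "NAVI", "HIGHWAY", "HIGHWAY"), where each segmented word receives its own intent label rather than a single sentence-level label.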
Williams and Zhang both pertain to determination of user intent from a user's natural language input (utterance or text), and it would have been obvious to modify the system of Williams to use the more modern neural network system of Zhang, which generates embedding vectors from the inputs for neural network processing. This combination falls under combining prior art elements according to known methods to yield predictable results or the use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396. Regarding Claim 3, Williams does not teach generating vectors and using neural networks for its intent determination. Zhang teaches: 3. The method according to claim 2, wherein the obtaining the target probability that the target data corresponds to each candidate user intention includes: [Zhang generates a score that is a probability score, based on the mechanism by which the score is generated. However, Zhang does not use the phrase “probability” expressly.] replacing a word vector corresponding to the target data with a replacement word vector, and determining a second vector corresponding to the target data based on the replacement word vector; [Zhang, Figures 1 and 2 teach the training of the intent detection model, and the training begins with an initial condition, i.e., S102 of Figure 1 and S202 of Figure 2. The “replacement word vector” is taught by the “preset semantic vectors.” “[0026] In this embodiment, when S102 is performed to construct the neural network model, a plurality of candidate intents and a semantic vector corresponding to each candidate intent may also be preset. The semantic vector of the candidate intent is configured to represent semantics of the candidate intent, which may be constantly updated with the training of the neural network model.”] (This limitation again provides no guidance as to what the “replacement word vector” is or how it is generated. 
The Specification defines it as a “zero vector.” The goal of setting the vector to zero is this: “[0062] In implementation, the second vector can be input to the pretrained intention recognition model to obtain the second probability that the target data corresponds to each candidate user intention. As such, because the word vector for determining the second vector is the zero vector, the second probability, determined based on the second vector, that the target data corresponds to each candidate user intention is a probability that the target data corresponds to each candidate user intention that is determined without the semantic effect being considered.” This seems to be the weight/likelihood of each intent in general, irrespective of the user input. The definitions must be inside the Claim language in order to be given effect.) determining, based on the intention recognition model and the second vector, a second probability that the target data corresponds to each candidate user intention; and [Zhang, the “second vector” of the Claim, which yields a “second candidate user intention,” is the preset training vector for each intent that yields the initial state of the trained model, which generated the “second (probability) score” of the Claim. Figure 2, S102: “… a first intent result of the training text and a score between each segmented word in the training text and the candidate intent.” “[0026] … The semantic vector of the candidate intent is configured to represent semantics of the candidate intent, which may be constantly updated with the training of the neural network model.”] determining, based on the first probability and the second probability, the target probability that the target data corresponds to each candidate user intention. [Zhang, as the training is conducted, and also when new data beyond the training data (which are the “replacement word vectors”) are input by the user, the candidate user intentions become available as outputs by the trained model. 
Figure 3, S302. These results are based on the first and second (probability) scores generated during the training of the model.] Zhang teaches scores for the intent obtained from the trained neural network, which are inherently probability scores. However, the phrase “probability” is not present expressly and is taught expressly by Williams. Rationale for combination as provided for Claim 2. Claim 8 is a system claim with limitations corresponding to the limitations of Claim 1 and is rejected under similar rationale. Claim 9 is a system claim with limitations corresponding to the limitations of Claim 2 and is rejected under similar rationale. Claim 10 is a system claim with limitations corresponding to the limitations of Claim 3 and is rejected under similar rationale. Claim 15 is a computer program product claim with limitations corresponding to the limitations of method Claim 1 and is rejected under similar rationale. Claim 16 is a computer program product claim with limitations corresponding to the limitations of method Claim 2 and is rejected under similar rationale. Claim 17 is a computer program product claim with limitations corresponding to the limitations of method Claim 3 and is rejected under similar rationale. Claims 4-7, 11-14, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Williams, Zhang, and Aly, and further in view of Gao (U.S. 20230080671). Regarding Claim 4, Williams does not teach generating vectors and using neural networks for its intent determination. In Zhang, while the position of the words in the input sentence may be preserved according to the depiction of Figure 4, the use of a location/position vector is not taught. Aly does not teach a position vector. Gao teaches: 4. 
The method according to claim 3, wherein the determining the second vector corresponding to the target data based on the replacement word vector includes: obtaining a location vector of each word in the target data and a segment vector of each word in the target data; and [Gao is also directed to intention detection and, as shown in Figure 2, converts an input sentence (“I am in meeting I will call you later”) into 3 embeddings, including a word embedding vector, an identification vector, and a position embedding vector that shows the position/location of each word in the input. The “Identification Embedding Vector” teaches the “Segment Vector” and the “Position Embedding Vector” teaches the “Location Vector” of the Claim. “[0032] In some embodiments, sentences can be distinguished in two ways. The first way is to use special symbols, such as ‘[SEP]’, to separate them. The second way is to add a learned identification embedding vector to each word to indicate whether it belongs to sentence A (i.e., the first sentence) or sentence B (i.e., the second sentence). For each word, the input of the model is obtained by adding the word embedding vector, the identification embedding vector (EA, EB) and the position embedding vector (E0, E1, E2, . . . ) of the word itself. The specific process can be referred to FIG. 2.” “[0029] … the identification embedding vector represents that the corresponding word belongs to the first sentence or the second sentence; the position embedding vector represents a position of the corresponding word in the sentence.”] (Note that neither “location vector” nor “segment vector” is defined in the Claim or in the Specification. Position vectors are known in the field, but “segment vector” has no customary meaning. Key words of the Claim are required to be defined inside the Claim.) determining the second vector corresponding to the target data based on the replacement word vector, the location vector, and the segment vector. 
[Gao, the input sentence/target data of the Claim is converted into 3 vectors shown in Figure 2 as representatives of the “target data,” including a position/location vector and an identification/segment vector.] Williams/Zhang/Aly and Gao pertain to determination of user intent from a user's natural language input (utterance or text), and it would have been obvious to modify the system of the combination to use the position embeddings of Gao as a method commonly used in the art to preserve the context of each word. This combination falls under combining prior art elements according to known methods to yield predictable results or the use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396. Regarding Claim 5, Williams does not teach generating vectors and using neural networks for its intent determination. Zhang teaches: 5. The method according to claim 4, wherein the determining the target user intention corresponding to the target data based on the second user intention, the target probability that the target data corresponds to each candidate user intention, and the contribution of each piece of subdata to the correspondence between the target data and each candidate user intention includes: determining, as a potential user intention, a candidate user intention corresponding to a contribution that is lower than a contribution threshold; and [Zhang, Figure 2, S202. “[0043] … For example, the first semantic vector of each segmented word is inputted into a classifier after linear layer transformation, and a score of each candidate intent is obtained by the classifier. 
Then, the candidate intent whose score exceeds a preset threshold is selected as the second intent result corresponding to the segmented word.” More than one intent may have a score exceeding or below the threshold; those with scores exceeding the threshold teach the “target user intention,” and those with scores below the threshold teach the “potential user intention” of this Claim. See Figure 2, where the model is trained to yield two types of intent results and also yields a different candidate intent for each “segmented word” of the user input.] determining the target user intention corresponding to the target data based on the second user intention and the potential user intention. [Zhang, Figure 2, S202. “[0043] … Then, the candidate intent whose score exceeds a preset threshold is selected as the second intent result corresponding to the segmented word.”] Rationale as provided for Claim 2. Regarding Claim 6, Williams and Zhang do not mention risk control policies. Aly teaches: 6. The method according to claim 5, wherein the target data is data required for performing a target service, and the method further includes: obtaining a first risk control policy corresponding to the second user intention and a second risk control policy corresponding to the potential user intention, and [Aly, Figure 2, “dialog Arbitrator 216” and Figure 3, showing the Dialog Arbitrator including several types of policies, all of which are intended to control the risk of execution of an intended function/target service. “[0075] In particular embodiments, the reasoning module 214 may communicate with the remote action execution module 226 and the dialog arbitrator 216, respectively. In particular embodiments, the dialog manager 335 of the reasoning module 214 may communicate with a task completion component 340 of the action execution module 226 about the dialog intent and associated content objects. 
In particular embodiments, the task completion module 340 may rank different dialog hypotheses for different dialog intents. The task completion module 340 may comprise an action selector 341. In alternative embodiments, the action selector 341 may be comprised in the dialog manager 335. In particular embodiments, the dialog manager 335 may additionally check against dialog policies 345 comprised in the dialog arbitrator 216 regarding the dialog state. In particular embodiments, a dialog policy 345 may comprise a data structure that describes an execution plan of an action by an agent 350. The dialog policy 345 may comprise a general policy 346 and task policies 347. In particular embodiments, the general policy 346 may be used for actions that are not specific to individual tasks. The general policy 346 may comprise handling low confidence intents, internal errors, unacceptable user response with retries, skipping or inserting confirmation based on ASR or NLU confidence scores, etc. The general policy 346 may also comprise the logic of ranking dialog state update candidates from the dialog state tracker 337 output and pick the one to update (such as picking the top ranked task intent). In particular embodiments, the assistant system 140 may have a particular interface for the general policy 346, which allows for consolidating scattered cross-domain policy/business-rules, especial those found in the dialog state tracker 337, into a function of the action selector 341. … The interface for the general policy 346 may also allow for providing a layering of policies with back-off, i.e. multiple policy units, with highly specialized policy units that deal with specific situations being backed up by more general policies 346 that apply in wider circumstances. In this context the general policy 346 may alternatively comprise intent or task specific policy. 
In particular embodiments, a task policy 347 may comprise the logic for action selector 341 based on the task and current state. …”] performing risk detection on the target service based on the first risk control policy and the second risk control policy, to determine whether there is a risk in execution of the target service. [Aly, the “dialog policies 345,” including the “task policies 347,” exist so that tasks considered risky or inappropriate under some policy are not executed. “[0076] In particular embodiments, the action selector 341 may take candidate operators of dialog state and consult the dialog policy 345 to decide what action should be executed. The assistant system 140 may use a hierarchical dialog policy with general policy 346 handling the cross-domain business logic and task policies 347 handles the task/domain specific logic. In particular embodiments, the general policy 346 may pick one operator from the candidate operators to update the dialog state, followed by the selection of a user facing action by a task policy 347. 
Once a task is active in the dialog state, the corresponding task policy 347 may be consulted to select right actions….” Privacy, for example, is a policy to avoid risk: “[0006] … In particular embodiments, the assistant system may check privacy settings to ensure that accessing a user's profile or other user information and executing different tasks are permitted subject to the user's privacy settings.” “[0007] … By leveraging both client-side and server-side processes, the assistant system can effectively assist a user with optimal usage of computing resources while at the same time protecting user privacy and enhancing security.” See [0115]-[0127], which is focused on Privacy as a policy and violations of privacy as a type of risk to be avoided to enhance “security.”] Williams/Zhang and Aly pertain to determination of user intent from a user's natural language input (utterance or text), and it would have been obvious to modify the system of the combination to use the control policies of Aly to prevent the machine from performing tasks that are not considered prudent for some reason. This combination falls under combining prior art elements according to known methods to yield predictable results or the use of a known technique to improve similar devices (methods, or products) in the same way. See MPEP 2141, KSR, 550 U.S. at 418, 82 USPQ2d at 1396. Regarding Claim 7, Williams does not teach generating vectors and using neural networks for its intent determination. Zhang teaches: 7. The method according to claim 5, further comprising: training the intention recognition model based on the target data and the target user intention to obtain a trained intention recognition model. [Zhang, Figure 1, training data is acquired at S101, and by S103 a neural network, which is an intent recognition model, is trained. “[0018] … As shown in FIG. 
1, an intent recognition model training method according to the present disclosure may specifically include the following steps.” “[0019] In S101, training data including a plurality of training texts and first annotation intents of the plurality of training texts is acquired.”] Rationale as provided for Claim 2. Claim 11 is a system claim with limitations corresponding to the limitations of Claim 4 and is rejected under similar rationale. Claim 12 is a system claim with limitations corresponding to the limitations of Claim 5 and is rejected under similar rationale. Claim 13 is a system claim with limitations corresponding to the limitations of Claim 6 and is rejected under similar rationale. Claim 14 is a system claim with limitations corresponding to the limitations of Claim 7 and is rejected under similar rationale. Claim 18 is a computer program product claim with limitations corresponding to the limitations of method Claim 4 and is rejected under similar rationale. Claim 19 is a computer program product claim with limitations corresponding to the limitations of method Claim 5 and is rejected under similar rationale. Claim 20 is a computer program product claim with limitations corresponding to the limitations of method Claim 6 and is rejected under similar rationale. Conclusion The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. 
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. Any inquiry concerning this communication or earlier communications from the examiner should be directed to FARIBA SIRJANI whose telephone number is (571)270-1499. The examiner can normally be reached 9 to 5, M-F. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre Desir can be reached at 571-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 
/Fariba Sirjani/ Primary Examiner, Art Unit 2659

Prosecution Timeline

Nov 02, 2023
Application Filed
Nov 20, 2025
Non-Final Rejection — §103
Jan 29, 2026
Interview Requested
Feb 05, 2026
Examiner Interview Summary
Feb 05, 2026
Applicant Interview (Telephonic)
Mar 02, 2026
Response Filed
Apr 08, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603099
SELF-ADJUSTING ASSISTANT LLMS ENABLING ROBUST INTERACTION WITH BUSINESS LLMS
2y 5m to grant Granted Apr 14, 2026
Patent 12579482
Schema-Guided Response Generation
2y 5m to grant Granted Mar 17, 2026
Patent 12572737
GENERATIVE THOUGHT STARTERS
2y 5m to grant Granted Mar 10, 2026
Patent 12537013
AUDIO-VISUAL SPEECH RECOGNITION CONTROL FOR WEARABLE DEVICES
2y 5m to grant Granted Jan 27, 2026
Patent 12492008
Cockpit Voice Recorder Decoder
2y 5m to grant Granted Dec 09, 2025
Based on the examiner's 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
76%
Grant Probability
99%
With Interview (+31.0%)
2y 10m
Median Time to Grant
Moderate
PTA Risk
Based on 547 resolved cases by this examiner. Grant probability derived from career allow rate.
