DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Response to Arguments
Applicant's arguments filed 08/08/2025 have been fully considered but they are not persuasive. Regarding arguments on pages 9-10 of the Remarks, Examiner notes that the usage of a computer to perform the abstract idea does not constitute a practical application or significantly more. The technical solution of selecting a negative sample using a confusion matrix and a language score is a mental process, where a human could perform the steps using the confusion matrix and language score. Therefore, the argued technical solution is itself abstract, and cannot be considered a practical application or significantly more.
Regarding arguments on pages 11-13 of the Remarks, Examiner notes that each of the encoders and the decoder in Alon is made up of LSTM layers. Further, the association layer of the claim could be interpreted to include the encoder and decoder, as the term is not detailed in the claim. Therefore, the association layer uses both the feature vector and the history vector to obtain the association feature.
Applicant’s arguments with respect to claims 1-3, 5-9, 19-20, 22-23, 25, and 27-30 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-3, 5-9, 19-20, 22-23, 25, and 27-30 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. Using the subject matter eligibility test from page 74621 of the Federal Register Notice titled “2014 Interim Guidance on Patent Subject Matter Eligibility,” a two-step process is performed.
Under Step 1, the claims are analyzed to determine whether each claim is directed to a process, machine, article of manufacture, or composition of matter. In this case, claims 1-3, 5-9, 25, and 27 are directed to a method, which is a process; claims 19 and 28-30 are directed to a device, which is a machine or an article of manufacture; and claims 20 and 22-23 are directed to a computer readable storage medium, which is a machine or an article of manufacture.
Step 2A (part 1 of the Mayo test), using the guidance from pages 50-57 of the Federal Register Vol. 84 No. 4 from Monday, January 7, 2019, requires applying a two-prong inquiry. In Prong One, examiners evaluate whether the claim recites a judicial exception, determining whether the claim is directed to a law of nature, a natural phenomenon, or an abstract idea. In this case, claim 1 recites constructing a negative sample, which is a mental process, as well as training a model, which is a mathematical operation. In Prong Two, examiners evaluate whether the judicial exception is integrated into a practical application that imposes a meaningful limit on the judicial exception. In this case, the additional element of obtaining data is mere extrasolution activity, while the structural elements of a processor, a memory, and a computer readable medium are generic computing components, neither of which integrates the abstract idea into a practical application.
Step 2B (part 2 of the Mayo test) requires analyzing the claims to determine if they recite additional elements that amount to significantly more than the judicial exception. In this case, the claims do not include additional elements that are sufficient to amount to significantly more than the abstract idea itself.
Regarding claims 1 and 19-20, constructing a negative sample is a mental process, while associating data, obtaining a loss function, and training a model are mathematical operations, both of which are abstract ideas. For example, a human could receive a positive sample and pick another sample similar to the positive sample as the negative sample, while the training could be performed by backpropagation, which is a mathematical operation. Additional elements of obtaining and inputting data are mere extrasolution activity, while the structural elements of a processor, a memory, and a computer readable medium are generic computing components, neither of which integrates the abstract idea into a practical application or constitutes significantly more.
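For illustration only, the "hard negative" selection described above (picking the sample most similar to the positive sample as the negative sample) can be sketched in a few lines; the similarity measure (difflib's SequenceMatcher) and the candidate strings are hypothetical choices for this sketch, not taken from the claims or the cited art:

```python
# Toy sketch of hard-negative selection: choose the candidate most
# similar to the positive sample as the negative sample.
from difflib import SequenceMatcher

def pick_hard_negative(positive, candidates):
    """Return the candidate string most similar to the positive sample."""
    return max(candidates,
               key=lambda c: SequenceMatcher(None, positive, c).ratio())

# "kaitlyn" is the closest string to "caitlin", so it is selected.
print(pick_hard_negative("caitlin", ["dog", "kaitlyn", "table"]))  # kaitlyn
```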
Regarding claims 2, 22, and 28, determining a text as a positive sample and determining a different sample as a negative sample are mental processes, which are abstract ideas without integration into a practical application and without significantly more.
Regarding claim 3, traversing a path and determining another path are mental processes, which are abstract ideas, while obtaining data is mere extrasolution activity that does not integrate the abstract ideas into a practical application or constitute significantly more.
Regarding claims 5, 25, and 27, the limitations are a further clarification of the above abstract ideas.
Regarding claims 6, 23, and 29, constraining a voice decoding path is a mathematical operation, which is an abstract idea. For example, certain paths could be set to 0 so that they are not traversed. Obtaining a result is mere extrasolution activity, which does not integrate the abstract idea into a practical application or constitute significantly more.
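The example above (setting certain paths to 0 so that they are not traversed) can be sketched as follows; the paths and log-probability scores are hypothetical values for illustration only:

```python
# Toy sketch of constraining a voice decoding path: blocked paths are
# assigned -infinity (the log-domain analogue of probability 0), so
# the search can never select them.
import math

def best_path(path_scores, blocked):
    """Return the highest-scoring path, treating blocked paths as untraversable."""
    constrained = {p: (-math.inf if p in blocked else s)
                   for p, s in path_scores.items()}
    return max(constrained, key=constrained.get)

paths = {"k-a-t": -1.2, "c-a-t": -0.5, "b-a-t": -2.0}
# "c-a-t" has the best raw score, but it is blocked, so "k-a-t" wins.
print(best_path(paths, blocked={"c-a-t"}))  # k-a-t
```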
Regarding claims 7 and 30, determining a target path is a mental process, which is an abstract idea. Obtaining data is mere extrasolution activity, which does not integrate the abstract idea into a practical application or constitute significantly more.
Regarding claim 8, obtaining data is mere extrasolution activity, which does not integrate the abstract idea into a practical application or constitute significantly more.
Regarding claim 9, determining a decoding path is a mental process, which is an abstract idea. Obtaining and acquiring data are mere extrasolution activity, which does not integrate the abstract idea into a practical application or constitute significantly more.
The limitations of the claims, taken alone, do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. Applicable case law cited in the Federal Register includes, but is not limited to: Alice Corp., 134 S. Ct. at 2355-56, Digitech Image Tech., LLC v. Electronics for Imaging, Inc., 758 F.3d 1344 (Fed. Cir. 2014), Benson, 409 U.S. at 63.
See "Preliminary Examination Instructions in view of the Supreme Court Decision in Alice Corporation Pty. Ltd. v. CLS Bank International, et al.," dated June 25, 2014, and the Federal Register notice titled "2014 Interim Guidance on Patent Subject Matter Eligibility" (79 FR 74618).
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 29-30 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Alon et al. (Alon, U., Pundak, G., & Sainath, T. N. (2019, May). Contextual speech recognition with difficult negative training examples. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6440-6444). IEEE.), hereinafter referred to as Alon.
Regarding claim 29, Alon teaches:
An electronic device, comprising:
at least one processor (page 6441 section 4.1, where tensor processing units are used); and
a memory connected in communication with the at least one processor (page 6441 section 4.1, where performing the described operations necessarily requires memory or storage);
wherein the memory stores an instruction executable by the at least one processor to enable the at least one processor to execute operations, comprising:
constraining a voice decoding path corresponding to voice data to be recognized according to a second voice recognition model, in a case where the voice data to be recognized is being decoded, wherein the second voice recognition model is a model trained according to the method of claim 1 (Page 6442 section 4.3, where various models are used for speech recognition, including constraining decoding paths by training with fuzzy alternative data); and
obtaining a voice recognition result according to constraint on the voice decoding path (Page 6442 section 4.4, where results are obtained for each model),
wherein the voice recognition result is a text object that matches expected text (Page 6442 section 4.4, where word error rate is used to determine if the text matches the expected text).
Regarding claim 30, Alon teaches:
The electronic device of claim 29, wherein obtaining the voice recognition result according to the constraint on the voice decoding path comprises:
obtaining a language score corresponding to the voice data to be recognized satisfying the constraint on the decoding path, according to the second voice recognition model (Page 6440 section 2, where the probability distribution is considered the language score);
determining a target decoding path according to the language score (Page 6440-6441 section 2, where the decoding path is the process to obtain the output); and
obtaining the voice recognition result according to the target decoding path (Page 6442 section 4.4, where results are obtained for each model).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2, 5-9, 19-20, 22-23, 25, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Alon, in view of Iscen et al. (US 2024/0320493 A1), hereinafter referred to as Iscen, and further in view of Ramos.
Regarding claim 1, Alon teaches:
A voice recognition model training method, applied to an electronic device in a cluster system for performing voice recognition (Fig. 1, where both encoder and decoders are used, interpreted as a cluster system), wherein the method comprises:
constructing, by the electronic device, a negative sample according to a positive sample (page 6441 section 3.2, where fuzzy alternatives to proper noun positive samples are determined as negative samples);
selecting, by the electronic device, a target negative sample for constraining a voice decoding path from the negative sample by using a language score (page 6441 section 3.2, where fuzzy alternatives to proper noun positive samples are determined as negative samples based on their scores);
obtaining, by the electronic device, training data according to the positive sample and the target negative sample (page 6441 section 3, where the set of extracted proper nouns with their fuzzy alternatives are the bias phrases for training); and
training, by the electronic device, a first voice recognition model according to the training data to obtain a second voice recognition model (page 6441 section 3, where the bias phrases are used for training the model), wherein:
the training data is inputted into an embedding layer of the first voice recognition model to convert the training data into a corresponding feature vector through the embedding layer (Fig. 1, page 6440 section 2 third paragraph, where the audio inputs and context phrases are embedded);
the feature vector is associated with a history vector in an association layer of the first voice recognition model to obtain an association feature for voice recognition prediction (Fig. 1, page 6440-6441 section 2 third paragraph, where the previous predictions are input to the decoder with the audio and context inputs), wherein the association layer comprises a plurality of Long Short Term Memory Network (LSTM) layers (page 6441, section 4.1 last paragraph, where each of the encoders and decoder consist of LSTM layers);
Alon does not teach:
an acoustic confusion matrix;
the association feature is inputted into a full connection layer of the first voice recognition model, to perform a binary classification process of an activation function;
a loss function is obtained according to an output value obtained after the binary classification process and a target value; and
the first voice recognition model is trained according to backpropagation of the loss function to obtain the second voice recognition model.
Iscen teaches:
the association feature is inputted into a full connection layer of the first voice recognition model, to perform a binary classification process of an activation function (para [0029], where a fully connected layer is used for classification and uses a softmax function as an activation function);
a loss function is obtained according to an output value obtained after the binary classification process and a target value (para [0059], where a loss function is backpropagated to update parameters of the model); and
the first voice recognition model is trained according to backpropagation of the loss function to obtain the second voice recognition model (para [0059], where a loss function is backpropagated to update parameters of the model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Alon by using the model structure and training of Iscen (Iscen para [0029], [0059]) in the model of Alon (Alon Fig. 1) by including a fully connected layer, in order to analyze the feature representations to determine a class (Iscen para [0029]).
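The mapped training mechanism (a classification through an activation function, with a loss obtained from the output value and a target value and backpropagated to train the model) can be sketched with a single-weight toy model; the numbers, learning rate, and model shape are assumptions for illustration and are not drawn from Iscen:

```python
# Toy sketch: sigmoid activation for binary classification, a
# cross-entropy loss against a target value, and backpropagation of
# the loss gradient to update the model weight.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(w, feature, target, lr=0.1):
    """One backpropagation step on weight w for a (feature, target) pair."""
    out = sigmoid(w * feature)       # output value of the classification
    grad = (out - target) * feature  # dLoss/dw for cross-entropy loss
    return w - lr * grad             # gradient-descent update

w = 0.0
for _ in range(100):
    w = train_step(w, feature=1.0, target=1.0)
# After training, sigmoid(w) has moved toward the target value 1.0.
```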
Ramos teaches:
a confusion matrix (col. 19 lines 42-54, where a confusion matrix is used in the fuzzy logic);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Alon in view of Iscen by using the confusion matrix of Ramos (Ramos col. 19 lines 42-54) in the fuzzy alternatives of Alon in view of Iscen (Alon page 6441 section 3.2), in order to determine a difference between text strings (Ramos col. 19 lines 42-54).
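The combined teaching (using a confusion matrix to generate the fuzzy alternatives that serve as negative samples) can be sketched as follows; the character confusion table and example word are hypothetical, not values taken from Alon or Ramos:

```python
# Toy sketch: generate "fuzzy alternative" negative samples for a
# positive sample by substituting acoustically confusable characters
# drawn from a (hypothetical) confusion matrix.
CONFUSION = {"c": ["k", "s"], "f": ["ph"]}  # assumed confusable pairs

def fuzzy_alternatives(word):
    """Return one-substitution variants of word using the confusion matrix."""
    alts = []
    for i, ch in enumerate(word):
        for sub in CONFUSION.get(ch, []):
            alts.append(word[:i] + sub + word[i + 1:])
    return alts

print(fuzzy_alternatives("cat"))  # ['kat', 'sat']
```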
Regarding claim 2, Alon in view of Iscen and Ramos teaches:
The method of claim 1, wherein constructing the negative sample according to the positive sample to obtain the target negative sample for constraining the voice decoding path comprises:
determining a text character in a matching library as the positive sample (Alon page 6441 section 3.1, where the extracted proper nouns are the text characters); and
determining a sample other than the positive sample as the target negative sample (Alon page 6441 section 3.2, where fuzzy alternatives to proper noun positive samples are determined as negative samples).
Regarding claim 5, Alon in view of Iscen and Ramos teaches:
The method of claim 1, wherein the second voice recognition model is a composition model based on a neural network (Alon Fig. 1, page 6440, section 2 first paragraph, where the model is an all-neural approach, and is composed of an encoder and decoder as well as attention).
Regarding claim 6, Alon in view of Iscen and Ramos teaches:
A voice recognition method, comprising:
constraining a voice decoding path corresponding to voice data to be recognized according to a second voice recognition model, in a case where the voice data to be recognized is being decoded, wherein the second voice recognition model is a model trained according to the method of claim 1 (Alon Page 6442 section 4.3, where various models are used for speech recognition, including constraining decoding paths by training with fuzzy alternative data); and
obtaining a voice recognition result according to constraint on the voice decoding path (Alon Page 6442 section 4.4, where results are obtained for each model),
wherein the voice recognition result is a text object that matches expected text (Alon Page 6442 section 4.4, where word error rate is used to determine if the text matches the expected text).
Regarding claim 7, Alon in view of Iscen and Ramos teaches:
The method of claim 6, wherein obtaining the voice recognition result according to the constraint on the voice decoding path comprises:
obtaining a language score corresponding to the voice data to be recognized satisfying the constraint on the decoding path, according to the second voice recognition model (Alon Page 6440 section 2, where the probability distribution is considered the language score);
determining a target decoding path according to the language score (Alon Page 6440-6441 section 2, where the decoding path is the process to obtain the output); and
obtaining the voice recognition result according to the target decoding path (Alon Page 6442 section 4.4, where results are obtained for each model).
Regarding claim 8, Alon in view of Iscen and Ramos teaches:
The method of claim 7, further comprising:
obtaining an acoustic score corresponding to the voice data to be recognized, according to an acoustic model (Alon page 6440 section 1 second and third paragraphs, page 6441 section 3.2, where the scores of similar sounding pairs are determined using an external conventional model).
Regarding claim 9, Alon in view of Iscen and Ramos teaches:
The method of claim 8, wherein determining the target decoding path according to the language score, comprises:
obtaining an evaluation value according to the language score and the acoustic score (Alon Page 6442 section 4.4, where word error rate is used to determine if the text matches the expected text);
acquiring a decoding space obtained in the case where the voice data to be recognized is being decoded, wherein the decoding space comprises a plurality of decoding paths (Alon Page 6442 section 4.4, Table 2, where the results of the various models are considered the decoding space); and
determining a decoding path with a highest evaluation value among the plurality of decoding paths as the target decoding path (Alon Page 6442 section 4.4, where the path resulting in the lowest WER is considered the target).
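The selection mapped in claims 7-9 (an evaluation value obtained from the language score and the acoustic score, with the highest-valued decoding path chosen as the target path) can be sketched as follows; the interpolation weight and the scores are hypothetical:

```python
# Toy sketch: combine language and acoustic scores into an evaluation
# value, then pick the decoding path with the highest value.
def target_path(decoding_space, lm_weight=0.5):
    """decoding_space maps each path to (language_score, acoustic_score)."""
    def evaluate(scores):
        lang, acoustic = scores
        return lm_weight * lang + (1.0 - lm_weight) * acoustic
    return max(decoding_space, key=lambda p: evaluate(decoding_space[p]))

space = {"path_a": (-0.7, -1.1), "path_b": (-0.4, -0.9), "path_c": (-1.5, -0.2)}
print(target_path(space))  # path_b
```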
Regarding claim 19, Alon teaches:
An electronic device, applied to a node in a cluster system for performing voice recognition (Fig. 1, where both encoder and decoders are used, interpreted as a cluster system), wherein the electronic device comprises:
at least one processor (page 6441 section 4.1, where tensor processing units are used); and
a memory connected in communication with the at least one processor (page 6441 section 4.1, where performing the described operations necessarily requires memory or storage);
wherein the memory stores an instruction executable by the at least one processor to enable the at least one processor to execute operations, comprising:
constructing a negative sample according to a positive sample (page 6441 section 3.2, where fuzzy alternatives to proper noun positive samples are determined as negative samples);
selecting a target negative sample for constraining a voice decoding path from the negative sample by using a language score (page 6441 section 3.2, where fuzzy alternatives to proper noun positive samples are determined as negative samples based on their scores);
obtaining training data according to the positive sample and the target negative sample (page 6441 section 3, where the set of extracted proper nouns with their fuzzy alternatives are the bias phrases for training); and
training a first voice recognition model according to the training data to obtain a second voice recognition model (page 6441 section 3, where the bias phrases are used for training the model), wherein:
the training data is inputted into an embedding layer of the first voice recognition model to convert the training data into a corresponding feature vector through the embedding layer (Fig. 1, page 6440 section 2 third paragraph, where the audio inputs and context phrases are embedded);
the feature vector is associated with a history vector in an association layer of the first voice recognition model to obtain an association feature for voice recognition prediction (Fig. 1, page 6440-6441 section 2 third paragraph, where the previous predictions are input to the decoder with the audio and context inputs), wherein the association layer comprises a plurality of Long Short Term Memory Network (LSTM) layers (page 6441, section 4.1 last paragraph, where each of the encoders and decoder consist of LSTM layers);
Alon does not teach:
an acoustic confusion matrix;
the association feature is inputted into a full connection layer of the first voice recognition model, to perform a binary classification process of an activation function;
a loss function is obtained according to an output value obtained after the binary classification process and a target value; and
the first voice recognition model is trained according to backpropagation of the loss function to obtain the second voice recognition model.
Iscen teaches:
the association feature is inputted into a full connection layer of the first voice recognition model, to perform a binary classification process of an activation function (para [0029], where a fully connected layer is used for classification and uses a softmax function as an activation function);
a loss function is obtained according to an output value obtained after the binary classification process and a target value (para [0059], where a loss function is backpropagated to update parameters of the model); and
the first voice recognition model is trained according to backpropagation of the loss function to obtain the second voice recognition model (para [0059], where a loss function is backpropagated to update parameters of the model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Alon by using the model structure and training of Iscen (Iscen para [0029], [0059]) in the model of Alon (Alon Fig. 1) by including a fully connected layer, in order to analyze the feature representations to determine a class (Iscen para [0029]).
Ramos teaches:
a confusion matrix (col. 19 lines 42-54, where a confusion matrix is used in the fuzzy logic);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Alon in view of Iscen by using the confusion matrix of Ramos (Ramos col. 19 lines 42-54) in the fuzzy alternatives of Alon in view of Iscen (Alon page 6441 section 3.2), in order to determine a difference between text strings (Ramos col. 19 lines 42-54).
Regarding claim 20, Alon teaches:
A non-transitory computer-readable storage medium storing a computer instruction thereon, applied to an electronic device in a cluster system for performing voice recognition (Fig. 1, where both encoder and decoders are used, interpreted as a cluster system), wherein the computer instruction is used to cause a computer to execute operations (page 6441 section 4.1, where tensor processing units are used, and where performing the described operations necessarily requires memory or storage), comprising:
constructing a negative sample according to a positive sample (page 6441 section 3.2, where fuzzy alternatives to proper noun positive samples are determined as negative samples);
selecting a target negative sample for constraining a voice decoding path from the negative sample by using a language score (page 6441 section 3.2, where fuzzy alternatives to proper noun positive samples are determined as negative samples based on their scores);
obtaining training data according to the positive sample and the target negative sample (page 6441 section 3, where the set of extracted proper nouns with their fuzzy alternatives are the bias phrases for training); and
training a first voice recognition model according to the training data to obtain a second voice recognition model (page 6441 section 3, where the bias phrases are used for training the model), wherein:
the training data is inputted into an embedding layer of the first voice recognition model to convert the training data into a corresponding feature vector through the embedding layer (Fig. 1, page 6440 section 2 third paragraph, where the audio inputs and context phrases are embedded);
the feature vector is associated with a history vector in an association layer of the first voice recognition model to obtain an association feature for voice recognition prediction (Fig. 1, page 6440-6441 section 2 third paragraph, where the previous predictions are input to the decoder with the audio and context inputs), wherein the association layer comprises a plurality of Long Short Term Memory Network (LSTM) layers (page 6441, section 4.1 last paragraph, where each of the encoders and decoder consist of LSTM layers);
Alon does not teach:
an acoustic confusion matrix;
the association feature is inputted into a full connection layer of the first voice recognition model, to perform a binary classification process of an activation function;
a loss function is obtained according to an output value obtained after the binary classification process and a target value; and
the first voice recognition model is trained according to backpropagation of the loss function to obtain the second voice recognition model.
Iscen teaches:
the association feature is inputted into a full connection layer of the first voice recognition model, to perform a binary classification process of an activation function (para [0029], where a fully connected layer is used for classification and uses a softmax function as an activation function);
a loss function is obtained according to an output value obtained after the binary classification process and a target value (para [0059], where a loss function is backpropagated to update parameters of the model); and
the first voice recognition model is trained according to backpropagation of the loss function to obtain the second voice recognition model (para [0059], where a loss function is backpropagated to update parameters of the model).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Alon by using the model structure and training of Iscen (Iscen para [0029], [0059]) in the model of Alon (Alon Fig. 1) by including a fully connected layer, in order to analyze the feature representations to determine a class (Iscen para [0029]).
Ramos teaches:
a confusion matrix (col. 19 lines 42-54, where a confusion matrix is used in the fuzzy logic);
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Alon in view of Iscen by using the confusion matrix of Ramos (Ramos col. 19 lines 42-54) in the fuzzy alternatives of Alon in view of Iscen (Alon page 6441 section 3.2), in order to determine a difference between text strings (Ramos col. 19 lines 42-54).
Regarding claim 22, Alon in view of Iscen and Ramos teaches:
The non-transitory computer-readable storage medium of claim 20, wherein constructing the negative sample according to the positive sample to obtain the target negative sample for constraining the voice decoding path comprises:
determining a text character in a matching library as the positive sample (Alon page 6441 section 3.1, where the extracted proper nouns are the text characters); and
determining a sample other than the positive sample as the target negative sample (Alon page 6441 section 3.2, where fuzzy alternatives to proper noun positive samples are determined as negative samples).
Regarding claim 23, Alon in view of Iscen and Ramos teaches:
A non-transitory computer-readable storage medium storing a computer instruction thereon, wherein the computer instruction is used to cause a computer to execute operations, comprising:
constraining a voice decoding path corresponding to voice data to be recognized according to a second voice recognition model, in a case where the voice data to be recognized is being decoded, wherein the second voice recognition model is a model trained according to the method of claim 1 (Alon Page 6442 section 4.3, where various models are used for speech recognition, including constraining decoding paths by training with fuzzy alternative data); and
obtaining a voice recognition result according to constraint on the voice decoding path (Alon Page 6442 section 4.4, where results are obtained for each model),
wherein the voice recognition result is a text object that matches expected text (Alon Page 6442 section 4.4, where word error rate is used to determine if the text matches the expected text).
Regarding claim 25, Alon in view of Iscen and Ramos teaches:
The method of claim 2, wherein the second voice recognition model is a composition model based on a neural network (Alon Fig. 1, page 6440, section 2 first paragraph, where the model is an all-neural approach, and is composed of an encoder and decoder as well as attention).
Regarding claim 28, Alon in view of Iscen and Ramos teaches:
The electronic device of claim 19, wherein constructing the negative sample according to the positive sample to obtain the target negative sample for constraining the voice decoding path comprises:
determining a text character in a matching library as the positive sample (Alon page 6441 section 3.1, where the extracted proper nouns are the text characters); and
determining a sample other than the positive sample as the target negative sample (Alon page 6441 section 3.2, where fuzzy alternatives to proper noun positive samples are determined as negative samples).
Claims 3 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Alon, in view of Iscen and Ramos, and further in view of Hoffman et al. (US 2021/0232615 A1), hereinafter referred to as Hoffman.
Regarding claim 3, Alon in view of Iscen and Ramos teaches:
The method of claim 2, wherein determining the sample other than the positive sample as the target negative sample comprises:
Alon in view of Iscen and Ramos does not teach:
obtaining a data structure in a form of a node tree according to the positive sample, wherein each node in the node tree is an identifier corresponding to the text character constituting the positive sample;
traversing a positive path formed by the positive sample in the node tree to obtain a first path set; and
determining a path other than the first path set in the node tree as a second path set, the second path set comprising the target negative sample.
Hoffman teaches:
obtaining a data structure in a form of a node tree according to the positive sample, wherein each node in the node tree is an identifier corresponding to the text character constituting the positive sample (Fig. 6, para [0099], where a decision tree consists of nodes corresponding to words);
traversing a positive path formed by the positive sample in the node tree to obtain a first path set (Fig. 6, para [0099], where certain paths correspond to a positive sentence candidate); and
determining a path other than the first path set in the node tree as a second path set, the second path set comprising the target negative sample (Fig. 6, para [0099], where certain paths correspond to a negative sentence candidate).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Alon in view of Iscen and Ramos by using the tree structure of Hoffman (Hoffman Fig. 6) in the model of Alon in view of Iscen and Ramos (Alon Fig. 1) by applying the tree to sentences, in order to determine and/or identify whether a sentence is a positive sentence candidate or a negative sentence candidate (Hoffman para [0099]).
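Hoffman's node-tree mechanism as mapped above (traversing the positive path to obtain a first path set, with the remaining paths forming a second path set containing the target negatives) can be sketched as follows; the tree and samples are toy values, not a reproduction of Hoffman's Fig. 6:

```python
# Toy sketch: enumerate every root-to-leaf path in a character node
# tree, then split the paths into the positive (first) path set and
# the remaining (second) path set containing the target negatives.
def all_paths(tree, prefix=()):
    """Recursively collect all root-to-leaf paths of a nested-dict tree."""
    if not tree:
        return [prefix]
    paths = []
    for node, subtree in tree.items():
        paths.extend(all_paths(subtree, prefix + (node,)))
    return paths

tree = {"c": {"a": {"t": {}, "p": {}}}, "k": {"a": {"t": {}}}}
positives = {("c", "a", "t")}
first_set = [p for p in all_paths(tree) if p in positives]
second_set = [p for p in all_paths(tree) if p not in positives]
print(second_set)  # paths other than the positive path: the negatives
```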
Regarding claim 27, Alon in view of Iscen, Ramos, and Hoffman teaches:
The method of claim 3, wherein the second voice recognition model is a composition model based on a neural network (Alon Fig. 1, page 6440, section 2 first paragraph, where the model is an all-neural approach, and is composed of an encoder and decoder as well as attention).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US 2016/0275945 A1 para [0059] teaches using a confusion matrix and fuzziness parameters during a fuzzy search; US 2011/0103688 A1 para [0023] teaches using a confusion matrix to refine fuzzy searching approaches.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685. The examiner can normally be reached 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached at 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BRYAN S BLANKENAGEL/Primary Examiner, Art Unit 2658