Prosecution Insights
Last updated: April 19, 2026
Application No. 18/735,672

SPEECH RECOGNITION METHOD AND APPARATUS

Final Rejection §103
Filed: Jun 06, 2024
Examiner: LELAND III, EDWIN S
Art Unit: 2654
Tech Center: 2600 (Communications)
Assignee: Huawei Technologies Co., Ltd.
OA Round: 2 (Final)
Grant Probability: 75% (Favorable)
OA Rounds: 3-4
To Grant: 2y 5m
With Interview: 74%

Examiner Intelligence

Grants 75% (above average)
Career Allow Rate: 75% (338 granted / 452 resolved), +12.8% vs TC avg
Interview Lift: -0.3% (minimal), based on resolved cases with interview
Typical timeline
Avg Prosecution: 2y 5m
Currently pending: 18
Career history
Total Applications: 470 (across all art units)

Statute-Specific Performance

§101: 15.3% (-24.7% vs TC avg)
§103: 45.4% (+5.4% vs TC avg)
§102: 16.8% (-23.2% vs TC avg)
§112: 14.0% (-26.0% vs TC avg)
The Tech Center average is an estimate. Based on career data from 452 resolved cases.
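The Tech Center baseline behind the "vs TC avg" figures is not stated, but it can be backed out from the numbers above. The following is a minimal sketch, assuming each delta is a simple percentage-point difference from the Tech Center estimate (the page does not confirm this); the dictionary and variable names are illustrative only.

```python
# Examiner rate and stated delta vs. the Tech Center average, per statute.
# Values are copied from the Statute-Specific Performance figures.
stats = {
    "101": (15.3, -24.7),
    "103": (45.4, +5.4),
    "102": (16.8, -23.2),
    "112": (14.0, -26.0),
}

# Assumption: delta = examiner rate - Tech Center estimate (percentage points),
# so the implied Tech Center estimate is rate - delta.
implied_tc_avg = {
    statute: round(rate - delta, 1) for statute, (rate, delta) in stats.items()
}

for statute, baseline in implied_tc_avg.items():
    print(f"§{statute}: implied TC avg estimate = {baseline}%")
```

Under that assumption, all four statutes imply the same baseline, which suggests a single flat estimate rather than a per-statute average.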

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Receipt is acknowledged of papers submitted under 35 U.S.C. 119(a)-(d), which papers have been placed of record in the file.

Information Disclosure Statement

The information disclosure statements (IDS) submitted on 9/26/2024, 12/19/2024, 1/19/2025, 7/14/2025 and 12/15/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Status of Claims

Claims 1-20 are pending in this application.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 10-12 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Du et al. (Chinese Patent Application Publication CN111063162, listed in IDS dated 12/19/2024) in view of Anand et al. (U.S. Patent 12,217,749).

As per claims 1, 10 and 20, Du et al. discloses:

An apparatus (Figure 3 and Paragraph [0073] - multi-modal recognition is performed on the image data and audio data), comprising:

at least one processor (Paragraphs [0038] & [0059] - includes a memory and a processor, the memory stores a computer program); and

at least one computer-readable storage medium storing a program that is executable by the at least one processor (Paragraph [0038] - includes a memory and a processor, the memory stores a computer program), the program comprising instructions for:

obtaining a first speech text (Paragraph [0073] - When the user recognizes the preset key sentence in the code word data table through voice recognition,); and

obtaining, based on the first speech text, first modal information that matches the first speech text (Paragraph [0073] - Specifically, audio detection may include sound event detection and voice information detection. Further, it may include performing feature extraction on the audio data, outputting text information according to the extracted feature result,),

wherein a modality indicated by the first modal information is a first modality in a plurality of preset modalities (Paragraphs [0073-0074] - Different users have different settings… identify whether there are target data that triggers the generation of warning information…recognition is performed on the image data and audio data of the regulatory environment… In one of the embodiments, as shown in FIG. 4, the preset multimodal recognition model includes an image classification unit, an action classification unit, and an item detection and positioning unit); and

determining, based on the first speech text and the first modal information, a first intention (Paragraph [0073] - If there are multiple preset keywords such as "Hand over the money", "Robbery" and "Help" in the text information, or perform sound event detection on the audio data, identify the dangerous sounds in the audio data such as gunshots, violent Impact sound, etc., or sentence intent recognition of audio data, when the sentence intent is recognized as a distress or threat to the safety of others' personal property, it is judged that there is a dangerous situation in the car and an early warning message is generated.) and a first slot (Paragraphs [0070] & [0079] - intercepting the monitoring video frames in the car at a certain time interval… it can also send real-time image data and audio data in the car to the platform) that are indicated by the first speech text when the first speech text matches the first modal information (Paragraphs [0064] & [0073] - performing feature extraction on the audio data, outputting text information according to the extracted feature result… extract the fusion features of multiple data modalities.)

Du et al. fails to disclose but Anand in the same field of endeavor teaches: the first intention is an executable intention supported by the apparatus, and wherein the first slot is a slot holding data needed as input for executing the first intention (Column 18, lines 5-37).

It would be obvious for a person having ordinary skill in the art at the effective filing date of the invention to modify the method, apparatus and computer readable medium of Du et al. with the slot based intention system of Anand et al. because it is a case of combining prior art elements according to known methods to yield predictable results.
Claim 1 is directed to the method of using the apparatus of claim 10, so is rejected for similar reason. Claim 20 is directed to a non-transitory computer readable medium storing instructions to cause a processor to act as the apparatus of claim 10.

As per claims 2 and 11, the combination of Du et al. and Anand et al. discloses all of the limitations of claims 1 and 10 above. Du et al. in the combination further discloses:

obtaining, based on the first speech text, the first modal information that matches the first speech text comprises (Paragraphs [0073] & [0093] - when the sentence intent is recognized as a distress or threat to the safety of others' personal property, it is judged that there is a dangerous situation in the car and an early warning message is generated…when it recognizes that the audio data has preset keywords and generates warning information):

obtaining a multimodal selection vector based on the first speech text, wherein the multimodal selection vector indicates a probability of relevance between the first speech text and each of the plurality of preset modalities (Paragraph [0073] - example of a linked modality to the detected speech words: If there are multiple preset keywords such as "Hand over the money", "Robbery" and "Help" in the text information, or perform sound event detection on the audio data, identify the dangerous sounds in the audio data such as gunshots, violent Impact sound, etc.,); and

obtaining the first modal information based on the multimodal selection vector (Paragraph [0098] - The preset multi-modal recognition model is based on the training of scene data constructed by multi-modal data under the regulatory environment. Detection methods are activated in certain modalities based on detected speech words, which is the functional equivalent of the use of a two bit multimodal selection vector).

As per claims 3 and 12, the combination of Du et al. and Anand et al. discloses all of the limitations of claims 2 and 11 above. Du et al. in the combination further discloses:

obtaining the multimodal selection vector based on the first speech text comprises (Paragraph [0073] - sentence intent is recognized):

determining a first context category to which the first speech text belongs (Paragraph [0073] - If there are multiple preset keywords such as "Hand over the money", "Robbery" and "Help" in the text information); and

obtaining the multimodal selection vector based on the first context category, wherein the multimodal selection vector indicates a probability of relevance between the first context category and each of the plurality of preset modalities (Paragraphs [0073], [0093] & [0068] - three examples of context relevance - when the sentence intent is recognized as a distress or threat to the safety of others' personal property, it is judged that there is a dangerous situation in the car and an early warning message is generated; when it recognizes that the audio data has preset keywords and generates warning information; when the voiceprint recognizes that the user's voice is an adult voice and the user's age group recognition result is a child, the corresponding warning information is filtered.).

As per claim 19, the combination of Du et al. and Anand et al. discloses all of the limitations of claim 10 above. Du et al. in the combination further discloses:

performing an operation related to the first intention (Paragraph [0073] - sentence intent recognition of audio data, when the sentence intent is recognized as a distress or threat to the safety of others' personal property, it is judged that there is a dangerous situation in the car and an early warning message is generated.).

Allowable Subject Matter

Claims 4-9 and 13-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
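To make the claim language quoted in the rejections of claims 1-3 easier to follow, here is a toy sketch of one way a "multimodal selection vector" could feed intent and slot determination. It is an illustration only, not the applicant's implementation or either reference's method: the modality names, cue words, softmax scoring, and the `selection_vector` and `resolve` functions are all hypothetical.

```python
import math

# Hypothetical preset modalities and cue words; none of these come from the
# application, Du et al., or Anand et al.
MODALITY_CUES = {
    "gaze": {"that", "this", "there"},
    "gesture": {"point", "here", "wave"},
    "screen": {"page", "button", "scroll"},
}

def selection_vector(speech_text: str) -> dict:
    """Return a probability of relevance for each preset modality."""
    words = set(speech_text.lower().split())
    # Toy score: number of cue words shared with the speech text.
    scores = {m: float(len(words & cues)) for m, cues in MODALITY_CUES.items()}
    # Softmax turns the scores into probabilities that sum to 1.
    z = sum(math.exp(s) for s in scores.values())
    return {m: math.exp(s) / z for m, s in scores.items()}

def resolve(speech_text: str, modal_inputs: dict):
    """Pick the most relevant modality, then fill an intent and a slot."""
    vector = selection_vector(speech_text)
    first_modality = max(vector, key=vector.get)
    modal_info = modal_inputs[first_modality]
    intent = speech_text.lower().split()[0]  # toy rule: first word is the action
    slot = {"target": modal_info}            # data needed to execute the intent
    return first_modality, intent, slot

vector = selection_vector("open that one")
modality, intent, slot = resolve(
    "open that one", {"gaze": "sunroof", "gesture": None, "screen": None}
)
print(modality, intent, slot)
```

In this toy case the word "that" makes gaze the most relevant modality, so the gaze target supplies the slot value that the speech text alone leaves unresolved.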
Examiner Notes

The Examiner cites particular columns and line numbers in the references as applied to the claims above for the convenience of the Applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the Applicant fully considers the references in its entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or as disclosed by the Examiner.

Communications via Internet e-mail are at the discretion of the applicant and require written authorization. Should the Applicant wish to communicate via e-mail, including the following paragraph in their response will allow the Examiner to do so:

“Recognizing that Internet communications are not secure, I hereby authorize the USPTO to communicate with me concerning any subject matter of this application by electronic mail. I understand that a copy of these communications will be made of record in the application file.”

Should e-mail communication be desired, the Examiner can be reached at Edwin.Leland@USPTO.gov.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to EDWIN S LELAND III whose telephone number is (571)270-5678. The examiner can normally be reached 8:00 - 5:00 M-F.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan can be reached at 571-272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/EDWIN S LELAND III/
Primary Examiner, Art Unit 2654
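The reply periods stated in the Conclusion can be worked out as a short date calculation. The sketch below assumes the mailing date is the Mar 24, 2026 Final Rejection date shown in the Prosecution Timeline, which may differ from the mailing date printed on the action itself. `add_months` is a hypothetical helper, and the code does not adjust for weekends or federal holidays, so treat the results as approximate.

```python
import calendar
from datetime import date

def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of a shorter month."""
    total = d.month - 1 + months
    year = d.year + total // 12
    month = total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

# Assumption: mailing date taken from the Prosecution Timeline entry.
mailing_date = date(2026, 3, 24)

two_month_mark = add_months(mailing_date, 2)      # first-reply window for the advisory action rule
shortened_deadline = add_months(mailing_date, 3)  # end of the shortened statutory period
statutory_limit = add_months(mailing_date, 6)     # latest possible reply date, with extensions

print("Two-month mark:       ", two_month_mark.isoformat())
print("Three-month deadline: ", shortened_deadline.isoformat())
print("Six-month outer limit:", statutory_limit.isoformat())
```

Replies filed after the three-month date but before the six-month limit would need an extension of time under 37 CFR 1.136(a), as the action notes.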

Prosecution Timeline

Jun 06, 2024
Application Filed
Dec 23, 2025
Non-Final Rejection — §103
Mar 16, 2026
Response Filed
Mar 24, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12596869
DETECTING ARTIFICIAL INTELLIGENCE GENERATED TEXT
2y 5m to grant Granted Apr 07, 2026
Patent 12591602
TRAINING MACHINE LEARNING BASED NATURAL LANGUAGE PROCESSING FOR SPECIALTY JARGON
2y 5m to grant Granted Mar 31, 2026
Patent 12579370
MULTILINGUAL CHATBOT
2y 5m to grant Granted Mar 17, 2026
Patent 12579986
Systems and Methods for Distinguishing Between Human Speech and Machine Generated Speech
2y 5m to grant Granted Mar 17, 2026
Patent 12536385
SYSTEMS AND METHODS FOR A READING AND COMPREHENSION ASSISTANCE TOOL
2y 5m to grant Granted Jan 27, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 75%
With Interview: 74% (-0.3%)
Median Time to Grant: 2y 5m
PTA Risk: Moderate
Based on 452 resolved cases by this examiner. Grant probability derived from career allow rate.
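The headline figures can be reproduced from the counts reported on this page. This is a minimal sketch of that arithmetic. It assumes the interview lift is applied to the unrounded allow rate before rounding, which the page does not state, and the variable names are illustrative.

```python
granted = 338
resolved = 452
pending = 18
interview_lift_points = -0.3  # percentage points, as reported

# Career allow rate: granted applications as a share of resolved ones.
allow_rate = granted / resolved * 100

# Assumption: the "With Interview" figure is the unrounded allow rate
# plus the lift, rounded to a whole percent.
with_interview = allow_rate + interview_lift_points

# Resolved plus pending should account for the examiner's total applications.
total_applications = resolved + pending

print(f"Allow rate:     {allow_rate:.1f}% -> displayed as {round(allow_rate)}%")
print(f"With interview: {with_interview:.1f}% -> displayed as {round(with_interview)}%")
print(f"Total applications: {total_applications}")
```

Under that assumption the numbers agree with the 75% grant probability, the 74% with-interview figure, and the 470 total applications reported for this examiner.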
