Prosecution Insights
Last updated: April 19, 2026
Application No. 18/129,030

INSERTION ERROR REDUCTION WITH CONFIDENCE SCORE-BASED WORD FILTERING

Non-Final OA §103
Filed: Mar 30, 2023
Examiner: SAINT CYR, LEONARD
Art Unit: 2658
Tech Center: 2600 — Communications
Assignee: International Business Machines Corporation
OA Round: 3 (Non-Final)
Grant Probability: 77% (Favorable)
Expected OA Rounds: 3-4
Time to Grant: 3y 1m
With Interview: 95%

Examiner Intelligence

Career Allow Rate: 77%, above average (882 granted / 1144 resolved; +15.1% vs TC avg)
Interview Lift: +18.2% for resolved cases with an interview (strong)
Typical Timeline: 3y 1m average prosecution; 32 applications currently pending
Career History: 1176 total applications across all art units

Statute-Specific Performance

§101: 17.8% (-22.2% vs TC avg)
§103: 39.1% (-0.9% vs TC avg)
§102: 28.0% (-12.0% vs TC avg)
§112: 2.2% (-37.8% vs TC avg)
Tech Center averages are estimates. Based on career data from 1144 resolved cases.

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 12/10/25 has been entered.

Response to Arguments

Applicant's arguments, see pages 6-9, filed 12/10/25, with respect to claims 1-20 have been fully considered and are persuasive. The rejection of claims 1-20 under 35 U.S.C. 101 has been withdrawn. Applicant argues that the independent claims (claims 1, 9, and 13) include the components or steps of the invention that provide, for example, improvements to the technological process of computerized speech recognition by calculating an average of confidence for each character in a corresponding word, including space characters that appear after the last character of the word, and removing uncertain words using a threshold process based on the calculated confidence score, as achieved by the steps of: calculating a word-level confidence score by computing an average of confidence levels for each character in a word and a trailing space character delineating an end of the word (Amendment, pages 6-9).

Applicant's arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Applicant argues that the prior art of record does not teach computing confidence levels for each character in a word of an alphabet-based language and a trailing space character delineating an end of the word (Amendment, pages 9, 10).

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4-10, 12-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Harada (US PAP 2013/0096918) in view of MCCARTNEY, Jr. et al. (US PAP 2020/0404386).

As per claims 1, 9, and 13, Harada teaches a computer-implemented method, the method comprising: calculating, using a computerized automatic speech recognition system, a word-level confidence score ("calculates the sum of the corresponding similarity and the corresponding connection score for every combination of the acoustic models used when the similarity is calculated"; paragraph 99); and managing, using the computerized automatic speech recognition system, the word using a threshold process based on the calculated word-level confidence score ("determining unit 26c judges whether there is a sum among the plurality of calculated sums that exceeds a threshold value. If there is a sum that exceeds a threshold value, the character string corresponding to the largest sum among the sums that exceed a threshold value is determined as the character string corresponding to the voice signal."; paragraphs 99, 112-114); performing automatic speech recognition based on a result of the managing of the word; and presenting a result of the automatic speech recognition to a user ("The output unit 27 transmits the character string determined for each of the frames to the output unit 22 so as to display the character string on a screen as the recognition result of the voice."; paragraphs 41, 42, 100).

However, Harada does not specifically teach computing confidence levels for each character in a word of an alphabet-based language and a trailing space character delineating an end of the word. MCCARTNEY, Jr. et al. discloses that processing logic may subtract median duration timing values for each low confidence caption character string that precedes the anchored caption character string in the sentence fragment to determine the start time of the sentence fragment. For example, if the sentence fragment contains caption character strings "fried served with chutney" where confidence values are {fried=0.72, served=0.18, with=1.00, chutney=0.34}, then processing logic may identify the caption character string "with" as a high confidence caption character string and anchor the character string using the start and end time… The end time for the sentence fragment may similarly be calculated by adding median duration times corresponding to each of the trailing low confidence caption character strings until the end of the sentence fragment is reached. For example, the end time may be estimated by adding the median duration for the 7-character string "chutney" to the end time of the anchor caption character string "with" (paragraph 85).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to calculate the average of the confidence levels as taught by MCCARTNEY, Jr. et al. in Harada, because that would help provide accurate alignment of translated audio portions generated from translated caption speech with durations and timings of original audio speech in the video (paragraph 28).

As per claims 2, 10, and 14, Harada in view of MCCARTNEY, Jr. et al. further disclose that managing the word comprises deleting the word based on the calculated word-level confidence score and a given threshold ("Processing logic may be configured to normalize the translated character strings in the translation language caption data in order to remove non-spoken text or special characters from the translated character strings… generated character strings are within a threshold"; MCCARTNEY, Jr. et al., paragraphs 54-58, 61-66, 79).

As per claims 4, 12, and 16, Harada in view of MCCARTNEY, Jr. et al. further disclose applying a first weight to the confidence level of the trailing space character and a second weight to the confidence level for each character in the word, the application occurring prior to the computing of the average of the confidence levels for each character in the word and the trailing space character ("If a sentence fragment contains a set of caption character strings near the middle of the sentence fragment with high confidence values and another set of caption character strings in the beginning and the end of the sentence that have lower confidence values, then processing logic may use the set of caption character strings with the high confidence values as an anchor for determining timing information for the sentence fragment."; MCCARTNEY, Jr. et al., paragraphs 83-86; Harada, paragraph 99).

As per claims 5 and 17, Harada in view of MCCARTNEY, Jr. et al. further disclose assigning the second weight for each character in the word independently on a letter-by-letter basis (MCCARTNEY, Jr. et al., paragraphs 83-86; Harada, paragraph 99).

As per claims 6 and 18, Harada in view of MCCARTNEY, Jr. et al. further disclose separately evaluating the confidence level of a first character of the word and basing the confidence level of the word on the separate evaluation ("the recognizing device 20 calculates the connection score so as to be higher as the plurality of words of the character string which is used to calculate the similarity is closer to each other. Therefore, the recognizing device 20 determines the character string corresponding to the input voice signal by adding not only the similarity but also the connection score."; Harada, paragraphs 41, 42).

As per claims 7 and 19, Harada in view of MCCARTNEY, Jr. et al. further disclose merging confidence levels on a letter-by-letter basis from each instance of a same word, maintaining a highest confidence level of a letter of a given position and discarding remaining confidence levels of the letter of the given position ("The verifying unit 26 determines a character string corresponding to a sum that exceeds a threshold value and has the largest value among a plurality of calculated sums as a character string corresponding to the voice signal"; Harada, paragraphs 42, 79).

As per claims 8 and 20, Harada in view of MCCARTNEY, Jr. et al. further disclose basing the confidence level for each character on a corresponding log-likelihood ("When the voice is recognized to calculate the similarity (probability value), the acoustic model is compared with the voice signal"; Harada, paragraph 76).

Allowable Subject Matter

Claims 3, 11, and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter: as to claims 3, 11, and 15, the prior art made of record does not teach or suggest that the word is deleted in response to the confidence level of the entire word being less than a first threshold A and the confidence level of a first character of the word being greater than a second threshold B.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD SAINT-CYR whose telephone number is (571) 272-4247. The examiner can normally be reached Monday-Friday. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LEONARD SAINT-CYR/
Primary Examiner, Art Unit 2658
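The amended independent claims recite a concrete procedure: average the per-character confidence levels of a word, including the trailing space character that delineates its end, and filter words against a threshold. The following Python sketch illustrates that logic; the function names, default weights, and the threshold values are illustrative assumptions, not taken from the application itself. The optional weights mirror claims 4/12/16, and the two-threshold deletion rule mirrors the objected-to claims 3/11/15.

```python
def word_confidence(char_confs, trailing_space_conf,
                    char_weight=1.0, space_weight=1.0):
    """Weighted average of per-character confidences, including the
    trailing space that delineates the end of the word (claims 1/9/13).
    Distinct weights for letters vs. the space reflect claims 4/12/16."""
    weighted = [c * char_weight for c in char_confs]
    weighted.append(trailing_space_conf * space_weight)
    total_weight = char_weight * len(char_confs) + space_weight
    return sum(weighted) / total_weight

def keep_word(char_confs, trailing_space_conf, threshold_a=0.5, threshold_b=0.9):
    """Two-threshold filter sketching claims 3/11/15: delete the word only
    if the whole-word score is below threshold A AND the first character's
    confidence is above threshold B (both values here are hypothetical)."""
    word_conf = word_confidence(char_confs, trailing_space_conf)
    if word_conf < threshold_a and char_confs[0] > threshold_b:
        return False  # delete the word
    return True       # keep the word

# A word whose letters average low but whose first letter is confident
# is dropped; a uniformly confident word is kept.
low = [0.95, 0.2, 0.1, 0.3]   # per-character confidences, e.g. "word"
high = [0.95, 0.9, 0.8, 0.9]
print(word_confidence(low, 0.4), keep_word(low, 0.4), keep_word(high, 0.95))
```

The mean over letters plus the trailing space matches the claim language ("an average of confidence levels for each character in a word and a trailing space character"); everything else, including the AND-condition's parameter values, is a placeholder.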

Prosecution Timeline

Mar 30, 2023: Application Filed
Mar 08, 2025: Non-Final Rejection — §103
Jul 10, 2025: Response Filed
Aug 20, 2025: Applicant Interview (Telephonic)
Sep 09, 2025: Final Rejection — §103
Nov 24, 2025: Applicant Interview (Telephonic)
Nov 29, 2025: Examiner Interview Summary
Dec 10, 2025: Request for Continued Examination
Dec 15, 2025: Response after Non-Final Action
Feb 02, 2026: Non-Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603100: SYSTEM AND METHOD FOR OPTIMIZED AUDIO MIXING (granted Apr 14, 2026; 2y 5m to grant)
Patent 12597415: VOICE RECOGNITION GRAMMAR SELECTION BASED ON CONTEXT (granted Apr 07, 2026; 2y 5m to grant)
Patent 12592227: DIALOG UNDERSTANDING DEVICE AND DIALOG UNDERSTANDING METHOD (granted Mar 31, 2026; 2y 5m to grant)
Patent 12591765: SYSTEMS AND METHODS FOR BUILDING A CUSTOMIZED GENERATIVE ARTIFICIAL INTELLIGENT PLATFORM (granted Mar 31, 2026; 2y 5m to grant)
Patent 12585884: DIALOGUE APPARATUS, DIALOGUE METHOD, AND PROGRAM (granted Mar 24, 2026; 2y 5m to grant)
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 77%
With Interview: 95% (+18.2%)
Median Time to Grant: 3y 1m
PTA Risk: High
Based on 1144 resolved cases by this examiner. Grant probability derived from career allow rate.
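The projection figures are simple arithmetic on the examiner's career totals. A sketch of how they appear to be derived, assuming the report adds the interview lift directly to the base allow rate (an inferred method, not a documented formula):

```python
granted, resolved = 882, 1144             # examiner career totals from this report
base_rate = granted / resolved            # career allow rate, about 0.771 -> "77%"
interview_lift = 0.182                    # lift observed for cases with an interview
with_interview = min(base_rate + interview_lift, 1.0)   # capped at 100%
print(f"base {base_rate:.0%}, with interview {with_interview:.0%}")
# prints "base 77%, with interview 95%"
```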
