Prosecution Insights
Last updated: April 19, 2026
Application No. 18/337,984

TEXT NORMALIZATION AND INVERSE TEXT NORMALIZATION FOR MULTI-LINGUAL LANGUAGE MODELS

Final Rejection §103

Filed: Jun 20, 2023
Examiner: CAUDLE, PENNY LOUISE
Art Unit: 2657
Tech Center: 2600 — Communications
Assignee: Nvidia Corporation
OA Round: 2 (Final)

Grant Probability: 67% (Favorable)
OA Rounds: 3-4
To Grant: 3y 2m
With Interview: 82%

Examiner Intelligence

Career Allow Rate: 67%, above average (46 granted / 69 resolved; +4.7% vs TC avg)
Interview Lift: +15.5%, a strong lift in allowance among resolved cases with an interview
Typical Timeline: 3y 2m average prosecution; 19 applications currently pending
Career History: 88 total applications across all art units

Statute-Specific Performance

§101: 21.0% (-19.0% vs TC avg)
§103: 43.7% (+3.7% vs TC avg)
§102: 15.8% (-24.2% vs TC avg)
§112: 17.1% (-22.9% vs TC avg)

Chart note: black line = Tech Center average estimate. Based on career data from 69 resolved cases.
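For readers checking the dashboard arithmetic, the headline figures above follow from the raw counts shown; a minimal sketch (the rounding convention and the way the baseline is back-derived are assumptions, since the page does not state its formulas):

```python
# Reproducing the examiner-dashboard figures from the raw counts shown above.
# Assumption: rates are simple granted/resolved ratios rounded to the nearest
# whole percent, and the TC baseline is back-derived from the stated delta.

granted, resolved = 46, 69          # "46 granted / 69 resolved"

career_allow_rate = 100 * granted / resolved
print(f"Career allow rate: {career_allow_rate:.1f}%")   # → 66.7%, shown as 67%

# "+4.7% vs TC avg" implies a Tech Center baseline of roughly:
tc_avg = career_allow_rate - 4.7
print(f"Implied TC 2600 average: {tc_avg:.1f}%")        # → 62.0%

# The "+15.5% interview lift" is the gap between the with-interview rate (82%)
# and the baseline; the exact figure presumably uses unrounded subgroup rates.
with_interview = 82.0
print(f"Approximate interview lift: {with_interview - round(career_allow_rate)}%")
```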

Office Action

§103
DETAILED ACTION

This examination is in response to the communication filed on 11/06/2025. Claims 1-20 are currently pending, wherein claims 1, 6-8, 10, 14, 16, 18, and 20 have been amended.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment/Argument

Applicant's amendments/arguments filed 11/06/2025 with respect to the rejection of claims 1-20 have been fully considered and are persuasive. The rejection of claims 1-20 under §101 has been withdrawn. Further, Applicant's amendments overcome and address the objections to the specification and drawings and the rejection of claims 10, 18, and 20 under §112. Applicant's arguments with respect to the rejections of claims 1-20 under §102 and/or §103 have been fully considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.

Claims 1, 3-5, 7-10, 13, 14, 16, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Beliga et al., "Text normalization for Croatian speech synthesis," Proceedings of the 34th International Convention MIPRO, Opatija, Croatia, 2011, pp. 1664-1669 (herein "Beliga") in view of Huang (US 2003/0202641 A1; herein "Huang").

Regarding claim 1, Beliga teaches a method, comprising: obtaining a textual input corresponding to one or more semiotic classes (p. 1665, Fig. 1, input text; p. 1667, Table 1, step 1, "Read the input text…"; and p. 1668, Section V teaches "the performance…was tested on the input text which contains typical Croatian NSW classes…"); determining, based at least in part on the one or more semiotic classes, a set of tokens for the textual input (p. 1665, Figure 1, "Tokenizer," and Section B teaches "The text pre-processing module initially has to identify the NSW, and separate it from standard words, as shown in Figure 1"; the tokenizer of Figure 1 inherently determines a set of tokens for the input text); determining, for individual tokens of the set of tokens, a classification (p. 1665, Figure 1, "NSW classification," and Section B teaches "The text pre-processing module initially has to identify the NSW, and separate it from standard words as shown in Figure 1"; as shown in Fig. 1, the output of the tokenizer is input to the NSW classification, so the classification is determined for individual tokens); determining, using one or more first rule-based algorithms, respective plain text representations for the individual tokens (p. 1665, Figure 1, "NORMALIZED FORM"; p. 1666, first column, teaches "The module initially classifies NSW as letters, numerals or combination as shown in the main classification tree in Figure 2…With suggested classification it is possible to retrieve algorithms that make the normalization more achievable"; and p. 1667, Table 1, steps 2b-2c teach "If it is a NSW, find its class in the classification tree…The algorithms for normalization of subclass NUMBER are applied…The algorithms for normalization of subclass LETTER are applied…The algorithms for normalization of subclass COMBINED are applied…Write the expanded form of the NSW obtained in the process of normalization to the output"); determining, using one or more second rule-based algorithms, a combined plain text representation (p. 1667, Table 1, step 3 teaches "Repeat the procedure iteratively until the input text completely normalized"; the iteration of the classification steps results in a combined plain text representation); and generating, based at least on the combined plain text representation, an auditory representation corresponding to the textual input (p. 1667, Table 1, step 4 teaches "the normalized text is sent to the module for grapheme-to-phoneme conversion"; the phoneme conversion is interpreted as an auditory representation of the textual input).

Beliga fails to disclose determining a language for the textual input or that the one or more second rule-based algorithms are selected based, at least in part, on the language of the textual input. Huang teaches TTS system architectures which function as a synthesizer for multiple languages, where the language-specific information, e.g., special rules for linguistic analysis, is loaded by the TTS engine at run-time so that it is possible to switch voices and languages as desired at run-time (Huang, ¶[0066]). Beliga differs from the claimed invention, as defined by claim 1, in that Beliga fails to explicitly disclose determining a corresponding language and selecting the rule-based algorithms based on the determined language as claimed. TTS systems which determine/select the language-specific information/rules needed to generate synthesized speech in the desired language are known in the art, as evidenced by Huang. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the invention to have modified the TTS system of Beliga to detect/determine a corresponding input language and select the rule-based algorithms based on the determined language in order to make it possible to switch voices and languages of the TTS system at run-time (Huang, ¶[0066]).

Regarding claim 3, the combination of Beliga and Huang teaches all of the elements of claim 1. In addition, Beliga further teaches the one or more semiotic classes include at least a first level class and one or more sub-classes (p. 1666, Figure 4, and p. 1667, Table 1, Step 2, subclass determination).

Regarding claim 4, the combination of Beliga and Huang teaches all of the elements of claim 1.
In addition, Beliga further teaches providing, to a trained vocalizer, the combined plain text representation (p. 1667, Table 1, step 4, "The normalized text is sent to the module for grapheme-to-phoneme conversion," and p. 1669, first column, teaches "The proposed normalization can easily be integrated with existing grapheme-to-phoneme conversion system [14] and speech generation modules of TTS synthesis system [15]…"; the speech generation modules of a TTS synthesis system inherently include a trained vocalizer or vocoder).

Regarding claim 5, the combination of Beliga and Huang teaches all of the elements of claim 1. In addition, Beliga further teaches the one or more first rule-based algorithms are the same as the one or more second rule-based algorithms (p. 1667, Table 1, step 3 teaches "Repeat the procedure iteratively until the input text completely normalized"; the iteration of the classification steps inherently includes utilization of the same rule-based algorithms as utilized in the previous classification step).

Regarding claim 7, the combination of Beliga and Huang teaches all of the elements of claim 1 (see detailed element mapping above). In addition, Huang further teaches determining, for the auditory output, a desired language (¶[0066] teaches "Some language-specific information is necessary; there are acoustic inventories unique to each language and there are also special rules for linguistic analysis. These data, however, are stored externally in tables and parameter files, and are loaded by the TTS engine at run-time. Thus, in applications such as dialog or e-mail reading, it is possible to switch voices and languages as desired at run-time"); and selecting, based at least on the desired language, the one or more rule-based algorithms (¶[0066], quoted above). Beliga differs from the claimed invention, as defined by claim 7, in that Beliga fails to explicitly disclose determining a desired language for the auditory output and selecting, based at least on the desired language, one or more rule-based algorithms as claimed. TTS systems which determine/select the language-specific information/rules needed to generate synthesized speech in the desired language are known in the art, as evidenced by Huang. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the invention to have modified the TTS system of Beliga to detect/determine a desired output language and select the rule-based algorithms based on the desired language in order to make it possible to switch voices and languages of the TTS system at run-time (Huang, ¶[0066]).

Regarding claim 8, Beliga teaches a system comprising: at least one processor (Beliga teaches the text normalization process as an integral part of a text-to-speech (TTS) synthesis system; such a system inherently requires at least one processor) to: determine one or more tokens for segments of an input (p. 1665, Figure 1, "Tokenizer"); classify, using a rule-based grammar model for the language, the one or more tokens into a semiotic class (p. 1665, Figure 1, NSW classification, and Section B teaches "The text pre-processing module initially has to identify the NSW, and separate it from standard words…Normalization module then classifies NSW according to the given taxonomy of Croatian language…"); select a class rule-based grammar model, from one or more rule-based grammar models corresponding to the language, based at least on a respective semiotic class for a token of the one or more tokens (p. 1665, Figure 1, NSW classification, and Section B teaches "The text pre-processing module initially has to identify the NSW, and separate it from standard words…Normalization module then classifies NSW according to the given taxonomy of Croatian language…Within each NSW class, certain rules are typical so the normalization can be carried out in a more standard way," and p. 1667, Table 1, step 2); generate, using the class rule-based grammar model, a textual output for the token (p. 1667, Table 1, step 2c, "Write the expanded form of the NSW obtained in the process of normalization to the output"; the normalized tokens are interpreted as a textual output); and generate a combined textual output including respective textual outputs for each token of the one or more tokens (Fig. 1, "List of words," and p. 1665, Section B teaches "The module creates a list of words from the normalized text and it passes them forward to the module for grapheme-to-phoneme conversion").

Beliga fails to disclose determining a language for the textual input or that the one or more second rule-based algorithms are selected based, at least in part, on the language of the textual input. Huang teaches TTS system architectures which function as a synthesizer for multiple languages, where the language-specific information, e.g., special rules for linguistic analysis, is loaded by the TTS engine at run-time so that it is possible to switch voices and languages as desired at run-time (Huang, ¶[0066]). Beliga differs from the claimed invention, as defined by claim 8, in that Beliga fails to explicitly disclose determining a corresponding language and selecting the rule-based algorithms based on the determined language as claimed. TTS systems which determine/select the language-specific information/rules needed to generate synthesized speech in the desired language are known in the art, as evidenced by Huang. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the invention to have modified the TTS system of Beliga to detect/determine a corresponding input language and select the rule-based algorithms based on the determined language in order to make it possible to switch voices and languages of the TTS system at run-time (Huang, ¶[0066]).

Regarding claim 10, the combination of Beliga and Huang teaches all of the elements of claim 8. In addition, Beliga further teaches the one or more processing units are further to provide the textual output to a trained vocalizer (p. 1667, Table 1, step 4, "The normalized text is sent to the module for grapheme-to-phoneme conversion," and p. 1669, first column, teaches "The proposed normalization can easily be integrated with existing grapheme-to-phoneme conversion system [14] and speech generation modules of TTS synthesis system [15]…"; the speech generation modules of a TTS synthesis system inherently include a trained vocalizer or vocoder).

Regarding claim 13, the combination of Beliga and Huang teaches all of the elements of claim 8. In addition, Beliga further teaches the input is a first textual input that is in a different form from the textual output (p. 1667, Table 1, step 2c teaches "Write the expanded form of the NSW obtained in the process of normalization to the output"; therefore, the expanded textual output in step 2c is in a different form from the text input received/read in Step 1).
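The pipeline the rejection repeatedly maps onto the claims (tokenize the input, classify each token into a semiotic class, expand it with class-specific rules selected by language, and combine the outputs into a plain-text representation) can be sketched as follows. This is an illustrative sketch only: the toy rules, class names, and naive tokenizer are my assumptions, not code from Beliga, Huang, or the application.

```python
# Minimal rule-based text normalization in the tokenize -> classify ->
# expand-per-class -> combine shape discussed above. A real normalizer
# would use much richer grammars (e.g., WFST libraries) per language.

ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]

def expand_cardinal(tok: str) -> str:
    """Expand a digit string digit-by-digit (toy CARDINAL rule)."""
    return " ".join(ONES[int(d)] for d in tok)

# Per-language, per-semiotic-class expansion rules.
RULES = {
    "en": {
        "CARDINAL": expand_cardinal,
        "MONEY":    lambda t: expand_cardinal(t.lstrip("$")) + " dollars",
        "PLAIN":    lambda t: t,  # ordinary words pass through unchanged
    },
}

def classify(tok: str) -> str:
    """Assign a semiotic class to one token (the 'NSW classification' step)."""
    if tok.startswith("$"):
        return "MONEY"
    if tok.isdigit():
        return "CARDINAL"
    return "PLAIN"

def normalize(text: str, lang: str = "en") -> str:
    rules = RULES[lang]                       # rule set selected by language
    tokens = text.split()                     # naive tokenizer
    expanded = [rules[classify(t)](t) for t in tokens]
    return " ".join(expanded)                 # combined plain-text representation

print(normalize("pay $5 by 9"))
# → "pay five dollars by nine"
```

The combined output would then feed a grapheme-to-phoneme module in a full TTS stack; that downstream step is omitted here.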
Regarding claim 14, the combination of Beliga and Huang teaches all of the elements of claim 8. In addition, Beliga further teaches the one or more processing units are further to select a classification rule-based grammar model to classify the one or more tokens into the semiotic class (p. 1667, Table 1, steps 2a-2c teach the normalization algorithm is selected based on the class/subclass of the NSW, i.e., the semiotic class).

Regarding claim 16, Beliga teaches a processor comprising: processing circuitry (Beliga teaches the text normalization process as an integral part of a text-to-speech (TTS) synthesis system; such a system inherently requires at least one processor) to classify a set of tokens into respective semiotic classes (p. 1665, Figure 1, NSW classification, and Section B teaches "The text pre-processing module initially has to identify the NSW, and separate it from standard words…Normalization module then classifies NSW according to the given taxonomy of Croatian language…") using a classification rule-based grammar model for an identified language associated with the set of tokens (p. 1667, Table 1, steps 2a-2c teach the normalization algorithm is selected based on the class/subclass of the NSW, i.e., the semiotic class), to process each token, based at least on the respective semiotic class, with a trained class rule-based grammar model (p. 1665, Figure 1, NSW classification, and Section B teaches "…Within each NSW class, certain rules are typical so the normalization can be carried out in a more standard way," and p. 1667, Table 1, step 2), and to combine each processed token into a combined output text sequence (p. 1667, Table 1, step 3 teaches "Repeat the procedure iteratively until the input text completely normalized"; the iteration of the classification steps results in a combined plain text representation).

Beliga fails to disclose using language-specific rules. Huang teaches TTS system architectures which function as a synthesizer for multiple languages, where the language-specific information, e.g., special rules for linguistic analysis, is loaded by the TTS engine at run-time so that it is possible to switch voices and languages as desired at run-time (Huang, ¶[0066]). Beliga differs from the claimed invention, as defined by claim 16, in that Beliga fails to explicitly disclose selecting the rule-based algorithms based on the determined language as claimed. TTS systems which determine/select the language-specific information/rules needed to generate synthesized speech in the desired language are known in the art, as evidenced by Huang. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the invention to have modified the TTS system of Beliga to select the rule-based algorithms based on the determined language in order to make it possible to switch voices and languages of the TTS system at run-time (Huang, ¶[0066]).

Regarding claims 9 and 17, the combination of Beliga and Huang teaches all of the elements of claims 8 and 16 (see detailed element mappings above). In addition, Beliga further teaches the system comprises at least one of: a system for performing simulation operations; a system for performing simulation operations to test or validate autonomous machine applications; a system for performing digital twin operations; a system for performing light transport simulation; a system for rendering graphical output; a system for performing deep learning operations; a system implemented using an edge device; a system for generating or presenting virtual reality (VR) content; a system for generating or presenting augmented reality (AR) content; a system for generating or presenting mixed reality (MR) content; a system incorporating one or more Virtual Machines (VMs); a system for performing operations for a conversational AI application; a system for performing operations for a generative AI application; a system for performing operations using a language model; a system for performing one or more generative content operations using a large language model (LLM); a system implemented at least partially in a data center; a system for performing hardware testing using simulation; a system for performing one or more generative content operations using a language model; a system for synthetic data generation; a collaborative content creation platform for 3D assets; or a system implemented at least partially using cloud computing resources (Beliga teaches the text normalization method/module is for integration into existing text-to-speech synthesis systems; existing text-to-speech systems perform operations using a language model, perform one or more generative operations using an LLM, and generate synthetic data, thus Beliga teaches the text normalization can be integrated into one or more of these systems).

Regarding claim 20, the combination of Beliga and Huang teaches all of the elements of claim 19. In addition, Beliga further teaches the set of tokens is extracted from a textual input (p. 1665, Figure 1, "Text input" and "Tokenizer").

Claims 11, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Gaur et al. (US 2023/0289536 A1; herein "Gaur") in view of Huang.

Regarding claim 11, Gaur teaches a system comprising: at least one processor (Fig. 10, processor(s) 1014) to: determine one or more tokens for segments of an input (Fig. 1, Tagger 140, and Fig. 7, Step 510, "Output Stream of Tokens"); classify, using a rule-based grammar model for the language, the one or more tokens into a semiotic class (Fig. 7, Step 716, "Tag the Stream of Tokens," and ¶[0054] teaches "Operation 716 includes tagging, by tagger 640, stream of tokens 122 with a first tag (e.g., ITN tag 223) to produce tagged stream of tokens 650. The first tag indicates whether ITN is to be performed on the at least one token of tagged stream of tokens 650."); select a class rule-based grammar model, from one or more rule-based grammar models corresponding to the language, based at least on a respective semiotic class for a token of the one or more tokens (Fig. 7, Step 718, "Convert," and ¶[0054] teaches "In flowchart 700, it is the language converter that determines which normalization category of a plurality of normalization categories to use for converting a lexical language form to a natural language form. This occurs as part of operation 718, which also includes, based on at least the first tag, converting by a language converter, at least one token of tagged stream of tokens 650, from a first lexical language form to a first natural language form"); generate, using the class rule-based grammar model, a textual output for the token (Fig. 7, Step 720, "Output Natural Language," and ¶[0054] teaches "Operation 720 outputs the natural language representation of stream of tokens 122, based on at least the natural language form"); and generate a combined textual output including respective textual outputs for each token of the one or more tokens (Fig. 5, step 520, "Output Natural Language"), wherein the class rule-based grammar model is a weighted finite state transducer (WFST) (¶[0047] teaches "each language converter of plurality of category-specific natural language converters 160 comprises a WFST").

Gaur fails to disclose determining a language for the textual input or that the one or more second rule-based algorithms are selected based, at least in part, on the language of the textual input. Huang teaches TTS system architectures which function as a synthesizer for multiple languages, where the language-specific information, e.g., special rules for linguistic analysis, is loaded by the TTS engine at run-time so that it is possible to switch voices and languages as desired at run-time (Huang, ¶[0066]). Gaur differs from the claimed invention, as defined by claim 11, in that Gaur fails to explicitly disclose determining a corresponding language and selecting the rule-based algorithms based on the determined language as claimed. TTS systems which determine/select the language-specific information/rules needed to generate synthesized speech in the desired language are known in the art, as evidenced by Huang. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the invention to have modified the TTS system of Gaur to detect/determine a corresponding input language and select the rule-based algorithms based on the determined language in order to make it possible to switch voices and languages of the TTS system at run-time (Huang, ¶[0066]).

Regarding claim 12, Gaur teaches a system comprising at least one processor (Fig. 10, processor(s) 1014) to perform the tokenizing, classifying, selecting, and generating operations mapped above with respect to claim 11, wherein the input is an auditory input (Fig. 7, Step 506, "Receive Audio," and ¶[0044] teaches "Microphone 102 captures audio input 104 comprising human speech in operation 504"). Gaur fails to disclose determining a language for the textual input or that the one or more second rule-based algorithms are selected based, at least in part, on the language of the textual input.
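The tag-then-convert inverse text normalization the rejection draws from Gaur (a tagger marks which tokens need ITN; a category-specific converter, a WFST in Gaur, rewrites each tagged token from lexical to written form) can be sketched minimally as follows; plain dicts stand in for real WFST grammars, and every name and rule here is an illustrative assumption rather than code from the reference.

```python
# Toy inverse text normalization (ITN): tag tokens with a normalization
# category, then apply the category-specific converter to tagged tokens.

WORD_TO_DIGIT = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}

def tag(tokens):
    """Mark each token with its ITN category, or None if ITN does not apply."""
    return [(t, "CARDINAL" if t in WORD_TO_DIGIT else None) for t in tokens]

# One converter per normalization category (a WFST in a real system).
CONVERTERS = {"CARDINAL": lambda t: WORD_TO_DIGIT[t]}

def itn(text: str) -> str:
    out = []
    for tok, category in tag(text.split()):
        out.append(CONVERTERS[category](tok) if category else tok)
    return " ".join(out)

print(itn("meet at five"))
# → "meet at 5"
```

In production systems the converters are typically compiled WFST grammars (e.g., via OpenFst-style toolkits), which this dict-based stand-in only gestures at.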
Huang teaches TTS system architectures which function as a synthesizer for multiple languages where the language-specific information, e.g., special rules for linguistic analysis are loaded by the TTS engine at run-time so that it is possible to switch voices and languages as desired at run-time (Huang, ¶[0066]). Gaur differs from the claimed invention, as defined by claim 11, in that Gaur fails to explicitly disclose determining a corresponding language and selecting the rule-based algorithms based on the determined language as claimed. TTS systems which determine/select the language specific information/rules needed to generate synthesized speech in the desired language are known in the art as evidenced by Huang. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the invention, to have modified the TTS system of Gaur to detect/determine a corresponding input language and selected the rule-based algorithms based on the determined language in order to make it possible to switch voices and languages of the TTS system at run-time (Huang, ¶[0066].). Regarding claim 19, Gaur teaches a system comprising: at least one processor (Fig. 10, processor(s) 1014) to: determine one or more tokens for segments of an input (Fig. 1, Tagger 140; Fig. 7, Step 510 “Output Stream of Tokens”; and ); classify, using a rule-based grammar model for the language, the one or more tokens into a semiotic class (Fig. 7, Step 716 “Tag the Stream of Tokens” and ¶[0054] teaches “Operation 716 includes tagging, by tagger 640, stream of tokens 122 with a first tag ( e.g., ITN tag 223) to produce tagged stream of token 650. The first tag indicates whether ITN is to be performed on the at least one token of tagged stream of tokens 650.”); select a class rule-based grammar model, from one or more rule-based grammar models corresponding to the language, based at least on a respective semiotic class for a token of the one or more tokens (Fig. 
7, Step 718 “Convert” and ¶[0054] teaches “In flowchart 700, it is the language converter that determines which normalization category of a plurality of normalization categories to use for converting a lexical language form to a natural language form. This occurs as part of operation 718, which also includes, based on at least the first tag, converting by a language converter, at least one token of tagged stream of tokens 650, from a first lexical language form to a first natural language form”); generate, using the class rule-based grammar model, a textual output for the token (Fig. 7, Step 720 “Output Natural Language” and ¶[0054] teaches “Operation 720 outputs the natural language representation of steam of tokens 122, based on at least the natural language form”); generate, a combined textual output including respective textual outputs for each token of the one more token (Fig. 5, step 520 “Output Natural Language” ), wherein the set of tokens is extracted from an auditory input (Fig. 7, Step 506 “Receive Audio” and ¶[0044] teaches “Microphone 102 captures audio input 104 comprising human speech in operation 504”). Gaur ails to disclose determining a language for the textual input or that the one or more second rule-based algorithms are selected based, at least in part, on the language of the textual input. Huang teaches TTS system architectures which function as a synthesizer for multiple languages where the language-specific information, e.g., special rules for linguistic analysis are loaded by the TTS engine at run-time so that it is possible to switch voices and languages as desired at run-time (Huang, ¶[0066]). Gaur differs from the claimed invention, as defined by claim 11, in that Gaur fails to explicitly disclose determining a corresponding language and selecting the rule-based algorithms based on the determined language as claimed. 
TTS systems which determine/select the language specific information/rules needed to generate synthesized speech in the desired language are known in the art as evidenced by Huang. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the invention, to have modified the TTS system of Gaur to detect/determine a corresponding input language and selected the rule-based algorithms based on the determined language in order to make it possible to switch voices and languages of the TTS system at run-time (Huang, ¶[0066].). Claims 2, 15, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Beliga and Huang as applied to claims 1, 14, and 16 above, and further in view of Gaur. Regarding claim 2,15 and 18, the combination of Beliga and Huang teaches all of the elements of claim 1 (see detailed element mapping above). In addition, Beliga teaches that using rule-based weight finite-state transducers for text normalization where known in the art. (See p. 1665, second column, Section III, second paragraph). However, the combination of Beliga and Huang fails to specifically disclose the one or more first rule-based algorithms are incorporated into a library of weighted finite state transducers (WFSTs) as recited in claim 2 ; that the classification rule-based grammar model is a weighted finite state transducer (WFST) as recited in claim 15, or that each of a classifier to classify the set of tokens and the trained class rule-based grammar model include weighted finite state transducers (WFSTs) as recited in claim 18. Gaur teaches an on-device streaming inverse text normalization process which utilized normalization categories, e.g., semiotic classes, to select a category-specific natural language converter (e.g., weighted finite state transducer, WFSTs) for normalization of tagged tokens. 
(Gaur, Abstract) More specifically, Gaur teaches one or more first rule-based algorithms are incorporated into a library of weighted finite state transducers (WFSTs) (¶[0047] teaches “each language converter of [the] plurality of category-specific natural language converters 160 comprises a WFST”). The combination of Beliga and Huang differs from the claimed invention, as defined by claim 2, in that the combination fails to explicitly disclose utilizing a plurality of WFSTs as the semiotic class specific converters. Utilizing a plurality of WFST to provide a semiotic class-specific converters are known in the art as evidenced by Gaur. Therefore, it would have been obvious to one having ordinary skill in the art, before the effective filing date of the invention, to have modified the text normalization module of Beliga to utilize semiotic class-specific WFSTs as taught by Gaur as it mere constitutes the substitute of known elements to achieve the predictable result of providing text normalization and classification of the tokens into semiotic classes. Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Beliga and Huang as applied to claim 1 above, and further in view of Vu et al. (US 2023/0141853 A1; herein “Vu”). Regarding claim 6, the combination of Beliga and Huang teaches all of the elements of claim 1 (See detailed element mapping above). Although the combination of Beliga and Huang teaches selecting, based at least on the corresponding language, the one or more rule-based algorithms (See Huang ¶[0066] and discussion above with respect to claim 1), the combination of Beliga and Huang fails to explicitly teach determining, for the textual input, the language based on characters represented in the one or more segments as claimed. 
Vu teaches a language detection system and method that includes, inter alia, determining, for the textual input, the language based on characters represented in the one or more segments (¶[0004] teaches "Language detection is the task of identifying the language of a textual input" and ¶[0005] teaches "…presenting the input text to a wide network as a sequence of characters, or a sequence of n-grams or subwords. Techniques disclosed herein can provide language detection for textual inputs").

The combination of Beliga and Huang differs from the claimed invention, as defined by claim 6, in that the combination fails to explicitly disclose determining a corresponding language based on the characters of the text input, as claimed. Language detection based on the sequence of characters in a textual input is known in the art, as evidenced by Vu. Therefore, it would have been obvious to one having ordinary skill in the art, before the effective filing date of the invention, to have modified the system taught by the combination of Beliga and Huang to include detecting the input text language based on the sequence of characters in the input, as taught by Vu, as it merely constitutes the combination of known processes to achieve the predictable result of detecting the language of the textual input.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PENNY L CAUDLE, whose telephone number is (703) 756-1432. The examiner can normally be reached M-Th, 8:00 am to 5:00 pm Eastern.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PENNY L CAUDLE/
Examiner, Art Unit 2657

/DANIEL C WASHBURN/
Supervisory Patent Examiner, Art Unit 2657
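As a second technical aside: the character-based language detection cited from Vu in the claim 6 rejection (identifying the language of a textual input from its character sequence) can be approximated, far more crudely than Vu's network-based approach, with character-bigram profile overlap. The two sample sentences, the language codes, and the diacritic-free Croatian text below are illustrative assumptions only.

```python
from collections import Counter

def bigrams(text: str) -> Counter:
    """Count overlapping character bigrams of a lowercased string."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

# Tiny per-language "profiles" built from illustrative sample sentences
# (a real system would train on large corpora; Croatian shown without
# diacritics purely to keep this sketch ASCII).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the cat",
    "hr": "brzi smedi lisac skace preko lijenog psa i onda macka",
}
PROFILES = {lang: bigrams(s) for lang, s in SAMPLES.items()}

def detect(text: str) -> str:
    """Return the language whose bigram profile best overlaps the input."""
    grams = bigrams(text)
    def overlap(lang: str) -> int:
        prof = PROFILES[lang]
        return sum(min(count, prof[g]) for g, count in grams.items())
    return max(PROFILES, key=overlap)

print(detect("the quick fox"))  # -> en
```

Vu's disclosure presents character (or n-gram/subword) sequences to a network; the profile-overlap scorer here merely illustrates why character sequences alone carry enough signal to separate languages.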

Prosecution Timeline

Jun 20, 2023
Application Filed
Jul 30, 2025
Non-Final Rejection — §103
Oct 28, 2025
Applicant Interview (Telephonic)
Oct 28, 2025
Examiner Interview Summary
Nov 06, 2025
Response Filed
Jan 28, 2026
Final Rejection — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12592243
METHOD AND ELECTRONIC DEVICE FOR PERSONALIZED AUDIO ENHANCEMENT
2y 5m to grant Granted Mar 31, 2026
Patent 12573371
VOCABULARY SELECTION FOR TEXT PROCESSING TASKS USING POWER INDICES
2y 5m to grant Granted Mar 10, 2026
Patent 12566924
Apparatus for Evaluating and Improving Response, Method and Computer Readable Recording Medium Thereof
2y 5m to grant Granted Mar 03, 2026
Patent 12567433
AUTOMATED EVALUATION OF SYNTHESIZED SPEECH USING CROSS-MODAL AND CROSS-LINGUAL TRANSFER OF LANGUAGE ENCODING
2y 5m to grant Granted Mar 03, 2026
Patent 12554937
FEW SHOT INCREMENTAL LEARNING FOR NAMED ENTITY RECOGNITION
2y 5m to grant Granted Feb 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
67%
Grant Probability
82%
With Interview (+15.5%)
3y 2m
Median Time to Grant
Moderate
PTA Risk
Based on 69 resolved cases by this examiner. Grant probability derived from career allow rate.
