Prosecution Insights
Last updated: April 19, 2026
Application No. 18/572,200

Chinese Spelling Correction Method, System, Storage Medium and Terminal

Non-Final OA · §101, §103
Filed
Dec 20, 2023
Examiner
LAM, PHILIP HUNG FAI
Art Unit
2656
Tech Center
2600 — Communications
Assignee
Shanghai Midu Science And Technology Co. Ltd.
OA Round
1 (Non-Final)
83%
Grant Probability
Favorable
1-2
OA Rounds
2y 8m
To Grant
99%
With Interview

Examiner Intelligence

Grants 83% — above average
83%
Career Allow Rate
107 granted / 129 resolved
+20.9% vs TC avg
Strong +46% interview lift
+45.5%
Interview Lift
resolved cases with vs. without interview
Typical timeline
2y 8m
Avg Prosecution
29 currently pending
Career history
158
Total Applications
across all art units

Statute-Specific Performance

§101
23.7%
-16.3% vs TC avg
§103
53.7%
+13.7% vs TC avg
§102
11.1%
-28.9% vs TC avg
§112
5.3%
-34.7% vs TC avg
Black line = Tech Center average estimate • Based on career data from 129 resolved cases

Office Action

§101 §103
DETAILED ACTION

Introduction

This Office action is in response to Applicant's submission filed on 8/10/2023.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claim 9 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent-eligible subject matter because it recites "A storage medium with a computer program stored thereon" where the storage medium is not claimed to be non-transitory. Therefore, the broadest reasonable interpretation of "storage medium or computer readable storage media" includes signals per se, rendering claim 9 subject-matter ineligible. It is noted that para. 0083 of the originally filed specification states: "The storage medium of the present disclosure stores a computer program, which is characterized in that when the program is executed by a processor, the above-mentioned Chinese spelling correction method is implemented. The storage medium includes: ROM, RAM, magnetic disks, USB disks, memory cards or optical disks and other media that can store program codes." However, this passage does not explicitly exclude transitory propagating signals from being one potential form of the storage medium.
The broadest reasonable interpretation of a claim drawn to a computer readable medium or storage medium (also called a machine readable medium, among other variations) typically covers both forms of non-transitory tangible media and transitory propagating signals per se, in view of the ordinary and customary meaning of computer readable media, particularly when the specification lacks an explicit definition or is silent. See MPEP 2111.01. When the broadest reasonable interpretation of a claim covers a signal per se, the claim must be rejected under 35 U.S.C. § 101 as covering non-statutory subject matter. See In re Nuijten, 500 F.3d 1346, 1356-57 (Fed. Cir. 2007) (transitory embodiments are not directed to statutory subject matter) and Interim Examination Instructions for Evaluating Subject Matter Eligibility Under 35 U.S.C. § 101, Aug. 24, 2009, p. 2.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 4 and 6-10 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claim 1 recites a method whose limitations, under the broadest reasonable interpretation, cover performance in the human mind with the assistance of physical aids (e.g., pen and paper), but for the recitation of generic computer components. That is, other than implying the use of a generic computer or computer components (such as text files), nothing in these claim limitations precludes the steps from practically being performed in the mind. As a whole, claim 1 pertains to performing Chinese spelling correction, which is a mental process that a human can perform.
Individually, each of the limitations also pertains to a mental process, for example:

obtaining a text sequence, a pinyin sequence, and a picture sequence from a Chinese input text file; (e.g., a human can obtain information from a text file, including a text string, a pinyin string, and a picture of the character string.)

extracting word meaning features, phonetic features, and glyph features of the Chinese input text file based on the text sequence, the pinyin sequence and the picture sequence respectively; (e.g., the human can extract feature information from a text file, including a text string, a pinyin string, and a picture of the character string.)

integrating the word meaning features, the phonetic features and the glyph features; (e.g., the human can combine the feature information.)

performing a correctness prediction, a pinyin prediction, and a character prediction on the Chinese input text file based on the integrated word meaning features, phonetic features, and glyph features to obtain a Chinese output text file which has been error-corrected; (e.g., the human can perform a correctness prediction, pinyin prediction and character prediction based on the features to obtain an output file. The human can simply write down the correct words/characters and pinyin on a piece of paper.)

and performing a rationality judgment on the Chinese output text file to obtain a final Chinese text file. (e.g., the human can determine in the mind whether the word/character and pinyin are correct in order to arrive at the final result.)

The judicial exception is not integrated into a practical application. In particular, the claim only recites generic computing components.
Such generic computing components are recited at a high level of generality (i.e., as a generic processor performing a generic computer function of receiving, determining, or outputting information) such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional limitations of using generic computer components amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. Claim 1 is not patent eligible.

The examiner further notes that the use of the claimed generic computer components ("text files") invokes such components "merely as a tool to perform an existing process". MPEP 2106.05(f). MPEP 2106.05(f) further explains: Use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not integrate a judicial exception into a practical application or provide significantly more. See Affinity Labs v. DirecTV, 838 F.3d 1253, 1262, 120 USPQ2d 1201, 1207 (Fed. Cir. 2016) (cellular telephone); TLI Communications LLC v. AV Automotive, LLC, 823 F.3d 607, 613, 118 USPQ2d 1744, 1748 (Fed. Cir. 2016) (computer server and telephone unit).
Similarly, "claiming the improved speed or efficiency inherent with applying the abstract idea on a computer" does not integrate a judicial exception into a practical application or provide an inventive concept. Intellectual Ventures I LLC v. Capital One Bank (USA), 792 F.3d 1363, 1367, 115 USPQ2d 1636, 1639 (Fed. Cir. 2015). Claim 1 recites generic computer components ("text file") with respect to performing tasks. MPEP 2106.05(d) and (f) further provide examples of court decisions in which the courts found generic computing components to be mere instructions to apply a judicial exception, and further explain that "increased speed" (e.g., using a computer to increase the speed of an otherwise mental process) does not provide an inventive concept. For example:

A commonplace business method or mathematical algorithm being applied on a general purpose computer, Alice Corp. Pty. Ltd. v. CLS Bank Int'l, 573 U.S. 208, 223, 110 USPQ2d 1976, 1983 (2014); Gottschalk v. Benson, 409 U.S. 63, 64, 175 USPQ 673, 674 (1972); Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015).

A process for monitoring audit log data that is executed on a general-purpose computer where the increased speed in the process comes solely from the capabilities of the general-purpose computer, FairWarning IP, LLC v. Iatric Sys., 839 F.3d 1089, 1095, 120 USPQ2d 1293, 1296 (Fed. Cir. 2016) (emphasis added).

Performing repetitive calculations. Bancorp Services v. Sun Life, 687 F.3d 1266, 1278, 103 USPQ2d 1425, 1433 (Fed. Cir. 2012) ("The computer required by some of Bancorp's claims is employed only for its most basic function, the performance of repetitive calculations, and as such does not impose meaningful limits on the scope of those claims.")

Claim 8 recites a system that corresponds to the method of claim 1 and is therefore rejected on the same grounds as claim 1 above.
While claim 8 further recites "an acquisition module", "an extraction module", "an integration module", "an error correction module", and "a judgment module", these are merely generic computer components recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. Therefore, none of these limitations (a) integrates the abstract idea into a practical application, because they do not impose any meaningful limits on practicing the abstract idea, or (b) amounts to significantly more than the judicial exception, because in either case the additional limitations merely utilize generic computer components that amount to no more than mere instructions to apply the exception using a generic computer function. Claim 8 is not patent eligible.

Claim 9 recites a storage medium that corresponds to the method of claim 1 and is therefore rejected on the same grounds as claim 1 above. While claim 9 further recites a "computer program is executed by a processor", this is merely a generic computer component recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Therefore, this limitation neither (a) integrates the abstract idea into a practical application, because it does not impose any meaningful limits on practicing the abstract idea, nor (b) amounts to significantly more than the judicial exception, because in either case the additional limitation merely utilizes generic computer components that amount to no more than mere instructions to apply the exception using a generic computer function. Claim 9 is not patent eligible.

Claim 10 recites a terminal that corresponds to the method of claim 1 and is therefore rejected on the same grounds as claim 1 above.
While claim 10 further recites "a processor and a memory; wherein the memory stores computer programs; and wherein the processor executes the computer programs stored in the memory", these are merely generic computer components recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. Therefore, none of these limitations (a) integrates the abstract idea into a practical application, because they do not impose any meaningful limits on practicing the abstract idea, or (b) amounts to significantly more than the judicial exception, because in either case the additional limitations merely utilize generic computer components that amount to no more than mere instructions to apply the exception using a generic computer function. Claim 10 is not patent eligible.

Claims 4, 6 and 7 depend from claim 1, do not remedy any of the deficiencies of claim 1, and are therefore rejected on the same grounds as claim 1 above.

Claim 4 further recites: wherein the integrating the word meaning features, the phonetic features and the glyph features comprises following steps:

increasing a weight of the word meaning features when a word meaning error occurs; (e.g., the human can evaluate the meaning of a word, such as whether it accurately conveyed the intended meaning, and if not, pay more attention to that word in the future.)

increasing a weight of the phonetic features when a phonetic error occurs; (e.g., the human can evaluate the pronunciation of a word, such as whether it was spoken correctly, and if not, practice the pronunciation more.)

and increasing a weight of the glyph features when a glyph error occurs. (e.g., the human can check the spelling/strokes of the word/character, and if the word is not spelled correctly, place more emphasis on the visual form of the characters or strokes.)

Claim 6 further recites: performing a rationality judgment on the Chinese output text file based on a language model.
(e.g., the human can use their judgment of the text file based on an understanding of language.) [The language model could be a notebook with rules regarding language and grammar, or a generic computer component.]

Claim 7 further recites: wherein the language model comprises a Generative Pre-Training (GPT) language model or an N-Gram language model. (e.g., a GPT or N-Gram language model is considered a generic computer component.)

In sum, claims 4, 6 and 7 depend from claim 1 and further recite mental processes as explained above. None of the additional limitations recited in claims 4, 6 and 7 amounts to anything more than the same or a similar abstract idea as recited in claim 1. Nor do any limitations in claims 4, 6 and 7: (a) integrate the abstract idea into a practical application, because they do not impose any meaningful limits on practicing the abstract idea, or (b) amount to significantly more than the judicial exception, because the additional limitations of using generic computer components amount to no more than mere instructions to apply the exception using generic computer components. Claims 4, 6 and 7 are not patent eligible.

Dependent claims 2, 3 and 5 are directed to specific methods of correcting spelling in Chinese characters; therefore, even if the individual steps could be practiced in the mind, each claim when viewed as a whole reflects a practical application, and these claims are deemed patent eligible.

Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f): (f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The following is a quotation of pre-AIA 35 U.S.C.
112, sixth paragraph: An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A) the claim limitation uses the term "means" or "step" or a term used as a substitute for "means" that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;

(B) the term "means" or "step" or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word "for" (e.g., "means for") or another linking word or phrase, such as "configured to" or "so that"; and

(C) the term "means" or "step" or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Use of the word "means" (or "step") in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C.
112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.

Absence of the word "means" (or "step") in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.

Claim limitations in this application that use the word "means" (or "step") are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word "means" (or "step") are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word "means" but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: "an acquisition module", "an extraction module", "an integration module", "an error correction module", and "a judgment module" in claim 8. Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitations to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid their being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3.
Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

Claims 1-10 are rejected under 35 U.S.C. 103 as being unpatentable over Xu, H. D., Li, Z., Zhou, Q., Li, C., Wang, Z., Cao, Y., ... & Mao, X. L. (2021), "Read, listen, and see: Leveraging multimodal information helps Chinese spell checking," arXiv preprint arXiv:2105.12306, in view of the Applicant-supplied reference Wang (CN 114036930 A), with machine-translated text provided.

Regarding Claim 1, Xu discloses: A method for correcting Chinese spelling errors, ([sect 1] The Chinese Spell Checking (CSC) task aims to identify erroneous characters and generate candidates for correction.) comprising following steps:

obtaining a text sequence ([sect 3.1 The semantic encoder] The input tokens X = (x1, . . . , xN) are first projected into Ht(0) through the input embedding. The output of the last layer Ht = Ht(L) = (ht1, . . . , htN) is used as the contextualized semantic representation of the input tokens in textual modality.),

a pinyin sequence, ([sect 3.2 The phonetic encoder] we use a sequence of letters in REALISE to capture the subtle phonetic difference between Chinese characters. We denote the pinyin of the i-th character in the input sentence as pi = (pi,1, . . . , pi,|pi|), where |pi| is the length of pinyin pi. [The sentence-level encoder] and apply the Transformer layers to calculate the contextualized representation in acoustic modality, denoted as Ha = (ha1, ha2, . . . , haN).)

and a picture sequence from a Chinese input text file; ([sect 3.3 The graphic encoder] We denote the representation in visual modality of the input sentence as Hv = (hv1, hv2, . . . , hvN).)

Also see [sect 3 The ReaLiSe Model]: In this section, we introduce the REALISE model, which utilizes the semantic, phonetic, and graphic information to distinguish the similarities of Chinese characters and correct the spelling errors.
As shown in Figure 1, multiple encoders are firstly employed to capture valuable information from textual, acoustic and visual modalities. Then, we develop a selective modality fusion module to obtain the context-aware multimodal representations. Finally, the output layer predicts the probabilities of error corrections.

extracting word meaning features, phonetic features, and glyph features of the Chinese input text file based on the text sequence, the pinyin sequence and the picture sequence respectively; ([sect 3 The ReaLiSe Model] In this section, we introduce the REALISE model, which utilizes the semantic, phonetic, and graphic information to distinguish the similarities of Chinese characters and correct the spelling errors. As shown in Figure 1, multiple encoders are firstly employed to capture valuable information from textual, acoustic and visual modalities. Then, we develop a selective modality fusion module to obtain the context-aware multimodal representations. Finally, the output layer predicts the probabilities of error corrections.)

integrating the word meaning features, the phonetic features and the glyph features; ([sect 3 The ReaLiSe Model] Then, we develop a selective modality fusion module to obtain the context-aware multimodal representations.)

performing a correctness prediction, ([sect 1] we propose to pretrain the phonetic and the graphic encoders by predicting the correct character given input in the corresponding modality. [sect 3 The ReaLiSe Model] Finally, the output layer predicts the probabilities of error corrections. [sect 3.4 Selective Modality Fusion Module] To predict the final correct Chinese characters, we develop a selective modality fusion module to integrate these vectors in different modalities. [sect 4.2 Implementation Details] The selective modality fusion module has 3 transformer layers, i.e., L0 = 3, and the prediction matrix Wo is tied with the word embedding matrix of the semantic encoder.) Also see fig. 2, where red highlights the incorrect character and the respective gate values.

and a character prediction on the Chinese input text file based on the integrated word meaning features, phonetic features, and glyph features to obtain a Chinese output text file which has been error-corrected; ([sect 3.4 Selective Modality Fusion Module] To predict the final correct Chinese characters, we develop a selective modality fusion module to integrate these vectors in different modalities.) Also see fig. 2, where blue highlights the predicted/correct character and the respective gate values.

Xu does not explicitly disclose a pinyin prediction and performing a rationality judgment on the Chinese output text file to obtain a final Chinese text file.

Wang (in the related art of Chinese error correction) discloses: a pinyin prediction, ([0130] each feature may be scored and ranked based on the scores of each feature, so that a candidate text with the highest score may be selected as the final text, i.e., the corrected text. Among the above text features, the higher the selection frequency of the candidate text, the higher the score; the smaller the editing distance between the candidate text and the text to be corrected, the higher the score; the closer the Jaccard distance between the pinyin of the candidate text and the pinyin of the text to be corrected, the higher the score; and the higher the semantic accuracy of the candidate text as determined by the multi-language model, the higher the score.)

and performing a rationality judgment on the Chinese output text file to obtain a final Chinese text file. ([0130] Multi-language models (n-grams) can evaluate whether a sentence is reasonable.)

Xu and Wang are considered analogous art.
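The gated fusion described in the sect. 3.4 quotations (per-modality sigmoid gates over semantic, phonetic, and glyph vectors, plus the mean semantic vector as an input) can be illustrated with a minimal NumPy sketch. The dimensions, random features, and untrained weights below are invented for illustration and are not from Xu's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N characters in the sentence, hidden width d.
N, d = 4, 8

# Stand-ins for the per-character encoder outputs described in Xu sect. 3:
# H_t (semantic/BERT), H_a (phonetic/GRU), H_v (glyph/ResNet).
H_t = rng.standard_normal((N, d))
H_a = rng.standard_normal((N, d))
H_v = rng.standard_normal((N, d))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One gate per modality, computed by a fully-connected layer over the three
# modality vectors concatenated with the mean semantic vector (the "overall
# semantics" input in the quoted passage).
mean_t = H_t.mean(axis=0)            # (d,)
W = rng.standard_normal((3, 4 * d))  # untrained stand-in weights
b = np.zeros(3)

fused = np.zeros_like(H_t)
for i in range(N):
    feats = np.concatenate([H_t[i], H_a[i], H_v[i], mean_t])  # (4d,)
    g_t, g_a, g_v = sigmoid(W @ feats + b)                    # gates in (0, 1)
    # Mixed multimodal representation: gate-weighted sum of the modalities.
    fused[i] = g_t * H_t[i] + g_a * H_a[i] + g_v * H_v[i]

print(fused.shape)  # one mixed vector per character: (4, 8)
```

A character misspelled for phonetic reasons would, after training, drive g_a toward 1 so that more acoustic information flows into the mixed representation, which is the behavior the examiner maps to "increasing a weight ... when a phonetic error occurs".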
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Xu with the teaching of Wang for the above-mentioned features, because Wang provides a text error correction method, apparatus, device and computer-readable medium to solve the technical problem of low text error correction accuracy (Wang, [0007]).

Regarding Claim 2, Xu and Wang disclose all the limitations of Claim 1 (see the detailed mapping above). Xu further discloses: wherein the extracting the word meaning features, the phonetic features and the glyph features of the Chinese input text file based on the text sequence, the pinyin sequence and the picture sequence respectively further comprises following steps:

extracting the word meaning features of the Chinese input text file based on a word meaning encoder; ([sect 3.1 The Semantic Encoder] also see fig. 1.)

extracting the phonetic features of the Chinese input text file based on a phonetic encoder; ([sect 3.2 The Phonetic Encoder] also see fig. 1.)

and extracting the glyph features of the Chinese input text file based on a glyph encoder. ([sect 3.3 The Graphic Encoder] also see fig. 1.)

Regarding Claim 3, Xu and Wang disclose all the limitations of Claim 2 (see the detailed mapping above). Xu further discloses: wherein the word meaning encoder adopts a Transformer Blocks model; ([sect 3.1 The Semantic Encoder] We adopt BERT (Devlin et al., 2019) as the backbone of the semantic encoder. BERT provides rich contextual word representation with the unsupervised pretraining on large corpora.)

wherein the phonetic encoder adopts a GRU neural network; ([sect 3.2 The Phonetic Encoder] The Character-level Encoder is to model the basic pronunciation and capture the subtle sound difference between characters. It is a single-layer uni-directional GRU.)

and wherein the glyph encoder adopts a ResNet neural network.
([sect 3.3 The Graphic Encoder] We apply the ResNet (He et al., 2016) as the graphic encoder. The graphic encoder has 5 layers of ResNet blocks (denoted as ResNet5) followed by a layer normalization (Ba et al., 2016) operation.)

Regarding Claim 4, Xu and Wang disclose all the limitations of Claim 1 (see the detailed mapping above). Xu further discloses: wherein the integrating the word meaning features, the phonetic features and the glyph features comprises following steps:

increasing a weight of the word meaning features when a word meaning error occurs; ([sect 3.4 Selective Modality Fusion Module] First, for each modality, a selective gate unit is employed to control how much information can flow to the mixed multimodal representation. For example, if a character is misspelled due to its similar pronunciation to the correct one, then more information in the acoustic modality should flow into the mixed representation. The gate values are computed by a fully-connected layer followed by a sigmoid function. The inputs include the character representation of three modalities and the mean of the semantic encoder output Ht to capture the overall semantics of the input sentence. Formally, we denote the gate values for the textual, acoustic and visual modalities as gt, ga and gv. The mixed multimodal representation h̃i of the i-th character is computed as follows:) [The reference reads on the same concept because it describes a mechanism for adjusting the influence of different input modalities (textual/semantic, pronunciation/phonetic, visual/glyph) based on a specific type of error. This is functionally equivalent to the concept of "increasing a weight" when an "error occurs," though the passages describe it using different terminology. Gate values read on weights.]
increasing a weight of the phonetic features when a phonetic error occurs; ([sect 3.4 Selective Modality Fusion Module] First, for each modality, a selective gate unit is employed to control how much information can flow to the mixed multimodal representation. For example, if a character is misspelled due to its similar pronunciation to the correct one, then more information in the acoustic modality should flow into the mixed representation. The gate values are computed by a fully-connected layer followed by a sigmoid function. The inputs include the character representation of three modalities and the mean of the semantic encoder output Ht to capture the overall semantics of the input sentence. Formally, we denote the gate values for the textual, acoustic and visual modalities as gt, ga and gv. The mixed multimodal representation h̃i of the i-th character is computed as follows:) [The reference reads on the same concept because it describes a mechanism for adjusting the influence of different input modalities (textual/semantic, pronunciation/phonetic, visual/glyph) based on a specific type of error. This is functionally equivalent to the concept of "increasing a weight" when an "error occurs," though the passages describe it using different terminology. Gate values read on weights.]

and increasing a weight of the glyph features when a glyph error occurs. ([sect 3.4 Selective Modality Fusion Module] First, for each modality, a selective gate unit is employed to control how much information can flow to the mixed multimodal representation. For example, if a character is misspelled due to its similar pronunciation to the correct one, then more information in the acoustic modality should flow into the mixed representation. The gate values are computed by a fully-connected layer followed by a sigmoid function.
The inputs include the character representation of three modalities and the mean of the semantic encoder output Ht to capture the overall semantics of the input sentence. Formally, we denote the gate values for the textual, acoustic and visual modalities as gt, ga and gv. The mixed multimodal representation h̃i of the i-th character is computed as follows:) [The reference reads on the same concept because it describes a mechanism for adjusting the influence of different input modalities (textual/semantic, pronunciation/phonetic, visual/glyph) based on the specific type of error. This is functionally equivalent to the claimed "increasing a weight" when an "error occurs," although the passages use different terminology. The gate values read on the claimed weights.]

Regarding Claim 5, Xu and Wang disclose all the limitations of Claim 1 (see the detailed mapping above). Xu further discloses:

wherein the performing the correctness prediction, the pinyin prediction, and the character prediction on the Chinese input text file based on the integrated word meaning features, phonetic features, and glyph features to obtain the Chinese output text file further comprises following steps: performing a spelling error detection on the Chinese input text file based on a correctness predictor; ([sect 1] we propose to pretrain the phonetic and the graphic encoders by predicting the correct character given input in the corresponding modality. [sect 3 The ReaLiSe Model] Finally, the output layer predicts the probabilities of error corrections. [sect 3.4 Selective Modality Fusion Module] To predict the final correct Chinese characters, we develop a selective modality fusion module to integrate these vectors in different modalities. [sect 4.2 Implementation Details] The selective modality fusion module has 3 transformer layers, i.e., L0 = 3, and the prediction matrix Wo is tied with the word embedding matrix of the semantic encoder.) See also Fig. 2, where red highlighting marks the incorrect character and the respective gate values.

and outputting the Chinese output text file based on a character predictor according to the integrated word meaning features, the phonetic features, the glyph features, the spelling error ([sect 3 The ReaLiSe Model] In this section, we introduce the REALISE model, which utilizes the semantic, phonetic, and graphic information to distinguish the similarities of Chinese characters and correct the spelling errors. As shown in Figure 1, multiple encoders are firstly employed to capture valuable information from textual, acoustic and visual modalities. Then, we develop a selective modality fusion module to obtain the context-aware multimodal representations. Finally, the output layer predicts the probabilities of error corrections.) and the identified pinyin. ([sect 3.2] we use a sequence of letters in REALISE to capture the subtle phonetic difference between Chinese characters. For example, the pinyin of "中" (middle) and "棕" (brown) are "zhōng" and "zōng", respectively. The two characters have very similar sounds but quite different meanings. We thus represent pinyin as a symbol sequence, e.g., {z, h, o, n, g, 1} for "中". We denote the pinyin of the i-th character in the input sentence as pi = (pi,1, . . . , pi,|pi|), where |pi| is the length of pinyin pi.)

Wang further discloses: performing a pinyin identification based on a pinyin predictor on the Chinese input text file when the spelling error is detected; ([0130] each feature may be scored and ranked based on the scores of each feature, so that a candidate text with the highest score may be selected as the final text, i.e., the corrected text.
Among the above text features, the higher the selection frequency of the candidate text, the higher the score; the smaller the editing distance between the candidate text and the text to be corrected, the higher the score; the closer the Jaccard distance between the pinyin of the candidate text and the pinyin of the text to be corrected, the higher the score; the higher the semantic accuracy of the candidate text determined by the multi-language model, the higher the score.) The motivation for the combination has already been provided above.

Regarding Claim 6, Xu and Wang disclose all the limitations of Claim 1 (see the detailed mapping above). Wang further discloses: further comprising: performing a rationality judgment on the Chinese output text file based on a language model. ([0130] Multi-language models (n-grams) can evaluate whether a sentence is reasonable.) The motivation for the combination has already been provided above.

Regarding Claim 7, Xu and Wang disclose all the limitations of Claim 6 (see the detailed mapping above). Wang further discloses: wherein the language model comprises ([0130] Multi-language models (n-grams) can evaluate whether a sentence is reasonable.) [The claim only requires one of the recited elements.] The motivation for the combination has already been provided above.

Regarding Claim 8, Xu discloses: A Chinese spelling correction system, comprising: an acquisition module, an extraction module, an integration module, an error correction module, and a judgment module; ([Xu discloses the use of a computer/processor/memory because it uses the architecture of a multimodal model called "ReaLiSe"; see Fig. 1. Further, according to the specification of the instant application, all the modules recited in the claim are essentially part of the computer, or code stored in the memory and executed by the processor.]) As for the rest of the claim, it recites the elements of the method of Claim 1; therefore, the rejection applied to Claim 1 is equally applicable.

Regarding Claim 9, Xu discloses: A storage medium with a computer program stored thereon, wherein when the computer program is executed by a processor, ([Xu discloses the use of a computer/processor/memory because it uses the architecture of a multimodal model called "ReaLiSe"; see Fig. 1.]) As for the rest of the claim, it recites the elements of the method of Claim 1; therefore, the rejection applied to Claim 1 is equally applicable.

Regarding Claim 10, Xu discloses: A Chinese spelling correction terminal, comprising: a processor and a memory; wherein the memory stores computer programs; and wherein the processor executes the computer programs stored in the memory, so that the Chinese spelling correction terminal executes the method for correcting Chinese spelling errors according to claim 1. ([Xu discloses the use of a computer/processor/memory because it uses the architecture of a multimodal model called "ReaLiSe"; see Fig. 1.]) As for the rest of the claim, it recites the elements of the method of Claim 1; therefore, the rejection applied to Claim 1 is equally applicable.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Pang US 20210397780 discloses correction of spelling in Chinese characters; see paras. 0045-0056 and Fig. 5 for additional details.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Philip H. Lam, whose telephone number is (571) 272-1721. The examiner can normally be reached 9 AM-3 PM Pacific Time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Bhavesh Mehta, can be reached on (571) 272-7453. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/PHILIP H LAM/
Examiner, Art Unit 2656

Prosecution Timeline

Dec 20, 2023
Application Filed
Sep 24, 2025
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12591626: SEARCH STRING ENHANCEMENT (granted Mar 31, 2026; 2y 5m to grant)
Patent 12572735: DOMAIN-SPECIFIC DOCUMENT VALIDATION (granted Mar 10, 2026; 2y 5m to grant)
Patent 12572747: MULTI-TURN DIALOGUE RESPONSE GENERATION WITH AUTOREGRESSIVE TRANSFORMER MODELS (granted Mar 10, 2026; 2y 5m to grant)
Patent 12562158: ELECTRONIC APPARATUS AND CONTROLLING METHOD THEREOF (granted Feb 24, 2026; 2y 5m to grant)
Patent 12561194: ROOT CAUSE PATTERN RECOGNITION BASED MODEL TRAINING (granted Feb 24, 2026; 2y 5m to grant)
Based on the 5 most recent grants by this examiner.


Prosecution Projections

1-2
Expected OA Rounds
83%
Grant Probability
99%
With Interview (+45.5%)
2y 8m
Median Time to Grant
Low
PTA Risk
Based on 129 resolved cases by this examiner. Grant probability derived from career allow rate.
