Prosecution Insights
Last updated: April 19, 2026

Application No. 18/114,984
QUERY FORMATTING SYSTEM, QUERY FORMATTING METHOD, AND INFORMATION STORAGE MEDIUM

Status: Non-Final OA (§101, §103)
Filed: Feb 27, 2023
Examiner: HUTCHESON, CODY DOUGLAS
Art Unit: 2659
Tech Center: 2600 — Communications
Assignee: Rakuten Group Inc.
OA Round: 3 (Non-Final)

Grant Probability: 62% (Moderate)
OA Rounds: 3-4
To Grant: 2y 10m
With Interview: 99%
Examiner Intelligence

Career Allow Rate: 62% (grants 62% of resolved cases; 15 granted / 24 resolved; +0.5% vs TC avg)
Interview Lift: +47.1% (allowance among resolved cases with interview vs. without)
Avg Prosecution: 2y 10m (typical timeline)
Currently Pending: 34
Total Applications: 58 (career history, across all art units)
Statute-Specific Performance

§101: 33.9% (-6.1% vs TC avg)
§103: 40.9% (+0.9% vs TC avg)
§102: 14.8% (-25.2% vs TC avg)
§112: 7.5% (-32.5% vs TC avg)

Tech Center averages are estimates. Based on career data from 24 resolved cases.

Office Action

Rejections under §101 and §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 11/05/2025 has been entered.

Response to Arguments

1. Regarding the rejection of claims 1-8 and 10-21 under 35 U.S.C. § 101, Applicant's arguments have been fully considered but they are not persuasive.

Regarding Argument 1

Applicant first argues that the claimed invention of combined dictionary-based and machine-learning-based query formatting provides an improvement to search engine technology by improving search accuracy, citing Examples 37 and 40 of the USPTO guidance (see pg. 10 of Remarks). Applicant further argues that the claimed invention of generating formatted queries is not abstract but technical (see last para. on pg. 10 of Remarks).

The Examiner respectfully disagrees with these arguments. For the claimed invention to provide an improvement to a technical field, there must be additional elements in the claim which, alone or viewed in combination, integrate the judicial exception into a practical application under the Step 2A Prong 2 analysis. There are no such additional elements in the claimed invention.

In Step 2A Prong 1, the claims are analyzed to determine whether abstract ideas are recited. Claim 1 contains several limitations which, as currently recited, can be performed as mental processes with the aid of pen and paper. Specifically, a person can:

1) obtain first tokens by writing down words or word fragments in a user query;
2) compare the tokens to tokens in a dictionary, labeling the tokens without a match as second tokens (the person can mark next to a token on a piece of paper that the token has no match);
3) label each of the plurality of characters in a second token (e.g., for the token ‘tokyorestaurant’, write down the corresponding labels 111112222222222, indicating that the first five characters are a word ‘1’ and the remaining characters are a word ‘2’);
4) format the second tokens based on one of the plurality of first tokens which matches dictionary data and the label for the characters (e.g., on seeing a matching word ‘Japan’ in the query, and based on the labels above, decide to split the second token into the two words ‘Tokyo’ and ‘restaurant’);
5) use the formatted second token to search for information (e.g., search in a book or encyclopedia for information);
6) label each token as first or second (e.g., mark next to each token on a piece of paper that the token matches, ‘1’, or does not match, ‘2’).

The remaining independent claims and the dependent claims recite similar formatting operations which can be performed either mentally with the aid of pen and paper or as mathematical calculations (see the § 101 rejection below), and thus similarly recite abstract ideas. Therefore, the claims recite abstract ideas.
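For concreteness, the pen-and-paper process enumerated above can be sketched in code. This is a minimal illustration of the Examiner's characterization only, not the claimed system: the dictionary contents, the label scheme, and the greedy longest-prefix matcher (standing in for the claimed learning model) are all hypothetical.

```python
# Minimal sketch of the dictionary-plus-character-label flow described
# above. Dictionary contents, label scheme, and the greedy matcher
# (standing in for the claimed learning model) are all hypothetical.

DICTIONARY = {"japan", "tokyo", "restaurant"}

def label_tokens(query_tokens):
    """Mark each token as a first token (dictionary match) or a second token."""
    return [(tok, "first" if tok in DICTIONARY else "second")
            for tok in query_tokens]

def label_characters(token):
    """Assign per-character word labels, e.g. 'tokyorestaurant' ->
    '111112222222222' (first five characters are word 1, the rest word 2).
    A greedy longest-prefix dictionary match stands in for the model."""
    labels, word_id, i = [], 1, 0
    while i < len(token):
        for j in range(len(token), i, -1):
            if token[i:j] in DICTIONARY or j == i + 1:
                labels.extend([str(word_id)] * (j - i))
                word_id, i = word_id + 1, j
                break
    return "".join(labels)

def format_second_token(token, char_labels):
    """Split the token wherever the character label changes."""
    words, start = [], 0
    for i in range(1, len(token) + 1):
        if i == len(token) or char_labels[i] != char_labels[start]:
            words.append(token[start:i])
            start = i
    return " ".join(words)

for tok, kind in label_tokens(["japan", "tokyorestaurant"]):
    if kind == "second":
        print(format_second_token(tok, label_characters(tok)))
# prints: tokyo restaurant
```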
Under Step 2A Prong 2, additional elements are considered alone or viewed in combination to determine whether the claims integrate the judicial exception into a practical application. The only limitation in claim 1 which could be considered an additional element is the “learning model”. However, this model is recited at a high level of generality and amounts to mere instructions to implement the judicial exception using a generic computer. There are no additional limitations in claim 1 which, for example, describe either the structure of the learning model or how the learning model obtains the labels for the second characters; the learning model is merely used to perform operations which can be performed as mental processes. The learning model does not integrate the judicial exception into a practical application, as it does not impose any meaningful limits on practicing the abstract idea. Similarly, the remaining independent claims and the dependent claims lack additional elements beyond those that merely perform the mental process using a generic computer. Since mere instructions to implement the judicial exception using a generic computer cannot integrate the judicial exception into a practical application, the claimed invention does not reflect a technical improvement comparable to Examples 37 and 40 of the USPTO guidance. Hence, Applicant's first argument is not persuasive.

Regarding Argument 2

Applicant argues that the claimed invention is similar to Example 37 in the USPTO guidance and thus is patent eligible for analogous reasons (see pgs. 11-12). Specifically, it is argued that the claimed invention similarly targets a specific technical field, provides a tangible benefit by producing more accurate search results, and combines machine learning and dictionary matching in a non-conventional way.

The Examiner respectfully disagrees with these arguments. In Example 37, claim 1 is considered patent eligible under the Step 2A Prong 2 analysis because it contains additional elements which integrate the judicial exception into a practical application. The additional elements of receiving, via a GUI, a user selection to organize each icon, and automatically moving the icons to a position on the GUI closest to the start icon of the computer system based on a determined amount of use, when viewed in combination, recite a specific technical improvement to user interfaces and thus make the claim eligible. In contrast, the claimed invention does not contain additional elements which, viewed in combination, provide a technical improvement. The difference between Example 37 and the claimed invention is that Example 37 recites additional elements which cannot be performed mentally and are not performed by generic computer components, while the claimed invention does not. Adjusting the position of icons on a graphical user interface cannot be performed by a person and does not recite a generic computer component, and thus that claim is subject matter eligible. The claimed invention does not recite such additional elements; it performs a process that can be performed mentally (e.g., analyzing tokens, labeling these tokens, and rewriting tokens, all using pen and paper), merely performed by a generic “learning model”. The Examiner further notes that the language describing the use of “fine-tuned CharacterBERT, a second learning model (e.g., BERT or Bi-LSTM), and BIOES chunking” is not currently present in the claims, which instead recite a generic “learning model” that does not integrate the judicial exception into a practical application. Hence, Applicant's arguments are not persuasive.
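Since the Remarks, as characterized above, invoke BIOES chunking, a short sketch of what that tagging scheme does may help orient the reader. The tag sequence below is hypothetical; it only illustrates how B/I/O/E/S labels group characters into words.

```python
# Minimal sketch of BIOES chunk decoding: B(egin), I(nside), O(utside),
# E(nd), S(ingle). The example character/tag sequence is hypothetical.

def bioes_chunks(items, tags):
    """Group items into chunks according to their BIOES tags."""
    chunks, current = [], []
    for item, tag in zip(items, tags):
        if tag == "S":
            chunks.append([item])
        elif tag == "B":
            current = [item]
        elif tag in ("I", "E"):
            current.append(item)
            if tag == "E":
                chunks.append(current)
                current = []
        # "O" items belong to no chunk
    return chunks

chars = list("tokyorestaurant")
tags = ["B", "I", "I", "I", "E"] + ["B"] + ["I"] * 8 + ["E"]
print(["".join(c) for c in bioes_chunks(chars, tags)])
# ['tokyo', 'restaurant']
```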
Regarding Argument 3

Applicant further argues that the claimed invention is subject matter eligible in view of the recent precedential decision of Ex parte Desjardins, arguing that the additional elements of the claimed invention recite a technological improvement for reasons analogous to those cited in the decision (see pgs. 12-14 of Remarks).

The Examiner respectfully disagrees with this argument. In the Memo, the limitation of “adjust the first values of the plurality of parameters to optimize performance of the machine learning model on the second machine learning task while protecting performance of the machine learning model on the first machine learning task” was identified as reflecting the improvement disclosed in the specification and was determined to make the claims subject matter eligible under the Step 2A Prong 2 analysis. This limitation recites a technical improvement to a technical problem (e.g., adjusting parameters of a machine learning model to obtain a trained model which prevents catastrophic forgetting in continual learning systems). The Applicant's claimed invention does not recite such a technical improvement. While the optimization of machine learning parameters to prevent catastrophic forgetting cannot be performed mentally and is not recited as being implemented by generic computer components, the claimed invention of labeling word tokens to reformat the tokens for searching is not currently claimed at a level which reflects a technical improvement to a technical problem. While Examiners are reminded in the Memo to consider the claims as a whole when considering subject matter eligibility, the additional elements of the “learning model”, even when viewed in combination, do not reflect a technical improvement. Hence, Applicant's arguments are not persuasive.

2. Regarding the rejection of claims 1-3, 10, 12, 14-15, and 18-21 under 35 U.S.C. § 103, Applicant's arguments have been fully considered but they are not persuasive.

Regarding claim 1

Applicant first argues that the cited references do not disclose the amended features of claim 1, as the cited references do not disclose labeling the “first tokens that fail to match the dictionary data are labeled as second tokens” and then “format the second token based on one of the plurality of first tokens”, and that the matching vector taught in Nguyen is not labels of the tokens. The Examiner respectfully disagrees. Under the broadest reasonable interpretation of these claims, these limitations require first assigning a label (any type of indication of no match, which would include the matching entry in the matching vector corresponding to that word (see Fig. 4, D1-n)) to first tokens that do not match, and then formatting these tokens (e.g., organizing the information in the query in some manner) based on one of the plurality of first tokens. The Nguyen reference discloses each of these limitations. First, Nguyen discloses labeling a first token (e.g., w1 in Fig. 4) as a second token which does not match dictionary data (assigned an ‘O’ to indicate a word that is not a named entity in the dictionary), which is then reflected in the matching vector (entry d1). This reads on the BRI of labeling the “first tokens that fail to match the dictionary data are labeled as second tokens”.
Next, Nguyen passes this information (d1), along with information of first tokens that match (for example, dn, which has an exact match to the dictionary), into a named entity recognition model which provides a format to the words in the text (tag scores which reflect a location of each token in an n-gram (beginning, inside, and ending tokens); see paras. 0085, 0078). This reads on the BRI of “format the second token based on one of the plurality of first tokens”.

Applicant further argues that the cited references do not disclose “wherein each token is labeled as either a first token or a second token” because Nguyen uses more than two labels. The Examiner respectfully disagrees. Under the broadest reasonable interpretation, this limitation requires that each token be given an indication of being a first token (some indication that there is a match to the dictionary) or of being a second token (some indication that there is no match to the dictionary). Nguyen discloses this feature. Nguyen teaches that each token is either assigned an indication of being a first type of token (matched; for example, see Fig. 4 “Gene Exact E”, which has an exact match with dictionary data) or a second type of token (not matched; for example, see Fig. 4 ‘O’, which is not a match). These operations read on the BRI of “wherein each token is labeled as either a first token or a second token”. Hence, Applicant's arguments are not persuasive.
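As an aside, the per-token matching information cited from Nguyen (paras. 0077-0079) can be sketched as follows. This is a simplified reading of the cited passages with a hypothetical two-entry dictionary; Nguyen's conversion into distributed-representation matching vectors is not reproduced.

```python
# Simplified sketch of Nguyen's matching information as cited above
# (paras. 0077-0079): each token gets a {class, degree, location} triple,
# with class "O" (Outside) when no dictionary entry matches. The
# dictionary and its entries are hypothetical.

NE_DICTIONARY = {"aspirin": "Drug", "brca1": "Gene/Protein"}

def matching_info(tokens):
    info = []
    for tok in tokens:
        cls = NE_DICTIONARY.get(tok.lower())
        if cls is None:
            # Dummy "Outside" entry: under the Examiner's BRI, this plays
            # the role of labeling the token as a second token.
            info.append({"class": "O", "degree": None, "location": None})
        else:
            # Single-token exact match: location "S" (single).
            info.append({"class": cls, "degree": "Exact", "location": "S"})
    return info

print(matching_info(["patient", "took", "aspirin"]))
# [{'class': 'O', ...}, {'class': 'O', ...},
#  {'class': 'Drug', 'degree': 'Exact', 'location': 'S'}]
```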
Regarding claims 12 and 14

Applicant's arguments with respect to claims 12 and 14 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Regarding claim 22

Applicant argues that the cited references do not disclose the limitations of “wherein there are a plurality of pieces of dictionary data; and the first tokens that fail to match at least two pieces of the dictionary data are labeled as second tokens”. The Examiner respectfully disagrees, as Nguyen discloses this feature. Nguyen discloses that the dictionary data comprises multiple pieces (see Fig. 1, 14, and Fig. 5, with multiple entries each), and that tokens which fail to match the dictionary data, which comprises multiple pieces, are labeled as second tokens (assigned an ‘O’ to indicate a word that is not a named entity in the dictionary, which comprises multiple entries; see again Fig. 5). These operations read on the BRI of claim 22. Hence, Applicant's arguments are not persuasive.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

3. Claims 1-8 and 10-22 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claims 1, 19, and 20, “A query formatting system”, “A query formatting method”, and “A non-transitory information storage medium” are recited, which are each directed to one of the four statutory categories of invention (machine, process, and article of manufacture, respectively) (Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite mental processes which fall into the category of abstract idea (Step 2A Prong 1: YES).

The following limitations, under their broadest reasonable interpretation, recite mental processes:

“acquire a plurality of tokens included in a query, where each of the plurality of first tokens includes at least one word or word fragment”: a person obtains a query with multiple tokens (e.g., a query with multiple words).

“execute matching through use of dictionary data based on the plurality of first tokens; wherein the first tokens that fail to match the dictionary data are labeled as second tokens”: a person compares each token to words in a written dictionary to see if any tokens match words in the dictionary, writes down the matches with pen and paper, and also marks down the tokens which do not match.

“input a second token … to obtain a label for each of a plurality of second characters included in the second token”: a person obtains a label for each character in the token which did not match, and writes down each label using pen and paper.

“format the second token based on one of the plurality of first tokens which matches the dictionary data and the label for each of the plurality of second characters”: a person uses tokens which match those in a dictionary, along with the labels, to format the tokens which do not match those in the dictionary, and writes down the formatted second tokens using pen and paper.

“execute search processing based on the formatted second token”: a person provides information about the query based on the formatted second token.

“wherein each token is labeled as either a first token or a second token”: a person marks down on paper, next to each token, a label indicating that it is a first token that matches or a second token that does not match.

Claims 1, 19, and 20 do not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). The only additional limitations are “at least one processor configured to” (claim 1), “A non-transitory information storage medium having stored thereon a program for causing a computer to” (claim 20), and a “learning model” (claims 1, 19, and 20). These limitations are recited broadly and amount to mere instructions to implement the judicial exception using a generic computer. Even when viewed in combination, mere instructions to implement the judicial exception using a generic computer cannot integrate the judicial exception into a practical application, as they do not impose any meaningful limits on practicing the abstract idea. Accordingly, claims 1, 19, and 20 are directed to an abstract idea.

Claims 1, 19, and 20 do not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above with respect to integrating the judicial exception into a practical application, the additional limitations amount to mere instructions to implement the judicial exception using a generic computer. Mere instructions to implement the judicial exception using a generic computer cannot allow the claims to amount to significantly more than the judicial exception, as they do not provide an inventive concept. Therefore, claims 1, 19, and 20 are not patent eligible.

Regarding dependent claims 2-8, 10-18, and 21-22, “The query formatting system” is recited, which is directed to one of the four statutory categories of invention (machine) (Step 1: YES). However, the claim limitations, under their broadest reasonable interpretation, recite mental processes which fall into the category of abstract idea (Step 2A Prong 1: YES).
Claims 2-8, 10-18, and 21-22 recite the mental processes of claim 1 due to their dependence on claim 1. Additionally, the following limitations of claims 2-8, 10-18, and 21-22, under their broadest reasonable interpretation, recite mental processes:

Claim 2: “format the first token based on an execution result of the matching; and format the second token based on the formatted first token and the …model”: a person formats and writes down a first token which matches (e.g., if a word matches in the dictionary, keep the original format of the word), and then formats and writes down a second token based on the first token and a model (using the model as a set of rules, along with the matching word, to rewrite the second word). Claim 2 contains the additional element “learning model”, which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 3: “format, based on the execution result of the matching, a first token that is required to be formatted among a plurality of the first tokens; and format, when only some of the plurality of the first tokens are formatted, the second token based on the formatted first tokens, unformatted first tokens, and the …model”: a person formats a first token that needs to be formatted, and writes down the token using pen and paper. The person then formats the second token based on the formatted tokens, the unformatted tokens, and the …model. Claim 3 contains the additional element “learning model”, which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 4: “calculate, for each first character included in the first token, a first character feature amount relating to a feature of the each first character based on the …model; acquire a first token feature amount which is calculated based on a predetermined calculation method, and relates to a feature of the first token itself; calculate, for each second character included in the second token, a second character feature amount relating to a feature of the each second character based on the … model; and format the second token based on the first character feature amount, the first token feature amount, and the second character feature amount.”: calculating first character feature amounts, a first token feature amount based on a predetermined calculation method, and second character feature amounts all amount to mathematical calculations. A person can then use these calculated values to format the second token using pen and paper. Claim 4 contains the additional element “based on the learning model”, which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 5: “wherein the predetermined calculation method is a method of using a calculation model that calculates a feature amount of an input string, and wherein the at least one processor is configured to acquire the first token feature amount calculated by the calculation model.”: calculating, via a calculation method involving a model, a feature amount of an input string amounts to a mathematical calculation.

Claim 6: “calculate, for the each first character, an average of the first character feature amount of the each first character and the first token feature amount; and format the second token based on the average and the second character feature amount.”: calculating an average of the first character feature amount and the first token feature amount amounts to a mathematical calculation.
A person can then use the calculated information to format the second token using pen and paper.

Claim 7: “determine a weighting coefficient relating to the average based on an execution result of the matching; and calculate the average based on the weighting coefficient.”: assigning a weighting coefficient to the average, and then using the coefficient to calculate a weighted average, amounts to a mathematical calculation.

Claim 8: “format the second token not based on a second token feature amount relating to a feature of the second token itself, but based on the first character feature amount, the first token feature amount, and the second character feature amount.”: a person formats the second token using the first character feature amount, the first token feature amount, and the second character feature amount, without the use of a second token feature amount.

Claim 10: “execute the matching through use of each of a plurality of pieces of the dictionary data, wherein the first token is one of the plurality of tokens which matches at least one of the plurality of pieces of the dictionary data, and wherein the second token is one of the plurality of tokens which fails to match any of the plurality of pieces of the dictionary data.”: a person uses a dictionary, finds a match for a first token based on the pieces of data in the dictionary, and does not find a match for a second token.

Claim 11: “determine whether all of the plurality of tokens match the dictionary data; omit the formatting of the second token when the all of the plurality of tokens are determined to match the dictionary data; and format the second token when only some of the plurality of tokens are determined to match the dictionary data.”: a person decides not to format a second token when all of the tokens are found in a dictionary, and only formats the second token if only some of the tokens match.

Claim 12: “select any one of a plurality of operations based on the first token and the … model; wherein the plurality of operations comprises a split operation, a merge operation, a segment operation, and a single operation; and format the second token based on the selected one of the plurality of operations.”: a person decides to perform a particular operation (e.g., splitting one word into two words, merging two words into one word, segmenting words, or marking a word as a single word) based on a first token and a learning model. The person then formats the second token based on the decided operation. Claim 12 contains the additional element “select…based on …the learning model”, which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 13: “acquire a first score relating to the first token based on an execution result of the matching; acquire a second score relating to the first token based on a processing result of the … model; and select one of the execution result of the matching or the processing result of the … model based on the first score and the second score, and format the first token based on the selected one of the execution result or the processing result.”: a person calculates a first and a second score. The person then selects either the matching result or the model's processing result based on the scores (e.g., whichever has a higher confidence value). The person then formats the first token based on the selected result.
Claim 13 contains the additional element “a processing result of the learning model”, which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 14: “output splitting requirement information on whether the second token is to be split, and wherein the at least one processor is configured to format the second token based on the splitting requirement information.”: a person decides whether a token needs to be split (e.g., whether a particular word should be split into two words) and writes down whether or not splitting is required. The person then formats the second token based on the determination to split. Claim 14 contains the additional element “wherein the learning model is configured to output…”, which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 15: “split the second token having the splitting requirement information indicating the splitting through use of the dictionary data, to thereby format the second token.”: a person splits the second token (e.g., makes a word two separate words) by using the entries in the dictionary, and creates a formatted second token.

Claim 16: “wherein the query is input when an online shopping service is searched, and wherein the at least one processor is configured to: identify a product genre corresponding to the query; and execute the matching based on the product genre.”: a person identifies a genre a query belongs to (e.g., electronics, home décor, etc.) and then matches the tokens to dictionary data based on the genre (e.g., looking in an electronics-specific dictionary to see if there are matching tokens).

Claim 17: “wherein the query is input when an online shopping service is searched, and wherein the at least one processor is configured to execute the matching through use of a product title in the online shopping service as the dictionary data.”: a person uses a dictionary to match tokens in a query based on a product title that is identified.

Claim 18: “present an execution result of the search processing to a user who has input the query.”: a person provides information about the query based on the formatted second token.

Claim 21: “wherein the … model outputs the one of the plurality of operations”: a person uses the … model as a set of rules to determine an operation, and writes down the selected operation using pen and paper. Claim 21 contains the additional element “wherein the learning model outputs…”, which amounts to mere instructions to implement the judicial exception using a generic computer.

Claim 22: “wherein there are a plurality of pieces of dictionary data; and the first tokens that fail to match at least two pieces of the dictionary data are labeled as second tokens”: a person writes down multiple pieces of dictionary data, compares the first tokens to the multiple pieces of dictionary data, and marks a token as a second token if it does not match any of the multiple pieces of dictionary data.
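The claim 6 and claim 7 operations characterized above as mathematical calculations amount, on this reading, to a weighted elementwise average. A minimal sketch follows; the dimensions, weight values, and function name are hypothetical.

```python
# Sketch of the claim 6-7 operations as characterized above: a weighted
# elementwise average of each first character feature amount with the
# first token feature amount, where the weight depends on the matching
# result. Dimensions, weights, and names are hypothetical.

import numpy as np

def fuse_features(char_feats, token_feat, matched,
                  w_matched=0.7, w_unmatched=0.5):
    """Pick a weighting coefficient from the matching result (claim 7),
    then average each character feature with the token feature (claim 6)."""
    w = w_matched if matched else w_unmatched
    return [w * c + (1.0 - w) * token_feat for c in char_feats]

rng = np.random.default_rng(0)
char_feats = [rng.normal(size=4) for _ in "tokyo"]  # one vector per character
token_feat = rng.normal(size=4)                     # one vector for the token
fused = fuse_features(char_feats, token_feat, matched=True)
print(len(fused), fused[0].shape)  # 5 (4,)
```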
Claims 2-8, 10-18, and 21-22 do not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). The only additional limitations are those discussed above and “wherein the at least one processor is configured to”, which amount to mere instructions to implement the judicial exception using a generic computer and which, even when viewed in combination, do not integrate the judicial exception into a practical application, as they do not impose any meaningful limits on practicing the abstract idea. Accordingly, claims 2-8, 10-18, and 21-22 are directed to an abstract idea.

Claims 2-8, 10-18, and 21-22 do not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above with respect to integrating the judicial exception into a practical application, the additional limitations amount to mere instructions to implement the judicial exception using a generic computer, which, even when viewed in combination, do not amount to significantly more than the judicial exception, as they do not provide an inventive concept. Therefore, claims 2-8, 10-18, and 21-22 are not patent eligible.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

4. Claims 1-3, 10, 18-20, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Nguyen & Morita (US PGPUB No. 2023/0044266, hereinafter Nguyen) in view of Wang et al. (US PGPUB No. 2017/0068655, hereinafter Wang).

Regarding claim 1, Nguyen teaches A query formatting system (Fig. 3, “Machine Learning Apparatus 100”), comprising at least one processor (Fig. 3, “CPU 101”) configured to: acquire a plurality of first tokens included in a query, wherein each of the plurality of first tokens includes at least one word or word fragment (Fig. 4, “Tokens” from “Text 141”; para. 0067 “Upon receiving text 141 written in natural language, the machine learning apparatus 100 divides a character string included in the text 141 into tokens w.sub.1, w.sub.2, w.sub.3, . . . , and w.sub.N.”); execute matching through use of dictionary data based on the plurality of first tokens (Fig. 4 “Matching Information”; para. 0074 “In addition, the machine learning apparatus 100 calculates matching vectors … corresponding to the tokens … separately from the word vectors .... The matching vector of an individual token is obtained by converting matching information, which indicates a matching state between a named entity dictionary in which known named entities are listed and the token, into a vector in distributed representation.”); wherein the first tokens that fail to match the dictionary data are labeled as second tokens (Fig. 4, “matching information” labels tokens, which is then converted into matching vector D; para. 0077 “If a named entity similar to an n-gram is found, the machine learning apparatus 100 generates matching information for each of the tokens included in the n-gram.
The matching information includes three elements, which are a class, a matching degree, and a location…”; para. 0078 “The “class” is a named entity class to which a known named entity belongs. These classes are listed in the named entity dictionary. In the third embodiment, there are four named entity classes, which are gene/protein names (Gene/Protein), drug names (Drug), disease names (Disease), and mutations (Mutation). There is another class (O: Outside) indicating that the token is not a named entity.”; para. 0079 “For the token for which no matching information is obtained, dummy matching information having “outside” as the class is given.”); input a second token into a learning model (Fig. 4, token word embeddings input to learning model (BioBERT); para. 0071 “The machine learning apparatus 100 enters the word vectors … into BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) 142, to convert the word vectors W.sub.1, W.sub.2, W.sub.3, . . . , and W.sub.N into word vectors T.sub.1, T.sub.2, T.sub.3, . . . , and T.sub.N.”)…; format the second token (Fig. 4 “Tag Scores” generated for each token, including unmatched tokens; para. 0085 “The machine learning apparatus 100 enters the combined vectors … into a named entity recognition model 143 and calculates tag scores … corresponding to the tokens ... The tag scores include confidence factors of the respective pieces of tag information.”; para. 0079 “Therefore, among the tokens w.sub.1, w.sub.2, w.sub.3,…, and w.sub.N, there are a token for which only one piece of matching information is obtained, a token for which two or more pieces of matching information are obtained, and a token for which no matching information is obtained”) based on one of the plurality of first tokens which matches the dictionary data (Fig. 4, “Tag Scores” determined based on “Matching Information”; para. 0083 “The machine learning apparatus 100 combines the word vectors T.sub.1, T.sub.2, T.sub.3, . . . , and T.sub.N and the matching vectors D.sub.1, D.sub.2, D.sub.3, . . . , and D.sub.N to generate combined vectors V.sub.1, V.sub.2, V.sub.3, . . . , and V.sub.N.”; para. 0085 “The machine learning apparatus 100 enters the combined vectors …into a named entity recognition model 143 and calculates tag scores … corresponding to the tokens …”)…wherein each token is labeled as either a first token or a second token (Fig. 4, “matching information” labels tokens (indicates a match via the matching degree, or indicates no match using ‘O’), which is then converted into matching vector D; para. 0077 “If a named entity similar to an n-gram is found, the machine learning apparatus 100 generates matching information for each of the tokens included in the n-gram. The matching information includes three elements, which are a class, a matching degree, and a location…”; para. 0078 “There is another class (O: Outside) indicating that the token is not a named entity. The “matching degree” is a flag indicating whether an n-gram and a known named entity have an exact match relationship (Exact) or an approximate match relationship (Approximate).”; para. 0079 “For the token for which no matching information is obtained, dummy matching information having “outside” as the class is given.”).

Nguyen does not specifically disclose: [input a second token…into a learning model] to obtain a label for each of a plurality of second characters included in the second token; [format the second token…based on…] the label for each of the plurality of second characters; execute search processing based on the formatted second token.

Wang teaches [input a second token…into a learning model] to obtain a label for each of a plurality of second characters included in the second token (para. 0097 “At 444, control generates a forward chart parse, such as is shown in FIG. 6A. At 448, control generates a reverse chart parse, such as is shown in FIG. 6B.”; chart parse entries obtained as discussed in Fig. 6A-B); [format the second token…based on…] the label for each of the plurality of second characters (para. 0097 “At 456, control outputs tokens from the identified partition as the tokenization of the input string.”); execute search processing based on the formatted second token (para. 0053 “The query analysis module 204 analyzes the tokenized text query from the query wrapper.”; para. 0054 “The set generation module 208 identifies a consideration set of application (or, equivalently, app) state records from a search data store 124 based on the query tokens.”; Fig. 2, 204-224).

Nguyen and Wang are considered to be analogous to the claimed invention as they are both in the same field of text processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen to incorporate the teachings of Wang in order to specifically use a learning model to output a label for each of a plurality of second characters in the second token, format based on the labels, and execute search processing based on the formatted token. Doing so would be beneficial, as this would aid in tokenizing languages which do not have natural divisions between words, such as Chinese or Japanese, for search processing (para. 0007), leading to improved search processing for these languages.
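The motivation just cited concerns tokenizing languages without natural word divisions. As a much simpler stand-in for Wang's forward/reverse chart parsing (this is not Wang's algorithm), greedy forward maximum matching against a lexicon illustrates the underlying problem; the lexicon entries are hypothetical.

```python
# Greedy forward maximum matching: a simplified stand-in (not Wang's
# chart-parse method) for segmenting text without word separators.
# Lexicon entries are hypothetical.

LEXICON = {"東京", "レストラン", "検索"}

def forward_max_match(text, lexicon, max_len=10):
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to one character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

print(forward_max_match("東京レストラン検索", LEXICON))
# ['東京', 'レストラン', '検索']
```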
Regarding claim 2, Nguyen in view of Wang discloses wherein the at least one processor is configured to: format the first token based on an execution result of the matching (Fig. 4, first tokens are formatted (assigned a label of (B), (I), (E)) to generate “Matching Information”; para. 0093 “The “matching pattern” indicates a matching information pattern and is obtained by concatenating three elements, which are a class, a matching degree, and a location…The location is beginning (B), inside (I), ending (E), or single (S).”); and format the second token based on the formatted first token and the learning model (Fig. 4, second tokens that do not match are formatted (“Tag Scores”) using formatted first tokens (matched tokens in “Matching Information”) and the learning model (BioBERT); para. 0085 “The tag scores include confidence factors of the respective pieces of tag information. The individual tag information indicates a class and a location such as “Gene/Protein-B” or “Drug-E”. The machine learning apparatus 100 determines tag information to be associated with each of the tokens w.sub.1, w.sub.2, w.sub.3, . . . , and w.sub.N based on the tag scores s.sub.1, s.sub.2, s.sub.3, . . . , and s.sub.N. The machine learning apparatus 100 may select, per token, tag information having the highest confidence factor among the plurality of pieces of tag information.”).
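Structurally, the formatting path cited from Nguyen in claims 1 and 2 (paras. 0071, 0083, 0085) concatenates contextual word vectors with matching vectors and scores tags per token. The sketch below uses random projections in place of BioBERT and the NER model; all dimensions are illustrative assumptions.

```python
# Structural sketch of the cited Nguyen pipeline (paras. 0071, 0083,
# 0085): word vectors T and matching vectors D are combined into
# vectors V and scored by a named entity recognition model. Random
# projections stand in for BioBERT and the NER model.

import numpy as np

rng = np.random.default_rng(0)
N_TOKENS, D_WORD, D_MATCH, N_TAGS = 4, 8, 3, 5

T = rng.normal(size=(N_TOKENS, D_WORD))   # word vectors T_i (para. 0071)
D = rng.normal(size=(N_TOKENS, D_MATCH))  # matching vectors D_i

V = np.concatenate([T, D], axis=1)        # combined vectors V_i (para. 0083)

W_ner = rng.normal(size=(D_WORD + D_MATCH, N_TAGS))
tag_scores = V @ W_ner                    # tag scores s_i (para. 0085)

# Per para. 0085, select for each token the tag with the highest
# confidence factor.
best_tags = tag_scores.argmax(axis=1)
print(best_tags)  # one tag index per token
```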
Regarding claim 3, Nguyen in view of Wang discloses wherein the at least one processor is configured to: format, based on the execution result of the matching, a first token that is required to be formatted among a plurality of the first tokens (tokens with edit distances which meet a threshold are required to be formatted; para. 0077 “When the edit distance between a named entity and an n-gram is equal to or less than a predetermined threshold, the machine learning apparatus 100 determines that the named entity and the n-gram are similar to each other. If a named entity similar to an n-gram is found, the machine learning apparatus 100 generates matching information for each of the tokens included in the n-gram. The matching information includes three elements, which are a class, a matching degree, and a location.”); and format, when only some of the plurality of the first tokens are formatted, the second token based on the formatted first tokens, unformatted first tokens, and the learning model (Fig. 4, second tokens formatted (“Tag Scores”) based on formatted first tokens (tokens with matches that meet the threshold), unformatted first tokens (tokens with a partial match that do not meet the threshold), and the learning model (BioBERT); para. 0085 “The tag scores include confidence factors of the respective pieces of tag information. The individual tag information indicates a class and a location such as “Gene/Protein-B” or “Drug-E”. The machine learning apparatus 100 determines tag information to be associated with each of the tokens w.sub.1, w.sub.2, w.sub.3, . . . , and w.sub.N based on the tag scores s.sub.1, s.sub.2, s.sub.3, . . . , and s.sub.N. The machine learning apparatus 100 may select, per token, tag information having the highest confidence factor among the plurality of pieces of tag information.”).

Regarding claim 10, Nguyen in view of Wang discloses wherein the at least one processor is configured to execute the matching through use of each of a plurality of pieces of the dictionary data (Fig. 5, entries “#101”, “#102”…; para. 0091 “FIG. 5 illustrates an example of a named entity dictionary. The machine learning apparatus 100 holds the named entity dictionary 131 in advance. The named entity dictionary 131 includes a plurality of records, and a term identification (ID), a named entity, and a class are associated with each other in each record.”; para. 0114 “(S14) The model generation unit 124 searches the named entity dictionary 131 for the n-gram selected in step S13. In this case, approximate string matching is performed.”), wherein the first token is one of the plurality of tokens which matches at least one of the plurality of pieces of the dictionary data (the first token is a token in “Text 141” which matches an entry in the dictionary at a certain threshold; paras. 0114-0115 “The model generation unit 124 calculates the edit distance between the selected n-gram and each of the plurality of named entities included in the named entity dictionary 131 and searches for a similar named entity having the edit distance equal to or less than a threshold.
[0115] (S15) If at least one similar named entity is retrieved in step S14, the model generation unit 124 generates matching information indicating {class, matching degree, location} for each token included in the n-gram selected in step S13.”), and wherein the second token is one of the plurality of tokens which fails to match any of the plurality of pieces of the dictionary data (the second token is a token in “Text 141” which does not match any dictionary data; para. 0079 “Therefore, among the tokens w.sub.1, w.sub.2, w.sub.3, . . . , and w.sub.N, there are a token for which only one piece of matching information is obtained, a token for which two or more pieces of matching information are obtained, and a token for which no matching information is obtained.”).

Regarding claim 18, Nguyen in view of Wang discloses present an execution result of the search processing to the user who has input the query (Wang, Fig. 1; para. 0040 “A search module 128 receives a search query from the user device 100 and, based on data from the search data store 124, responds to the user device 100 with results. In FIG. 1, an unsophisticated search app 132 is shown executing on the user device 100. The search app 132 includes a query box 134 where a user can enter a text query and a search button 136 that, when actuated by the user, sends a query wrapper to the search module 128. Results from the search module 128 are provided to a results window 138 of the search app 132.”). Nguyen and Wang are considered to be analogous to the claimed invention as they are both in the same field of text processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen to incorporate the teachings of Wang in order to specifically display the result of the search processing to the user who input the query. Doing so would be beneficial, as this would enable efficient access to information for the user based on the query.

Regarding claim 19, claim 19 is a method claim with limitations similar to the limitations of system claim 1 and is rejected under similar rationale.

Regarding claim 20, claim 20 is a non-transitory information storage medium claim with limitations similar to the limitations of system claim 1 and is rejected under similar rationale. Additionally, Nguyen discloses A non-transitory information storage medium having stored thereon a program for causing a computer to (para. 0012).

Regarding claim 22, Nguyen in view of Wang discloses wherein there are a plurality of pieces of dictionary data (Nguyen, para. 0032 “The dictionary information 14 is a named entity dictionary in which a plurality of known named entities are listed.”); and the first tokens that fail to match at least two pieces of the dictionary data are labeled as second tokens (Nguyen teaches determining matching information for tokens compared to the plurality of dictionary information: para. 0036 “In the matching processing, the control unit 12 compares each of the plurality of named entities included in the dictionary information 14 with the token string 13a.”; “matching information” labels tokens (indicates a match via the matching degree, or indicates no match using ‘O’) based on this comparison; para. 0077 “If a named entity similar to an n-gram is found, the machine learning apparatus 100 generates matching information for each of the tokens included in the n-gram.
The matching information includes three elements, which are a class, a matching degree, and a location…”; para. 0078 “There is another class (O: Outside) indicating that the token is not a named entity. The “matching degree” is a flag indicating whether an n-gram and a known named entity have an exact match relationship (Exact) or an approximate match relationship (Approximate).”; para. 0079 “For the token for which no matching information is obtained, dummy matching information having “outside” as the class is given.”).

5. Claims 4-5 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Nguyen in view of Wang, and further in view of Park et al. (US PGPUB No. 2022/0374602, hereinafter Park).

Regarding claim 4, Nguyen in view of Wang discloses acquire a first token feature amount which is calculated based on a predetermined calculation method, and relates to a feature of the first token itself (Fig. 4, the first token feature amount is calculated (a particular word vector T for word token W) based on a predetermined calculation method (BioBERT) using features of the token itself; para. 0071 “The machine learning apparatus 100 enters the word vectors W.sub.1, W.sub.2, W.sub.3, . . . , and W.sub.N into BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) 142, to convert the word vectors W.sub.1, W.sub.2, W.sub.3, . . . , and W.sub.N into word vectors T.sub.1, T.sub.2, T.sub.3, . . . , and T.sub.N. The BioBERT 142 is a trained multi-layer neural network generated by machine learning using text in the biomedical field as training data. The BioBERT 142 includes 24 transformer layers stacked in series. Each transformer is a multi-layer neural network that converts the input vectors into different vectors.”); and format the second token based on…the first token feature amount (Fig. 4 “Tag Scores” generated for each token, based on feature amounts T; para. 0083 “The machine learning apparatus 100 combines the word vectors T.sub.1, T.sub.2, T.sub.3, . . . , and T.sub.N and the matching vectors D.sub.1, D.sub.2, D.sub.3, . . . , and D.sub.N to generate combined vectors V.sub.1, V.sub.2, V.sub.3, . . . , and V.sub.N.”; para. 0085 “The machine learning apparatus 100 enters the combined vectors … into a named entity recognition model 143 and calculates tag scores … corresponding to the tokens ... The tag scores include confidence factors of the respective pieces of tag information.”).

Nguyen in view of Wang does not specifically disclose: calculate, for each first character included in the first token, a first character feature amount relating to a feature of the each first character based on the learning model; calculate, for each second character included in the second token, a second character feature amount relating to a feature of the each second character based on the learning model; and format the second token based on the first character feature amount…and the second character feature amount.

Park teaches calculate, for each first character included in the first token, a first character feature amount relating to a feature of the each first character based on the learning model (Fig. 1, first tokens which contain matches based on “Semantic Pattern Embedding Engine 150” have character embeddings calculated (Fig. 1, 124); para.
0040 “To extract the features 120 from the input 130, a variety of different engines 140-170 may be utilized to process the natural language content or unstructured text 130 and extract the corresponding features 122-128 for input to the NER computer model 110 ... The tokens may then be passed to the various engines 140-170 for additional natural language processing operations. For example, the NLP embedding engine 140 may process the tokens, e.g., word1, word2, . . . , wordn, from the input 130 to perform a word embedding and/or character embedding.”; para. 0048 “However, for purposes of the entity pattern embedding engine 150, the mapping of the pattern matching the token may be used to generate the entity pattern embedding, i.e., vector representation of the matching entity pattern, which may specify the one or more particular named entity type classifications to which the matching entity pattern is mapped, e.g., a vector may have vector slots for each of the possible named entity type classifications and values in each of the vector slots may be set based on a probability that the corresponding named entity type classification corresponds to the particular instance.”); calculate, for each second character included in the second token, a second character feature amount relating to a feature of the each second character based on the learning model (para. 0040 “To extract the features 120 from the input 130, a variety of different engines 140-170 may be utilized to process the natural language content or unstructured text 130 and extract the corresponding features 122-128 for input to the NER computer model 110 ... The tokens may then be passed to the various engines 140-170 for additional natural language processing operations. For example, the NLP embedding engine 140 may process the tokens, e.g., word1, word2, . . . , wordn, from the input 130 to perform a word embedding and/or character embedding.”); and format the second token based on the first character feature amount…and the second character feature amount (Fig. 1, character feature amounts (124) for each token (WORD1-WORDN) in the input (130) are used to format second tokens (“Tags 115”); para. 0051 “Thus, for example, for the token “word1”, word embedding 1 122, character embedding 1 124, entity pattern embedding 1 126, and semantic category embedding 1 128 are generated.”; para. 0056 “In the depicted example, the first token “John” is tagged as a beginning (“B”) token for a person (“PER”) named entity type, or “B-PER”. The token “Doe” is tagged as an intermediate (“I”) token for a person (“PER”) named entity type, or “I-PER”. The tokens “,”, “born”, and “on” are not named entities and thus, are given tags of “0”. The token “Aug” is tagged as a beginning (“B”) of a date of birth (DOB), or “B-DOB”, where the date of birth may be a low-resource named entity type. By evaluating the word embedding, character embedding, and specifically the entity pattern embedding along with the semantic category embedding representing the pattern of high resource named entity types (NAME, DATE), then improved NER computer model determines that the token “Aug” is likely representing the beginning of a date of birth (DOB). The same is true for the intermediate (“I”) date of birth tokens “11” and “1986”.”). Nguyen, Wang, and Park are considered to be analogous to the claimed invention as they are all in the same field of formatting input strings.
Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen to incorporate the teachings of Park in order to calculate first and second character feature amounts, and to use the first and second character feature amounts to format the second tokens. Doing so would be beneficial, as character features are important for representing words which are out-of-vocabulary, leading to a more generalizable model (Park, para. 0041).

Regarding claim 5, Nguyen in view of Wang and further in view of Park teaches wherein the predetermined calculation method is a method of using a calculation model that calculates a feature amount of an input string (Nguyen, the predetermined calculation method uses a model (BioBERT) to calculate a feature amount of an input string (Fig. 4, “Word Vectors” T); para. 0071 “The machine learning apparatus 100 enters the word vectors W.sub.1, W.sub.2, W.sub.3, . . . , and W.sub.N into BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) 142, to convert the word vectors W.sub.1, W.sub.2, W.sub.3, . . . , and W.sub.N into word vectors T.sub.1, T.sub.2, T.sub.3, . . . , and T.sub.N. The BioBERT 142 is a trained multi-layer neural network generated by machine learning using text in the biomedical field as training data. The BioBERT 142 includes 24 transformer layers stacked in series. Each transformer is a multi-layer neural network that converts the input vectors into different vectors.”), and wherein the at least one processor is configured to acquire the first token feature amount calculated by the calculation model (Nguyen, the first token feature amounts (“Word Vectors” T) are acquired and used to calculate combined vectors; para. 0083 “The machine learning apparatus 100 combines the word vectors T.sub.1, T.sub.2, T.sub.3, . . . , and T.sub.N and the matching vectors D.sub.1, D.sub.2, D.sub.3, . . . , and D.sub.N to generate combined vectors V.sub.1, V.sub.2, V.sub.3, . . . , and V.sub.N.”).

Regarding claim 8, Nguyen in view of Wang and further in view of Park discloses wherein the at least one processor is configured to format the second token not based on a second token feature amount relating to a feature of the second token itself, but based on the first character feature amount, the first token feature amount, and the second character feature amount (Park, para. 0040 “To extract the features 120 from the input 130, a variety of different engines 140-170 may be utilized to process the natural language content or unstructured text 130 and extract the corresponding features 122-128 for input to the NER computer model 110. …For example, the NLP embedding engine 140 may process the tokens, e.g., word1, word2, . . . , wordn, from the input 130 to perform a word embedding and/or character embedding.”). Nguyen, Wang, and Park are considered to be analogous to the claimed invention as they are all in the same field of formatting input strings. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen in view of Wang to incorporate the teachings of Park in order to format the second token based on the first and second character feature amounts, and not based on a feature amount of the second token itself. Doing so would be beneficial in cases where a feature amount for a particular token cannot be calculated, such as cases where a particular token is out-of-vocabulary.
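Several of the mappings above (Nguyen paras. 0077 and 0114; compare also Carrier's edit-distance check in the claim 11 rejection below) turn on approximate dictionary matching under an edit-distance threshold. A sketch using plain Levenshtein distance follows; the threshold value and the dictionary contents are illustrative assumptions.

```python
# Sketch of approximate dictionary matching under an edit-distance
# threshold, as cited from Nguyen (paras. 0077, 0114). Plain Levenshtein
# distance; the threshold and dictionary are illustrative.

def edit_distance(a, b):
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def approximate_match(ngram, dictionary, threshold=1):
    """Return dictionary entries within the edit-distance threshold."""
    return [e for e in dictionary if edit_distance(ngram, e) <= threshold]

print(approximate_match("restaurnt", {"restaurant", "tokyo"}))
# ['restaurant']
```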
6. Claims 6-7 are rejected under 35 U.S.C. 103 as being unpatentable over Nguyen in view of Wang and Park, and further in view of Lin (US PGPUB No. 2020/0050667).

Regarding claim 6, Nguyen in view of Wang and further in view of Park discloses format the second token based on…the second character feature amount (Park, para. 0040 “To extract the features 120 from the input 130, a variety of different engines 140-170 may be utilized to process the natural language content or unstructured text 130 and extract the corresponding features 122-128 for input to the NER computer model 110 ... The tokens may then be passed to the various engines 140-170 for additional natural language processing operations. For example, the NLP embedding engine 140 may process the tokens, e.g., word1, word2, . . . , wordn, from the input 130 to perform a word embedding and/or character embedding.”). Nguyen, Wang, and Park are considered to be analogous to the claimed invention as they are all in the same field of formatting input strings. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen to incorporate the teachings of Park in order to format the second token based on the second character feature amount. Doing so would be beneficial, as character features are important for representing a word, providing lexical information to enhance or enrich the word-level embeddings in natural language processing applications (Park, para. 0041).

Nguyen in view of Wang and further in view of Park does not specifically disclose wherein the at least one processor is configured to: calculate, for the each first character, an average of the first character feature amount of the each first character and the first token feature amount; [and format the second token based on] the average.

Lin discloses wherein the at least one processor is configured to: calculate, for the each first character, an average of the first character feature amount of the each first character and the first token feature amount (para. 0078 “For a sentence of M words where each word consists of N characters (padding or truncation applied), a word embedding based input w∈custom-character.sup.M×d.sup.w is represented as a sequence of M words, where the value of a word will be filled in by its d.sub.w-dimensional word embedding vector.”; para. 0088 “Therefore, the character feature map is downsampled to size custom-character.sup.S×M×1×d.sup.w. P=[p.sub.1,1,1,1, . . . , p.sub.S,M,1,d.sub.w]”; para. 0092 “Accordingly, the resulting integrated word-character embedding vectors in the dimensions of custom-character.sup.S×M×d.sup.w can be obtained by the following equation:”; Equation 5 comprises an average of the character feature P and the token feature W); [and format the second token based on] the average (para. 0002 “For example, when a chatbot system processes a user query, the first step is to identify the user intent, and more specifically, is to classify each section of a sentence into broken down categories to understand the intention behind the input it has received.”; para. 0107 “S200: determining an intent class corresponding to the sentence based on the integrated word-character embeddings.”). Nguyen, Wang, Park, and Lin are considered to be analogous to the claimed invention, as Nguyen, Wang, and Park are in the same field of formatting input strings, and Lin is in the same field of generating token- and character-level features.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen in view of Park to incorporate the teachings of Lin in order to calculate an average of the first character feature amount and the first token feature amount. Doing so would be beneficial, as the averaged feature reflects both token-level and character-level features, making it a rich feature for formatting the query.

Regarding claim 7, Nguyen in view of Wang and Park and further in view of Lin discloses determine a weighting coefficient relating to the average based on an execution result of the matching (Lin, para. 0126 “According to some embodiments of the method, the integration can be performed through an averaging-based approach, where the integrated word-character embedding vectors are obtained by merely averaging the two representations (i.e. the pooled character feature map and the word embedding vectors) elementwisely. This averaging-based approach has an advantage of having relatively small computation time. It is noted that in this substantially weighted average-based approach, the two representations (i.e. the pooled character feature map and the word embedding vectors) can optionally be given an equal weight, but can also optionally be given a different weight in the integration.”); and calculate the average based on the weighting coefficient (id.). Nguyen, Wang, Park, and Lin are considered to be analogous to the claimed invention as Nguyen, Wang, and Park are in the same field of formatting input strings, and Lin is in the same field of generating token- and character-level features. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen in view of Park to incorporate the teachings of Lin in order to use a weighting coefficient in the calculation of the average. Doing so would be beneficial, as the relative importance of the first character feature and the first token feature can then be adjusted, making the model more flexible.
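As a worked illustration of the claim 6/7 averaging, the sketch below computes an elementwise weighted average of a pooled character-level vector and a token-level vector, in the spirit of Lin's averaging-based integration. The vectors, the `integrate` name, and the weighting coefficient `alpha` are all hypothetical; `alpha = 0.5` reduces to the plain average.

```python
def integrate(char_feat, token_feat, alpha=0.5):
    """Elementwise weighted average of character- and token-level features."""
    assert len(char_feat) == len(token_feat)
    return [alpha * c + (1.0 - alpha) * t for c, t in zip(char_feat, token_feat)]

char_feat = [0.2, 0.8, 0.4]   # pooled character feature map (toy values)
token_feat = [0.6, 0.2, 0.4]  # word embedding vector (toy values)

print(integrate(char_feat, token_feat))        # equal weights -> plain average
print(integrate(char_feat, token_feat, 0.8))   # character features weighted more
```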
7. Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Nguyen in view of Wang and further in view of Carrier et al. (US PGPUB No. 2022/0269857, hereinafter Carrier).

Regarding claim 11, Nguyen in view of Wang discloses wherein the at least one processor is configured to: determine whether all of the plurality of tokens match the dictionary data (para. 0115 “…the model generation unit 124 generates matching information indicating {class, matching degree, location} for each token included in the n-gram selected in step S13.”). However, Nguyen does not specifically disclose omit the formatting of the second token when the all of the plurality of tokens are determined to match the dictionary data; and format the second token when only some of the plurality of tokens are determined to match the dictionary data.

Carrier discloses omit the formatting of the second token when the all of the plurality of tokens are determined to match the dictionary data (Fig. 6, if each input string matches a token in the dictionary data, then string correction steps 608-620 are not performed; para. 0031 “Each input string matching a regular expression in domain specific vocabulary 206 or the global vocabulary 114 is annotated (at block 604) with an “ignore” semantic type 306_i to output annotated input strings 124. The spellchecker 116 performs a loop of operations at blocks 606 through 624 for each input string 106 in the text not annotated with the ignore semantic type 306_i, where an input string may comprise one or more words. If (at block 606) the input string 106 matches a token or domain specific phrase in the domain specific vocabulary 206, then the input string 106 comprises a processed input string 302_i that is indicated as matched 308_i, and the input string 302_i is outputted (at block 612) in the spellchecked input strings 302.”); and format the second token when only some of the plurality of tokens are determined to match the dictionary data (if an input string does not match, string correction steps 608-620 are performed; para. 0032 “If (at block 608) the input string 300 does not match a token or domain specific phrase, then a determination is made as to whether the input string 300 is within an edit distance, e.g., a Damerau-Levenshtein distance, etc., of a token or domain specific phrase in the domain specific vocabulary 206. If so, the token/domain specific phrase within the edit distance is indicated (at block 616) as a corrected input string 308_i for the processed input string 302_i in the spellchecked input strings 302, and the pre-corrected input string 304_i is saved (at block 618) in the cache 130 for the processed input string 302_i. The pre-corrected input string 304_i is associated (at block 620) with the corrected input string 302_i in the processed input string information 300_i. If (at block 614) the input string is not within an edit distance of a token or phrase in the domain specific vocabulary 206, then the spellchecker 116 performs the operations in blocks 608-620 to spellcheck with respect to the global vocabulary 114.”). Nguyen, Wang, and Carrier are considered to be analogous to the claimed invention as they are all in the same field of formatting text inputs. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen to incorporate the teachings of Carrier in order to omit formatting a second token when all the tokens match dictionary data, and to format the second token only when some of the tokens are determined to match the dictionary data. Doing so would be beneficial, as a preliminary check of whether all of the tokens already match the dictionary data before formatting would save the processing time otherwise spent unnecessarily reformatting the second token, making the system more efficient.
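The gating logic the rejection attributes to Carrier can be sketched as follows: if every token already matches the dictionary, formatting is omitted; otherwise only the non-matching tokens are corrected to the nearest dictionary entry within an edit-distance threshold. The `levenshtein` helper is a generic ordinary edit distance (Carrier mentions Damerau-Levenshtein, which additionally counts transpositions), and all names and thresholds are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def format_query(tokens, dictionary, max_dist=2):
    if all(t in dictionary for t in tokens):
        return tokens  # every token matches: omit formatting entirely
    out = []
    for t in tokens:
        if t in dictionary:
            out.append(t)
            continue
        best = min(dictionary, key=lambda d: levenshtein(t, d))
        out.append(best if levenshtein(t, best) <= max_dist else t)
    return out

print(format_query(["tokyo", "restaraunt"], {"tokyo", "restaurant"}))
# -> ['tokyo', 'restaurant']
```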
8. Claims 12, 14-15, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Nguyen in view of Wang and further in view of Peng et al. (US 2009/0259643 A1, hereinafter Peng).

Regarding claim 12, Nguyen in view of Wang discloses wherein the at least one processor is configured to: select any one of a plurality of operations based on the first token and the learning model (first matching tokens (Fig. 4, “Matching Information”) and the learning model (BioBERT) are used to predict an operation to perform for each token, i.e., which label to assign to each token; para. 0077-0078 “The matching information includes three elements, which are a class, a matching degree, and a location. [0078] …The “location” is a relative location of a target token in an n-gram. There are four types of token locations, which are single (S) in the case of a 1-gram and beginning (B), inside (I), and ending (E) in the case of a 2-gram or more.”); wherein the plurality of operations comprises a split operation (id.; a split comprises an ending token location (E) followed by a beginning token location (B)), a merge operation (id.; a merge comprises a beginning token location (B) followed by an inside token location (I), or two consecutive inside token locations (I)), a segment operation (id.; a segment comprises a beginning token location (B) followed by inside token locations (I), ending with an ending token location (E)), and a single operation (id.; a single comprises a single token location (S)).

Nguyen in view of Wang does not specifically disclose to then format the second token based on the selected one of the plurality of operations. Peng teaches format the second token based on the selected one of the plurality of operations (selected operations: must-split and must-join candidates, see Fig. 2, block 240; para. 0035 “The language model, in ranking at block 236, may also determine which of the query words match those in the must-split and must-join sub-dictionaries 152, 154, at block 240”; these operations are then used to format the tokens: para. 0036 “At block 244, the reformulator 140 rewrites the queries with the highest-ranked reformulated words, to include any must-split or must-join candidates that were generated in blocks 212 and 220.”). Nguyen, Wang, and Peng are considered to be analogous to the claimed invention as they are in the same field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen in view of Wang to incorporate the teachings of Peng in order to format the second token based on the selected one of the plurality of operations. Doing so would be beneficial, as this would address potential user input irregularities and thereby improve the search results returned (Peng, para. 0010 and 0013).
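The split/merge/segment/single reading of Nguyen's location labels maps naturally onto a BIOES-style decoder. The sketch below groups tokens by S/B/I/E labels exactly as the rejection describes (an E followed by a B marks a split boundary; B-I-…-E runs merge into one segment; S stands alone); the function name and example are hypothetical.

```python
def apply_locations(tokens, labels):
    """Group tokens according to S/B/I/E location labels."""
    assert len(tokens) == len(labels)
    groups, current = [], []
    for tok, lab in zip(tokens, labels):
        if lab == "S":        # single: token forms its own unit
            groups.append([tok])
        elif lab == "B":      # beginning: open a new multi-token unit
            current = [tok]
        elif lab == "I":      # inside: merge into the open unit
            current.append(tok)
        elif lab == "E":      # ending: close the unit (a following B splits)
            current.append(tok)
            groups.append(current)
            current = []
    return [" ".join(g) for g in groups]

print(apply_locations(["new", "york", "pizza"], ["B", "E", "S"]))
# -> ['new york', 'pizza']
```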
Regarding claim 14, Nguyen in view of Wang discloses wherein the learning model is configured to output splitting requirement information on whether the second token is to be split (splitting information constitutes determining whether a particular token is the beginning of a new phrase or the end of a phrase: para. 0115 “(S15) If at least one similar named entity is retrieved in step S14, the model generation unit 124 generates matching information indicating {class, matching degree, location} for each token included in the n-gram selected in step S13. The class is a class to which the similar named entity belongs. The matching degree is a flag indicating whether the n-gram and the similar named entity completely match or approximately match with each other. The location is a relative location of the token in the n-gram.”; para. 0078 “The “location” is a relative location of a target token in an n-gram. There are four types of token locations, which are single (S) in the case of a 1-gram and beginning (B), inside (I), and ending (E) in the case of a 2-gram or more.”). Nguyen in view of Wang does not specifically disclose to then format the second token based on the splitting requirement information.

Peng teaches to format the second token based on the splitting requirement information (splitting information is determined via the language model: para. 0035 “The language model, in ranking at block 236, may also determine which of the query words match those in the must-split and must-join sub-dictionaries 152, 154, at block 240…”; tokens are then formatted based on the splitting requirement: para. 0036 “At block 244, the reformulator 140 rewrites the queries with the highest-ranked reformulated words, to include any must-split or must-join candidates that were generated in blocks 212 and 220.”). Nguyen, Wang, and Peng are considered to be analogous to the claimed invention as they are in the same field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen in view of Wang to incorporate the teachings of Peng in order to format the second token based on the splitting requirement information. Doing so would be beneficial, as this would address potential user input irregularities and thereby improve the search results returned (Peng, para. 0010 and 0013).
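Peng's rewrite step (block 244) can be sketched with two hypothetical sub-dictionaries: `must_split` maps a run-on token to its split form, and `must_join` maps an adjacent token pair to a joined token. The entries below are invented for illustration.

```python
must_split = {"tokyorestaurant": "tokyo restaurant"}
must_join = {("web", "site"): "website"}

def rewrite(tokens):
    # Apply must-split candidates token by token.
    split = []
    for t in tokens:
        split.extend(must_split.get(t, t).split())
    # Then scan for adjacent pairs listed as must-join candidates.
    out, i = [], 0
    while i < len(split):
        pair = tuple(split[i:i + 2])
        if len(pair) == 2 and pair in must_join:
            out.append(must_join[pair])
            i += 2
        else:
            out.append(split[i])
            i += 1
    return out

print(rewrite(["tokyorestaurant", "web", "site"]))
# -> ['tokyo', 'restaurant', 'website']
```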
Regarding claim 15, Nguyen in view of Wang and further in view of Peng discloses wherein the at least one processor is configured to split the second token having the splitting requirement information indicating the splitting through use of the dictionary data (Nguyen discloses splitting information through use of dictionary data: para. 0114-0115 “(S14) The model generation unit 124 searches the named entity dictionary 131 for the n-gram selected in step S13. In this case, approximate string matching is performed. The model generation unit 124 calculates the edit distance between the selected n-gram and each of the plurality of named entities included in the named entity dictionary 131 and searches for a similar named entity having the edit distance equal to or less than a threshold. [0115] (S15) If at least one similar named entity is retrieved in step S14, the model generation unit 124 generates matching information indicating {class, matching degree, location}”), to thereby format the second token (Peng teaches utilizing the splitting information (must-split candidates) to then format the second token: para. 0035 “The language model, in ranking at block 236, may also determine which of the query words match those in the must-split and must-join sub-dictionaries 152, 154, at block 240…”; para. 0036 “At block 244, the reformulator 140 rewrites the queries with the highest-ranked reformulated words, to include any must-split or must-join candidates that were generated in blocks 212 and 220.”). Nguyen, Wang, and Peng are considered to be analogous to the claimed invention as they are in the same field of natural language processing. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen in view of Wang to incorporate the teachings of Peng in order to format the second token utilizing the splitting information. Doing so would be beneficial, as this would address potential user input irregularities and thereby improve the search results returned (Peng, para. 0010 and 0013).

Regarding claim 21, Nguyen in view of Wang and further in view of Peng discloses wherein the learning model outputs the one of the plurality of operations (Nguyen: first matching tokens (Fig. 4, “Matching Information”) and the learning model (BioBERT) are used to predict an operation to perform for each token (which label to assign to each token), which is output from the learning model as token location labels (see also the claim 12 mapping above for each specific operation); para. 0085 “The machine learning apparatus 100 enters the combined vectors V_1, V_2, V_3, . . . , and V_N into a named entity recognition model 143 and calculates tag scores s_1, s_2, s_3, . . . , and s_N corresponding to the tokens w_1, w_2, w_3, . . . , and w_N… The individual tag information indicates a class and a location such as “Gene/Protein-B” or “Drug-E”.”; para. 0077-0078 “The matching information includes three elements, which are a class, a matching degree, and a location. [0078] …The “location” is a relative location of a target token in an n-gram. There are four types of token locations, which are single (S) in the case of a 1-gram and beginning (B), inside (I), and ending (E) in the case of a 2-gram or more.”).

9. Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Nguyen in view of Wang and further in view of Yadav (US PGPUB No. 2021/0019309, hereinafter Yadav).

Regarding claim 13, Nguyen in view of Wang discloses an execution result of the matching (Fig. 4, “Matching Vectors” D_1-D_N) and a processing result of the learning model (Fig. 4, “Word Vectors” T_1-T_N).
However, Nguyen in view of Wang does not specifically disclose: acquire a first score relating to the first token based on an execution result of the matching; acquire a second score relating to the first token based on a processing result of the learning model; and select one of the execution result of the matching or the processing result of the learning model based on the first score and the second score, and format the first token based on the selected one of the execution result or the processing result.

Yadav teaches acquire a first score relating to the first token based on an execution result of the matching (para. 0064 “The technique 400 includes determining 430 a first score for a first candidate database query from the set of candidate database queries. The first score may be based on a match between one or more words of the string and a token of the respective sequence of tokens of the candidate database query.”); acquire a second score relating to the first token based on a processing result of the learning model (para. 0065 “The technique 400 includes determining 440 a second score for the first candidate database query. The second score may be determined 440 based on natural language syntax data determined for words of the string. For example, natural language processing may be applied to the string to parse the string into words and determine natural language syntax data (e.g., part-of-speech tags and/or syntax tree data) for the words of the string...”; para. 0128 “For example, the natural language syntax data for words of the string may be determined 520 using a machine learning module (e.g., including a neural network) that has been trained to parse and classify words of a natural language phrases in a string.”); and select one of the execution result of the matching or the processing result of the learning model based on the first score and the second score (para. 0066 “The technique 400 includes selecting 450, based on the first score and the second score, the first candidate database query from the set of candidate database queries. For example, a ranking score for the first candidate database query may be determined (e.g., as described in relation to the technique 500 of FIG. 5) based on the first score and the second score. This ranking score may be compared to ranking scores for other candidate database queries from the set of candidate database queries and/or to a threshold, and, based on these comparisons, the first candidate database query may be selected 450.”), and format the first token based on the selected one of the execution result or the processing result (para. 0067 “The technique 400 includes invoking 460 a search of the database using a query based on the first candidate database query to obtain search results. The first candidate database query, including the sequence of tokens of the database syntax, may specify a logical set of operations for accessing and/or processing data available in one or more databases.”). Nguyen, Wang, and Yadav are considered to be analogous to the claimed invention as they are all in the same field of formatting input strings. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen to incorporate the teachings of Yadav in order to calculate first and second scores, and to format the first token based on the selection of an execution result or a processing result. Doing so would be beneficial, as the best formatting for the first token can be selected from multiple candidate formats, which would subsequently lead to more meaningful queries and a better user experience.
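The two-score selection attributed to Yadav reduces to ranking candidates by a combination of a matching-based score and a model-based score. In this hypothetical sketch, `match_score` stands in for the dictionary-matching result and `model_score` for the learning model's output; the candidate structure and values are invented for illustration.

```python
def match_score(candidate) -> float:
    # First score: fraction of tokens matched against the dictionary.
    return candidate["matched_tokens"] / candidate["total_tokens"]

def model_score(candidate) -> float:
    # Second score: the learning model's confidence in this candidate.
    return candidate["model_confidence"]

def select(candidates):
    # Rank by combined score and keep the best candidate formatting.
    return max(candidates, key=lambda c: match_score(c) + model_score(c))

candidates = [
    {"text": "tokyo restaurant", "matched_tokens": 2, "total_tokens": 2,
     "model_confidence": 0.7},
    {"text": "tokyor estaurant", "matched_tokens": 0, "total_tokens": 2,
     "model_confidence": 0.9},
]
print(select(candidates)["text"])  # -> 'tokyo restaurant'
```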
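10. Claims 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Nguyen in view of Wang and further in view of Datta et al. (US PGPUB No. 2014/0143254, hereinafter Datta).

Regarding claim 16, Nguyen in view of Wang does not specifically disclose wherein the query is input when an online shopping service is searched, and wherein the at least one processor is configured to: identify a product genre corresponding to the query; and execute the matching based on the product genre.

Datta teaches wherein the query is input when an online shopping service is searched (Fig. 1, “Product Query 120” input into “Online Product Search System 100” for searching), and wherein the at least one processor (Fig. 4, “Processor 2010”) is configured to: identify a product genre corresponding to the query (para. 0034 “In block 320, a query classifier 160 may be used to map the product query 120 to one or more initial product categories.”); and execute the matching based on the product genre (para. 0037 “In block 350, the product category resolution module 150 can select a product category for the product query 120. The product category may be selected for having attributes (according to the category attribute dictionary 170) matching terms, or parts, of the product query 120 along with any arbitrary terms from the product query 120 according to the category ambiguous term dictionary 180. Where more than one of the product categories may be initially identified, a category with the most matches may be preferred. A category that covers all terms, or parts, of the product query 120 in either attributes or matched ambiguous terms may be selected as the category and then also result in the product query 120 to be classified as "fully understood."”). Nguyen, Wang, and Datta are considered to be analogous to the claimed invention as Nguyen and Wang are in the same field of formatting text inputs and Datta is in the same field of search queries. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen to incorporate the teachings of Datta in order to input the query into an online shopping service, to identify a product genre, and to carry out the matching based on the product genre. Doing so would be beneficial, as product genre-based matching for query reformatting would yield search queries returning more relevant results for the user, improving the user experience.

Datta's category-scoped matching can be sketched as a two-step lookup: classify the query into a product genre by vocabulary overlap, then match query terms only against that genre's attribute dictionary. The `genre_attributes` table and the example query are hypothetical stand-ins for Datta's category attribute dictionary 170.

```python
genre_attributes = {
    "television": {"sony", "lcd", "52 inch"},
    "laptop": {"ssd", "16gb", "ultrabook"},
}

def classify_genre(query: str) -> str:
    # Pick the genre whose attribute vocabulary overlaps the query most.
    terms = set(query.lower().split())
    return max(genre_attributes, key=lambda g: len(terms & genre_attributes[g]))

def match_in_genre(query: str, genre: str):
    # Execute the matching only against the identified genre's attributes.
    return [t for t in query.lower().split() if t in genre_attributes[genre]]

q = "52 inch sony lcd"
g = classify_genre(q)
print(g, match_in_genre(q, g))  # -> television ['sony', 'lcd']
```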
Regarding claim 17, Nguyen in view of Wang does not specifically disclose wherein the query is input when an online shopping service is searched, and wherein the at least one processor is configured to execute the matching through use of a product title in the online shopping service as the dictionary data.

Datta teaches wherein the query is input when an online shopping service is searched (Fig. 1, “Product Query 120” input into “Online Product Search System 100” for searching), and wherein the at least one processor (Fig. 4, “Processor 2010”) is configured to execute the matching through use of a product title in the online shopping service as the dictionary data (para. 0024 “Considering the example product query 120 of "52 inch sony lcd," the product query 120 may be initially classified into the "television" product category by the query classifier 160. The product category resolution module 150 may leverage the category attribute dictionary 170 to confirm that "52 inch" is a reasonably diagonal dimension for a television, "sony" is a known television brand, and "lcd" is a common television technology. Together, these resolutions from the category attribute dictionary 170 can support a confirmation of the implied product type "television." In such an instance of an implied product category, it is further strong evidence when no term, or part, of the product query 120 offends the category determination of category C by failing to be resolved within the category attribute dictionary 170 for category C. Such a complete category identification supports characterizing the product query 120 (of "52 inch sony lcd") as being fully understood.”). Nguyen, Wang, and Datta are considered to be analogous to the claimed invention as Nguyen and Wang are in the same field of formatting text inputs and Datta is in the same field of search queries. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Nguyen to incorporate the teachings of Datta in order to input the query into an online shopping service, and to use a product title as the dictionary data for matching. Doing so would be beneficial, as product title-based matching for query reformatting would yield search queries returning more relevant results for the user, improving the user experience.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Carrier et al. (US 2006/0116862 A1): tokenizer applied to multiword strings (Fig. 9A)
Parikh (US 8,301,437 B2): alphanumeric-based and lexical analysis-based tokenizers utilizing dictionary matching (Fig. 5)
Mengle et al. (US 2016/0062969 A1): automatically rewriting strings according to rewriting rules, including splitting of a text string (Fig. 5, para. 0096)

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CODY DOUGLAS HUTCHESON, whose telephone number is (703) 756-1601. The examiner can normally be reached M-F 8:00 AM-5:00 PM EST. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Pierre-Louis Desir, can be reached at (571) 272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CODY DOUGLAS HUTCHESON/
Examiner, Art Unit 2659

/PIERRE LOUIS DESIR/
Supervisory Patent Examiner, Art Unit 2659

Prosecution Timeline

Feb 27, 2023: Application Filed
Mar 31, 2025: Non-Final Rejection — §101, §103
Jun 13, 2025: Interview Requested
Jun 25, 2025: Applicant Interview (Telephonic)
Jun 25, 2025: Examiner Interview Summary
Jul 08, 2025: Response Filed
Jul 29, 2025: Final Rejection — §101, §103
Oct 01, 2025: Interview Requested
Oct 08, 2025: Applicant Interview (Telephonic)
Oct 08, 2025: Examiner Interview Summary
Nov 05, 2025: Request for Continued Examination
Nov 14, 2025: Response after Non-Final Action
Feb 19, 2026: Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12603096: VOICE ENHANCEMENT METHODS AND SYSTEMS (granted Apr 14, 2026; 2y 5m to grant)
Patent 12591750: GENERATIVE LANGUAGE MODEL UNLEARNING (granted Mar 31, 2026; 2y 5m to grant)
Patent 12579447: TECHNIQUES FOR TWO-STAGE ENTITY-AWARE DATA AUGMENTATION (granted Mar 17, 2026; 2y 5m to grant)
Patent 12537018: METHOD AND SYSTEM FOR PREDICTING A MENTAL CONDITION OF A SPEAKER (granted Jan 27, 2026; 2y 5m to grant)
Patent 12530529: DOMAIN-SPECIFIC NAMED ENTITY RECOGNITION VIA GRAPH NEURAL NETWORKS (granted Jan 20, 2026; 2y 5m to grant)
Based on the examiner's 5 most recent grants.


Prosecution Projections

3-4
Expected OA Rounds
62%
Grant Probability
99%
With Interview (+47.1%)
2y 10m
Median Time to Grant
High
PTA Risk
Based on 24 resolved cases by this examiner. Grant probability derived from career allow rate.
