Last updated: May 29, 2026
Application No. 18/323,862
PHONOLOGY-CENTRIC SEARCHING

Non-Final OA: §101, §103
Filed: May 25, 2023
Examiner: DUGDA, MULUGETA TUJI
Art Unit: 2653
Tech Center: 2600 (Communications)
Assignee: Microsoft Technology Licensing, LLC
OA Round: 2 (Non-Final)
Interview Optional

Interview lift: +19.3%. An interview has already been conducted in this application's prosecution history. This examiner has an 82% grant rate with a +19.3% interview lift. Since an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.
Based on 50 resolved cases, 2023–2026
Examiner Intelligence

DUGDA, MULUGETA TUJI
Career Allowance Rate: 82% (41 granted / 50 resolved), above average, +20.0% vs TC avg
Interview Lift: +19.3% (strong), measured on resolved cases with interview
Avg Prosecution (typical timeline): 2y 11m
10 currently pending
Statute-Specific Performance

§101: 6.2% (-33.8% vs TC avg)
§103: 89.2% (+49.2% vs TC avg)
§102: 4.7% (-35.3% vs TC avg)
TC avg = Tech Center average estimate. Based on career data from 50 resolved cases
Office Action

§101 §103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

The information disclosure statement (IDS) submitted on 12/08/2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Response to Arguments
Applicant’s arguments, see pages 8-11, filed on 09/25/2025, with respect to the 35 U.S.C. 101 rejections have been fully considered but they are not persuasive. 
The Applicant argues that “the claims recite a neural phonemic translation machine learning model trained using phonemic index pairs from different languages (see claim 4 and claim 22), which is not conventional or routine” (Arguments, page 9).
The Examiner respectfully disagrees. The “neural phonemic translation machine learning model” merely recites the additional elements “neural” and “machine learning model,” as per the independent claims, and these additional elements can be implemented using laptop computers, desktop computers, tablet computers, or any general-purpose computer, as indicated in the Spec (Spec, para 0055).
The Applicant argues that “The claims recite a technological improvement in phonetic search systems, using embedding-based similarity, neural phonemic translation, and dual orthography indexing, all implemented in a computing system” (Arguments, page 10).
The Examiner respectfully disagrees. The Examiner believes that the limitations as an ordered combination add nothing that is not already present when looking at the elements taken individually. Also, there is no clear indication that the combination of the elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional general-purpose computer implementation.
The Applicant further argues that “the amendment aligns with USPTO Example 42 (May 2019 Update), which found that claims involving specifically-designed training and using a neural network to identify facial features were patent-eligible because they recited a specific implementation of a machine learning model (see, e.g., claims 4, 11, 18, and 22). Here, the "executing" operation encompasses not only the comparison of embeddings but also the use of a neural phonemic translation model (as recited in these dependent claims) and the indexing of phonemic variants across languages. This further reinforces that the claimed method is rooted in computer technology and is not abstract” (Arguments, page 11).
The Examiner respectfully disagrees. Mere implementation of a machine learning model for identification of facial features does not make the application patent-eligible. This is because, as stated earlier, such a model can be implemented using any general-purpose computer, such as a laptop computer, desktop computer or tablet computer, as clearly indicated in the Spec (Spec, para 0055).
	Thus, the 35 U.S.C. 101 rejections are maintained.
Applicant’s arguments, see pages 11-16, filed on 09/25/2025, with respect to the 35 U.S.C. 103 rejections of claims 1-22 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
 
The Applicant argues, regarding claim 1, that Elisha teaches one inverted index built from speech-derived phoneme strings and does not teach a second index for a second orthography, nor does it address orthographic diversity or phonemic variants of textual content. The Applicant further argues that Garg teaches variant indexing of textual terms and does not teach phonemic variants tied to a second orthography and so the combination of Elisha and Garg fails to disclose or suggest this recited feature (Arguments, page 12-13).
The Examiner respectfully disagrees. Elisha discloses, for instance in Figure 2, that indexing unit 220 may create an inverted index or reverse-index in which terms are mapped to one or more documents, and the terms in the inverted index are sorted in ascending or descending lexicographical order; for example, an inverted index is a mapping of terms to a recording and/or to any object related to the recording (e.g., to a phonetic transcription of the recording). Moreover, Elisha, in Figure 2, elements 225 and 230, describes, in some embodiments of the invention, a system and method that may include indexing phonetic transcriptions according to one or more phoneme such that a phoneme (or set of phonemes) can be used as a search key in order to search for the phoneme (or set of phonemes) in the indexed phonetic transcriptions. For example, and as shown, a system 200 may include an indexing unit 220. According to some embodiments of the invention, indexing unit 220 receives phonetic transcriptions (e.g., produced by phonetic indexing unit 215) and inverse-indexes the phonetic transcriptions such that a phoneme or set of phonemes can be searched in the revers-indexed phonetic transcriptions using the phoneme (or using a set of phonemes) as a search key (Elisha, para 0053 and 0039). Thus, Elisha teaches that more than one index can be built by its indexing unit. Moreover, Garg enables performing searches, more particularly, finding results by incremental search using a keypad having overloaded keys as the input device when the input contains orthographic and typographic errors. Garg teaches that its error models include one or a combination of generating typographic variants of the descriptive terms that characterize the content items and generating orthographic variants of the descriptive terms that characterize the content items.
The variants in Garg help compensate for typical orthographic and typographic misspellings that users make, which are in essence errors of insertion, and Garg also generates phonetic equivalents of the intended search term (Garg, para 0003 and 0010). Thus, Garg teaches variant indexing of textual terms as well as phonemic variants, and thus, the combination of Elisha and Garg discloses the recited feature.
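For orientation, the indexing arrangement under dispute (a database holding one phoneme-keyed inverted index per orthography) can be sketched as below. This is a minimal illustration only and is not taken from Elisha, Garg, or the application: the `build_inverted_index` and `search` helpers, the orthography labels, the document IDs, and the phoneme transcriptions are all hypothetical.

```python
from collections import defaultdict


def build_inverted_index(transcriptions):
    """Map each phoneme token to the set of document IDs that contain it.

    `transcriptions` is a dict of {doc_id: list of phoneme tokens}.
    """
    index = defaultdict(set)
    for doc_id, phonemes in transcriptions.items():
        for phoneme in phonemes:
            index[phoneme].add(doc_id)
    return dict(index)


# Hypothetical phoneme transcriptions, grouped by the orthography of the source text.
transcriptions_by_orthography = {
    "roman": {"doc1": ["k", "a", "t"], "doc2": ["b", "a", "t"]},
    "devanagari": {"doc3": ["k", "a", "l"]},
}

# One inverted index per orthography, held together in a single "database" dict.
index_db = {
    orthography: build_inverted_index(docs)
    for orthography, docs in transcriptions_by_orthography.items()
}


def search(index_db, phoneme):
    """Return, per orthography, the sorted document IDs whose transcription contains the phoneme."""
    return {
        orthography: sorted(index.get(phoneme, set()))
        for orthography, index in index_db.items()
    }


print(search(index_db, "k"))
```

Using a phoneme as the search key means the same lookup runs against both indexes regardless of the script in which the underlying text was written.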
The Applicant argues, after amending their claims, that none of the cited references teach cross-lingual searching. The Applicant asserts that the Office Action points to Robertson as disclosing multiple languages, but Robertson is simply pointing out the source data for supporting the transcription of each individual language, and it fails to disclose or suggest operating on or comparing tokens of multiple languages, as it operates on one language at a time and does not perform cross-lingual searching or comparisons. The Applicant further asserts that none of the cited references teaches multiple language indexing, particularly comparing phonemic embeddings of different languages (Arguments, page 13, 4th para, and page 14, 3rd para).  
The Examiner respectfully disagrees. Robertson teaches cross-lingual searching. For instance, Robertson teaches that the choice of dictionary employed for lookup or decision tree training is typically determined by the language being used, and it describes some of the phonetic dictionaries that are freely available for download on the internet, including BOMP (German), Lexique (French) and the MBRDICO project, which provides dictionary resources for several languages, as well as the Unicode Unihan database, which provides detailed properties and pronunciations for the characters used in Chinese, Japanese and Korean orthography (Robertson, para 0036-0046).
	The Applicant also argues that neither Jia nor Elisha and Garg teach comparing phoneme embeddings corresponding to each phoneme of the input token to phoneme embeddings of each phoneme of each content token (Arguments, page 13, 1st para, and page 14, 2nd para).
The Examiner respectfully disagrees. Jia teaches comparing phoneme embeddings corresponding to each phoneme of the input token to phoneme embeddings of each phoneme of each content token by determining the respective grapheme token representing the respective word of sequence of words corresponding to the respective phoneme token, which may include determining that the respective grapheme token includes a corresponding word position embedding that matches the respective word position embedding of the respective phoneme token (Jia, para 0013).
Because the Applicant changed the scope of the claims by modifying the independent claim limitations, the limitation “the inverted index database includes a first inverted index corresponding to phonemic variants of the content tokens and to the first orthography and a second inverted index corresponding to phonemic variants of the content tokens and to a second orthography of a second language” in particular needs to be addressed with a new reference.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Independent claims 1, 8 and 15 recite “generating … executing … returning” steps that, as drafted, cover an abstract idea of data analysis/retrieval and mental steps. More specifically, the limitation “generating a phonemic index for the input token in a first orthography of a first language; executing an approximate matching analysis on content tokens of the inverted index database based on the phonemic index for the input token by comparing phoneme embeddings corresponding to each phoneme of the input token to phoneme embeddings of each phoneme of each content token, wherein the inverted index database includes a first inverted index corresponding to phonemic variants of the content tokens and to the first orthography and a second inverted index corresponding to phonemic variants of the content tokens and to a second orthography of a second language; and returning one or more search results based on the approximate matching analysis” is just a mental process. For an input token of a search query written on paper, one can search a corresponding inverted index database by generating a phonemic index and executing an approximate matching of the phonemic index of the input token with the content tokens of the inverted index database on paper. One or more search results can then be returned based on the approximate matching, using just paper and pen/pencil; this requires only a mental process, and a human can perform these steps without a machine. Moreover, one can compute different similarity or distance values and make comparisons based on the available token data, and a human can also mentally make a comparison decision. The claimed invention is, therefore, directed to an abstract idea and a mental process without significantly more and thus, claims 1, 8 and 15 are rejected under 35 U.S.C. 101.
Similarly, the dependent claims 2-7, 9-14, and 16-20 recite similar claim language as in claims 1, 8 and 15. Claims 2, 9, and 16 recite “generating a score for each content token based on per-phoneme similarity analysis with the input token, wherein the score represents a combination of measurements of similarity between phoneme embeddings corresponding to each phoneme of the input token compared to phoneme embeddings for each corresponding phoneme of each content token,” which requires just a data analysis/retrieval step, a simple mathematical procedure, and a mental step as well. These steps can be performed by calculating a similarity/distance value between the phoneme embedding of each phoneme of the input token and that of each phoneme of the content token. One can compute such similarity or distance values and make comparisons based on the available token data, and a human can also mentally make a comparison decision by observing the available data. No additional elements are present. Thus, claims 2, 9, and 16 are directed to an abstract idea.
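The per-phoneme scoring recited in claims 2, 9, and 16 can be illustrated with a short sketch. It assumes cosine similarity as the per-phoneme measurement and a simple average as the "combination"; neither choice is stated in the action, and the two-dimensional `EMBEDDINGS` table is a hypothetical stand-in for learned phoneme embeddings.

```python
import math

# Hypothetical 2-dimensional phoneme embeddings; real embeddings would be learned.
EMBEDDINGS = {
    "p": (1.0, 0.0),
    "b": (0.9, 0.1),
    "a": (0.0, 1.0),
    "t": (0.7, 0.7),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def token_score(input_phonemes, content_phonemes):
    """Average the per-position cosine similarities of two equal-length phoneme sequences."""
    if not input_phonemes or len(input_phonemes) != len(content_phonemes):
        raise ValueError("this sketch only compares non-empty sequences of equal length")
    sims = [
        cosine(EMBEDDINGS[a], EMBEDDINGS[b])
        for a, b in zip(input_phonemes, content_phonemes)
    ]
    return sum(sims) / len(sims)


query = ["p", "a", "t"]
print(token_score(query, ["b", "a", "t"]))  # close match: /p/ and /b/ embeddings are similar
print(token_score(query, ["a", "a", "t"]))  # weaker match: /p/ and /a/ embeddings are orthogonal
```

A content token whose phonemes sit near the query's phonemes in the embedding space scores higher than one that differs by an unrelated phoneme, which is the ordering an approximate match needs.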
	Claims 3, 10 and 17 recite “generating a phonemic index for a first content token corresponding to the first orthography; and adding the phonemic index for the first content token in the first orthography to an inverted index corresponding to the first orthography in the inverted index database,” which also requires just a mental step. One can create phonemic index data for a first content token corresponding to the first orthography on paper and then add those phonemic indices for the first content token in the first orthography to an inverted index corresponding to the first orthography in the inverted index database. This requires just pen/pencil and paper. Thus, claims 3, 10 and 17 are directed to an abstract idea.
Claims 4, 11 and 18 recite “generating a phonemic variant of a first content token corresponding to the second orthography using a neural phonemic translation machine learning model trained using phonemic index pairs corresponding to the first language and the second language,” which also requires just a mental step; the additional elements “neural” and “machine learning model” are addressed in the paragraphs below. One can generate a second inverted index corresponding to phonemic variants of the first content token using pen/pencil and paper. The neural phonemic translation machine learning model portion of the claim can be easily implemented using a conventional/general-purpose computer (Spec., para 0055). Thus, claims 4, 11 and 18 are directed to an abstract idea.

Claims 5, 12 and 19 recite “biasing the phonemic variant of the first content token toward pronunciation of the second orthography,” which also requires just a simple mental step to analyze a phonemic variant of the first content token toward pronunciation of the second orthography. Individual tokens can be converted into phonemic indices, which can bias the resulting phonemic index toward pronunciation in a corresponding orthography. The phonemic index conversion as well as the biasing toward pronunciation of the second orthography can be performed with paper and pencil as well as with a mental step. Thus, claims 5, 12 and 19 are directed to an abstract idea.
Claims 6, 13 and 20 recite “generating the phonemic index for a phonemic variant of a first content token corresponding to the second orthography; and adding the phonemic index of the phonemic variant of the first content token corresponding to the second orthography to an inverted index corresponding to the second orthography in the inverted index database,” which requires just a mental step or a simple mathematical procedure. One can create the phonemic index for a phonemic variant of a first content token corresponding to the second orthography using paper and pen/pencil. One can then also add the phonemic index for the first content token corresponding to the second orthography to an inverted index corresponding to the second orthography in the inverted index database using paper and pen/pencil. Thus, claims 6, 13 and 20 are directed to an abstract idea.
Claims 7 and 14 recite “converting the input token into a phonetic representation; and transcribing the phonetic representation of the input token into the phonemic index for the input token, wherein the phonemic index includes phoneme embeddings corresponding to each phoneme of the input token,” which requires just a simple mental step. This can be implemented by mentally converting the input token into a phonetic representation, which involves transcribing the input token using, for example, the International Phonetic Alphabet (IPA), the Extended SAM Phonetic Alphabet (X-SAMPA), or some other system of universal symbols that classifies sounds present in different languages around the world. The step also involves transcribing the phonetic representation of the input token into the phonemic index for the input token, which can be done mentally and using pen/pencil and paper. Thus, claims 7 and 14 are directed to an abstract idea.
Thus, claims 1-20 as drafted cover a mental process and abstract idea of data gathering/retrieval and analysis/processing steps, and they are mental processes directed to an abstract idea of implementing mathematical formulae for data processing and data analysis using a conventional/generic (general-purpose) computer as well and thus, all the claims are directed to an abstract idea. 
This judicial exception is not integrated into a practical application. In particular, independent claims 1, 8 and 15 recite the additional elements “processor” and “processor-readable storage media.” Moreover, dependent claims 4, 11 and 18 recite additional elements of “neural” and “machine learning model” as per the independent claims. These additional elements can be implemented using laptop computers, desktop computers or tablet computers, or generally any general-purpose computer, as indicated in the Spec (Spec, para 0055). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional general-purpose computer implementation. Claims 1-20 are therefore not drawn to patent eligible subject matter as they are directed to an abstract idea without significantly more. Thus, the claimed invention is directed to an abstract idea and a mental process without significantly more and thus, claims 1-20 are rejected under 35 U.S.C. 101.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional element of using a computer amounts to a general-purpose computer. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept (Spec., para 00116). Further, the additional limitations in the claims noted above are directed toward insignificant solution activity. The claims are not patent eligible.
Dependent claims 2-7, 9-14, and 16-20 are also directed toward an abstract idea and do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements when considered both individually and as an ordered combination do not amount to significantly more than the abstract idea. Therefore, claims 1-20 do not contain patent eligible subject matter that has been identified by the courts.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6-10, 13-17 and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Elisha et al., Pat App No. US 20160275945 A1 (Elisha), in view of Gupta et al., "Query expansion for mixed-script information retrieval," SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval, 2014 (Gupta), and further in view of Jia et al., Pat App No. US 20220310059 A1 (Jia).

	Regarding Claim 1, Elisha discloses a method of performing a cross-lingual search of search query (Elisha, para 0010, A system and method may transcribe a textual search term into a set of search phoneme strings and use the set of search phoneme strings to conduct a set of searches; Elisha, Figure 2, elements 225 and 230, para 0039, According to some embodiments of the invention, a system and method may include indexing phonetic transcriptions according to one or more phoneme such that a phoneme (or set of phonemes) can be used as a search key in order to search for the phoneme (or set of phonemes) in the indexed phonetic transcriptions. For example and as shown, a system 200 may include an indexing unit 220. According to some embodiments of the invention, indexing unit 220 receives phonetic transcriptions (e.g., produced by phonetic indexing unit 215) and inverse-indexes the phonetic transcriptions such that a phoneme or set of phonemes can be searched in the revers-indexed phonetic transcriptions using the phoneme (or using a set of phonemes) as a search key), the method comprising:
generating a phonemic index of the input token in a first orthography of a first language (Elisha, Figure 3, element 340, para 0056-0057, As shown by block 340, a user may enter a phrase to be searched… A system and method may transcribe a textual search term into a set of search phoneme strings; [i.e., “phrase to be searched” or “textual search term” as “input token”; “transcribing a textual search term into a set of search phoneme strings” as “generating a phonemic index for the input token”]);
executing an approximate matching analysis on content tokens of the inverted index database based on the phonemic index of the input token (Elisha, para 0053, Figure 2, Indexing unit 220 may create an inverted index or reverse-index in which terms are mapped to one or more documents, and the terms in the inverted index are sorted in ascending or descending lexicographical order, for example, an inverted index is a mapping of terms to a recording and/or to any object related to the recording (e.g., to a phonetic transcription of the recording); Elisha, para 0059-0060, Figure 3, elements 330, 355 and 360, As shown by block 355, a system and method may include … set of search phoneme strings. As shown by block 360, confusion matrix and fuzziness parameters may be used in order to expand a query or perform query permutation and fuzzy search… fuzzy search. Generally, confusion matrix, fuzziness parameters may be used to allow some form of error (or distance) between a searched phrase and the phonetic transcription that include the phrase or that include phrases which are close (or similar) to the searched phrase… For example, a confusion matrix may be trained or updated using transcriptions of relevant audio content, e.g., audio related to the same language, vocabulary or accent and as in the speech recordings in database 210; [i.e., “fuzzy search… fuzziness parameters may be used to allow some form of error (or distance)… phonetic transcription…that include phrases which are close (or similar) to the searched phrase” as “search which executes an approximate matching analysis on content tokens …on phonemic index for the input token”]),
returning one or more search results based on the approximate matching analysis (Elisha, para 0071, Figure 3, As shown by block 370, a search platform may return a list of documents wherein the list may be sorted or the documents may be ranked… In some embodiments, a phonetic ranking may be implemented such that a ranking of documents found is according to a match of their content with a phoneme. For example, as described, a textual term received from a user as a search key may be converted to a phoneme and the phoneme may be searched. A ranking unit may rank documents found based on a matching of the content in the documents to the phoneme used in the search; Elisha, para 0060, … A Fuzziness parameter may be a value that sets or determines the degree of error (distance) the user allows between the searched phrase and the indexed transcription).
Elisha does not specifically disclose wherein the inverted index database includes a first inverted index corresponding to phonemic variants of the content tokens and to the first orthography and a second inverted index corresponding to phonemic variants of the content tokens and to a second orthography of a second language.
However, Gupta, in the same field of endeavor, discloses wherein the inverted index database includes a first inverted index corresponding to phonemic variants of the content tokens and to the first orthography and a second inverted index corresponding to phonemic variants of the content tokens and to a second orthography of a second language (Gupta, page 682, col 2, 2nd para – page 683, 1st col, 3rd para, Now we describe the experimental set up for evaluating the effectiveness of the proposed method for retrieval in Mixed-Script space. 5.1 Dataset We used the FIRE 2013 shared task collection on Transliterated Search [26] for experiments and training. The dataset comprises of document collection, queryset (Q) and relevance judgments. The collection (D1) contains 62,888 documents containing song title and lyrics in Roman, Devanagari and mixed scripts. Statistics of the document collection is given in Table 2 (a). The Q contains 25 lyrics search queries for Bollywood songs in Roman script with mean query length of 4.5 words. Table 2 (b) lists a few examples of queries from Q.

    [Embedded image: media_image1.png]

The experimental setup is a standard adhoc retrieval setting. The document collection is first indexed to create an inverted index and the index lexicon is used as mining lexicon. Being this a lyrics retrieval set up, the sequential information among the terms is crucial for effectiveness evaluation, e.g. “love me baby” and “baby love me” are completely different songs. In order to capture the word-ordering we consider word 2-grams as a unit for indexing and retrieval. The non-trivial part of mixed-script IR is query-enrichment to handle the challenges described in Sec. 2. In order to enrich the query with equivalents, we find the equivalents of the query terms as described in Section 4.5 and the word 2-gram query is formulated as shown in [13]. We consider a variety of systems to be compared with the proposed method. The query formulation is similar for all the systems including the retrieval settings like inverted index, retrieval model and mining lexicon except the method of finding the equivalents… The problem of finding equivalents is formulated as searching across the views by learning hashing functions as presented in [20]… An inverted index of hashcodes is prepared for terms in mining lexicon. The equivalents for the query term are found from this index according to the score given by the graph matching algorithm (according to the cosine similarity in the common geometric space) of [30]).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Gupta in the method of Elisha because this would present an extensive empirical analysis of the proposed method along with the evaluation results in an ad-hoc retrieval setting of mixed-script IR where the proposed method achieves significantly better results (12% increase in MRR and 29% increase in MAP) compared to other state-of-the-art baselines (Gupta, Abstract).
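The word 2-gram indexing described in the quoted Gupta passage can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not Gupta's implementation: the helper names, the document IDs, and the overlap-count ranking are hypothetical, and only the two song titles come from the quoted text.

```python
from collections import defaultdict


def word_bigrams(text):
    """Split text into lowercase words and return the consecutive word pairs."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def build_bigram_index(documents):
    """Map each word 2-gram to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for bigram in word_bigrams(text):
            index[bigram].add(doc_id)
    return index


def search(index, query):
    """Rank documents by how many of the query's word 2-grams they share."""
    counts = defaultdict(int)
    for bigram in word_bigrams(query):
        for doc_id in index.get(bigram, ()):
            counts[doc_id] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Titles taken from the quoted example; the document IDs are made up.
documents = {"song_a": "love me baby", "song_b": "baby love me"}
index = build_bigram_index(documents)
print(search(index, "love me baby"))  # [('song_a', 2), ('song_b', 1)]
```

Indexing word pairs rather than single words is what separates the two titles: both contain the same three words, but only one contains both 2-grams of the query.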
	Elisha in view of Gupta does not disclose comparing phoneme embeddings corresponding to each phoneme of the input token to phoneme embeddings of each phoneme of each content token.
	However, Jia, in the same field of endeavor, discloses comparing phoneme embeddings corresponding to each phoneme of the input token to phoneme embeddings of each phoneme of each content token (Jia, para 0013, determining the respective grapheme token representing the respective word of sequence of words corresponding to the respective phoneme token may include determining that the respective grapheme token includes a corresponding word position embedding that matches the respective word position embedding of the respective phoneme token; [i.e., “grapheme token” as “input token” and “word position embedding of the respective phoneme token” as “content token”]).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Jia in the method of Elisha in view of Gupta because this would enable the search engine to return one or more search results that the device interprets to generate a response for the user (Jia, para 0036).
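The fuzzy search relied on in the claim 1 mapping (a fuzziness parameter bounding the allowed distance between the searched phrase and an indexed transcription) can be illustrated with a small sketch. It substitutes plain Levenshtein edit distance over phoneme sequences for Elisha's confusion-matrix approach, which is not reproduced here; the function names, recording IDs, and transcriptions are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    previous = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        current = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def fuzzy_search(query, indexed, fuzziness):
    """Return IDs of transcriptions within `fuzziness` edits of the query, closest first."""
    hits = []
    for doc_id, phonemes in indexed.items():
        distance = edit_distance(query, phonemes)
        if distance <= fuzziness:
            hits.append((distance, doc_id))
    return [doc_id for _, doc_id in sorted(hits)]


# Hypothetical indexed phoneme transcriptions.
indexed = {
    "rec1": ["k", "a", "t"],
    "rec2": ["k", "a", "p"],
    "rec3": ["d", "o", "g"],
}

print(fuzzy_search(["k", "a", "t"], indexed, fuzziness=1))  # ['rec1', 'rec2']
```

Raising the fuzziness value admits more distant transcriptions, while a value of 0 reduces the search to exact matching.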


	Regarding Claim 2, Elisha in view of Gupta, and further in view of Jia disclose the method of claim 1, wherein executing the approximate matching analysis comprises (Elisha, para 0059-0060, Figure 3, elements 330, 355 and 360, … fuzzy search. Generally, confusion matrix, fuzziness parameters may be used to allow some form of error (or distance) between a searched phrase and the phonetic transcription that include the phrase or that include phrases which are close (or similar) to the searched phrase):
generating a score for each content token based on per-phoneme similarity analysis with the input token, wherein the score represents a combination of measurements of similarity between phoneme embeddings corresponding to each phoneme of the input token compared to phoneme embeddings [[for]] each corresponding phoneme of each content token (Elisha, para 0071, Figure 3, As shown by block 370, a search platform may return a list of documents wherein the list may be sorted or the documents may be ranked. For example, Solr provides ranking of results that may be used. In some embodiments, a phonetic ranking may be implemented such that a ranking of documents found is according to a match of their content with a phoneme. For example, as described, a textual term received from a user as a search key may be converted to a phoneme and the phoneme may be searched. A ranking unit may rank documents found based on a matching of the content in the documents to the phoneme used in the search; Elisha, para 0009, A system and method may include, in a set of search phoneme strings, at least one phoneme string based on a pre-configured distance from the textual search term. A system and method may statistically calculate a probability of a recognition error for a phoneme; and based on relating a fuzziness parameter value to the probability, select to include or exclude a phonetic transcription in a result of searching for an element in the set of phonetic transcriptions. A system and method may statistically calculate a probability of a recognition error for a phoneme; and based on relating a fuzziness parameter value to the probability, select to include the phoneme in the set of search phoneme strings).

	Regarding Claim 3, Elisha in view of Gupta, and further in view of Jia discloses the method of claim 1, further comprising:
generating a phonemic index [[for]] of a first content token corresponding to the first orthography (Elisha, para 0049, A phoneme or phonetic lattice is known in the art, for example, a phonetic lattice may be a directed acyclic graph (e.g., as described in http://en.wikipedia.org/wiki/Deterministic_acyclic_finite_state_automaton), which represents, records and/or provides the probability of each phoneme to be output at a certain time. Accordingly, using a phoneme or phonetic lattice produced as described, sub-words, words and/or phrases may be assigned, or associated with, a certainty value, a probability value or a score or rank value. For example, a phonetic lattice may include all possible tracks of phonetic transcriptions for an audio file provided as input and, using the probabilities of words and/or phrases, a probability for each track may be calculated); and
adding the phonemic index [[for]] of the first content token in the first orthography to an inverted index corresponding to the first orthography in the inverted index database (Elisha, para 0050, Figure 2, Indexing unit 220 may reverse-index a set of transcriptions. For example, indexing unit 220 may tokenize a string of phonemes into one or more phoneme tokens (terms), and may construct an inverse index that maps a set of documents (e.g., recorded conversations) to tokens. For example, to invert or reverse an indexing of a set of recordings, indexing unit 220 may use any parsing rules or criteria (321) when inverting an index of a set of transcriptions. A user may select a rule or criteria and indexing unit 220 may reverse-index a set of transcriptions based on a rule received from a user (e.g., stored as shown by block 321)).
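
As a minimal sketch of the kind of inverted index described in the cited Elisha passage (para 0050), the following tokenizes phoneme strings into k-phoneme tokens and maps each token to the documents containing it. The overlapping-window tokenization, the document identifiers, and the function names k_phoneme_tokens and build_inverted_index are assumptions made for illustration; they are not quoted from Elisha or from the application.

```python
from collections import defaultdict


def k_phoneme_tokens(phonemes, k=2):
    """Split a phoneme sequence into overlapping k-phoneme tokens.

    Assumption: sequences shorter than k yield a single shorter token.
    """
    if len(phonemes) < k:
        return [tuple(phonemes)] if phonemes else []
    return [tuple(phonemes[i:i + k]) for i in range(len(phonemes) - k + 1)]


def build_inverted_index(transcriptions, k=2):
    """Map each k-phoneme token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, phonemes in transcriptions.items():
        for token in k_phoneme_tokens(phonemes, k):
            index[token].add(doc_id)
    return index


# Hypothetical phonetic transcriptions keyed by recording id.
transcriptions = {
    "rec1": ["k", "ae", "t"],
    "rec2": ["b", "ae", "t"],
}
index = build_inverted_index(transcriptions)
```

A lookup such as index[("ae", "t")] then returns every recording whose transcription contains that phoneme pair, which is the term-to-document mapping an inverted index provides.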

Regarding Claim 6, Elisha in view of Gupta, and further in view of Jia discloses the method of claim 1, further comprising:
generating the phonemic index [[for]] of a phonemic variant of a first content token corresponding to the second orthography (Elisha, para 0049, A phoneme or phonetic lattice is known in the art, for example, a phonetic lattice may be a directed acyclic graph …, which represents, records and/or provides the probability of each phoneme to be output at a certain time. Accordingly, using a phoneme or phonetic lattice produced as described, sub-words, words and/or phrases may be assigned, or associated with, a certainty value, a probability value or a score or rank value. For example, a phonetic lattice may include all possible tracks of phonetic transcriptions for an audio file provided as input and, using the probabilities of words and/or phrases, a probability for each track may be calculated); and
adding the phonemic index [[for]] of the phonemic variant of the first content token corresponding to the second orthography to an inverted index corresponding to the second orthography in the inverted index database (Elisha, para 0050, Figure 2, Indexing unit 220 may reverse-index a set of transcriptions. For example, indexing unit 220 may tokenize a string of phonemes into one or more phoneme tokens (terms), and may construct an inverse index that maps a set of documents (e.g., recorded conversations) to tokens. For example, to invert or reverse an indexing of a set of recordings, indexing unit 220 may use any parsing rules or criteria (321) when inverting an index of a set of transcriptions. A user may select a rule or criteria and indexing unit 220 may reverse-index a set of transcriptions based on a rule received from a user (e.g., stored as shown by block 321)).

Regarding Claim 7, Elisha in view of Gupta, and further in view of Jia discloses the method of claim 1, wherein generating the phonemic index of the input token comprises:
converting the input token into a phonetic representation (Elisha, para 0057, For example, and as shown by block 345, a search phrase may be translated or converted into a set of phonemes… to convert a search phrase into a set of phoneme); and
transcribing the phonetic representation of the input token into the phonemic index [[for]] of the input token, wherein the phonemic index includes phoneme embeddings corresponding to each phoneme of the input token (Elisha, para 58, a query constructor unit may parse the query phonetic sequence using the same method used for indexing as described herein (e.g., using single phonemes or K-phonemes); Elisha, para 0038-0039, A system and method according to some embodiments of the invention include transcribing a set of speech recordings to a set of phoneme strings. A system and method according to some embodiments of the invention may include storing or including phoneme strings in a set of phonetic transcriptions. For example and as shown, a system 200 may include a phonetic indexing unit 215. For example, phonetic indexing unit 215 may be a device similar to computing device 100, e.g., including a processor and memory as described. For example, in some embodiments of the invention, phonetic indexing unit 215 extracts audio content (e.g., speech recordings) from database 210, and/or transcribes the audio content to a string of phonemes. Based on a configuration parameter that includes or indicates acoustic and/or language properties or characteristics, phonetic indexing unit 215 may generate a string of phonemes that match, or best suit, the acoustic and/or language properties of the audio content. In some embodiments, phonetic indexing unit 215 may generate phonetic transcriptions by including phoneme strings in a set of phonetic transcriptions. For example, a set of phoneme strings identified by phonetic indexing unit 215, in a recording of a conversation, may be stored, or included, in a phonetic transcription, e.g., as a file on storage system 130. 
According to some embodiments of the invention, a system and method may include indexing phonetic transcriptions according to one or more phoneme such that a phoneme (or set of phonemes) can be used as a search key in order to search for the phoneme (or set of phonemes) in the indexed phonetic transcriptions. For example and as shown, a system 200 may include an indexing unit 220. According to some embodiments of the invention, indexing unit 220 receives phonetic transcriptions (e.g., produced by phonetic indexing unit 215) and inverse-indexes the phonetic transcriptions such that a phoneme or set of phonemes can be searched in the revers-indexed phonetic transcriptions using the phoneme (or using a set of phonemes) as a search key. It will be noted that a term used as a search key or search term may be or may include one or more phonemes).

Regarding Claim 8, Elisha discloses a computing system for searching an inverted index database for an input token of a search query (Elisha, para 0010, A system and method may transcribe a textual search term into a set of search phoneme strings and use the set of search phoneme strings to conduct a set of searches; Elisha, Figure 2, elements 225 and 230, para 0039, According to some embodiments of the invention, a system and method may include indexing phonetic transcriptions according to one or more phoneme such that a phoneme (or set of phonemes) can be used as a search key in order to search for the phoneme (or set of phonemes) in the indexed phonetic transcriptions. For example and as shown, a system 200 may include an indexing unit 220. According to some embodiments of the invention, indexing unit 220 receives phonetic transcriptions (e.g., produced by phonetic indexing unit 215) and inverse-indexes the phonetic transcriptions such that a phoneme or set of phonemes can be searched in the revers-indexed phonetic transcriptions using the phoneme (or using a set of phonemes) as a search key), the computing system comprising:
one or more hardware processors (Elisha, para 0022, Computing device 100 may include a controller 105 that may be, for example, a central processing unit processor (CPU));
a phonemic indexer executable by the one or more hardware processors and configured to generate a phonemic index [[for]] of the input token of a first orthography of a first language (Elisha, Figure 3, element 340, para 0056-0057, As shown by block 340, a user may enter a phrase to be searched… A system and method may transcribe a textual search term into a set of search phoneme strings; [i.e., “phrase to be searched” or “textual search term” as “input token”; “transcribing a textual search term into a set of search phoneme strings” as “generating a phonemic index for the input token”]);
an approximate match analyzer executable by the one or more hardware processors and configured to execute an approximate matching analysis on content tokens of the inverted index database based on the phonemic index of the input token (Elisha, para 0053, Figure 2, Indexing unit 220 may create an inverted index or reverse-index in which terms are mapped to one or more documents, and the terms in the inverted index are sorted in ascending or descending lexicographical order, for example, an inverted index is a mapping of terms to a recording and/or to any object related to the recording (e.g., to a phonetic transcription of the recording); Elisha, para 0059-0060, Figure 3, elements 330, 355 and 360, As shown by block 355, a system and method may include … set of search phoneme strings. As shown by block 360, confusion matrix and fuzziness parameters may be used in order to expand a query or perform query permutation and fuzzy search… fuzzy search. Generally, confusion matrix, fuzziness parameters may be used to allow some form of error (or distance) between a searched phrase and the phonetic transcription that include the phrase or that include phrases which are close (or similar) to the searched phrase… For example, a confusion matrix may be trained or updated using transcriptions of relevant audio content, e.g., audio related to the same language, vocabulary or accent and as in the speech recordings in database 210; [i.e., “fuzzy search… fuzziness parameters may be used to allow some form of error (or distance)… phonetic transcription…that include phrases which are close (or similar) to the searched phrase” as “search which executes an approximate matching analysis on content tokens …on phonemic index for the input token”]);
a score conditioner executable by the one or more hardware processors and configured to return one or more search results based on the approximate matching analysis (Elisha, para 0071, Figure 3, As shown by block 370, a search platform may return a list of documents wherein the list may be sorted or the documents may be ranked… In some embodiments, a phonetic ranking may be implemented such that a ranking of documents found is according to a match of their content with a phoneme. For example, as described, a textual term received from a user as a search key may be converted to a phoneme and the phoneme may be searched. A ranking unit may rank documents found based on a matching of the content in the documents to the phoneme used in the search; Elisha, para 0060, … A Fuzziness parameter may be a value that sets or determines the degree of error (distance) the user allows between the searched phrase and the indexed transcription).
Elisha does not specifically disclose wherein the inverted index database includes a first inverted index corresponding to phonemic variants of the content tokens and to the first orthography and a second inverted index corresponding to phonemic variants of the content tokens and to a second orthography of a second language.
However, Gupta, in the same field of endeavor, discloses wherein the inverted index database includes a first inverted index corresponding to phonemic variants of the content tokens and to the first orthography and a second inverted index corresponding to phonemic variants of the content tokens and to a second orthography of a second language (Gupta, page 682, col 2, 2nd para – page 683, 1st col, 3rd para, Now we describe the experimental set up for evaluating the effectiveness of the proposed method for retrieval in Mixed-Script space. 5.1 Dataset We used the FIRE 2013 shared task collection on Transliterated Search [26] for experiments and training. The dataset comprises of document collection, queryset (Q) and relevance judgments. The collection (D1) contains 62,888 documents containing song title and lyrics in Roman, Devanagari and mixed scripts. Statistics of the document collection is given in Table 2 (a). The Q contains 25 lyrics search queries for Bollywood songs in Roman script with mean query length of 4.5 words. Table 2 (b) lists a few examples of queries from Q.


The experimental setup is a standard adhoc retrieval setting. The document collection is first indexed to create an inverted index and the index lexicon is used as mining lexicon. Being this a lyrics retrieval set up, the sequential information among the terms is crucial for effectiveness evaluation, e.g. “love me baby” and “baby love me” are completely different songs. In order to capture the word-ordering we consider word 2-grams as a unit for indexing and retrieval. The non-trivial part of mixed-script IR is query-enrichment to handle the challenges described in Sec. 2. In order to enrich the query with equivalents, we find the equivalents of the query terms as described in Section 4.5 and the word 2-gram query is formulated as shown in [13]. We consider a variety of systems to be compared with the proposed method. The query formulation is similar for all the systems including the retrieval settings like inverted index, retrieval model and mining lexicon except the method of finding the equivalents… The problem of finding equivalents is formulated as searching across the views by learning hashing functions as presented in [20]… An inverted index of hashcodes is prepared for terms in mining lexicon. The equivalents for the query term are found from this index according to the score given by the graph matching algorithm (according to the cosine similarity in the common geometric space) of [30]).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Gupta in the method of Elisha because this would present an extensive empirical analysis of the proposed method along with the evaluation results in an ad-hoc retrieval setting of mixed-script IR where the proposed method achieves significantly better results (12% increase in MRR and 29% increase in MAP) compared to other state-of-the-art baselines (Gupta, Abstract).
Elisha in view of Gupta does not disclose comparing phoneme embeddings corresponding to each phoneme of the input token to phoneme embeddings of each phoneme of each content token.
However, Jia, in the same field of endeavor, discloses comparing phoneme embeddings corresponding to each phoneme of the input token to phoneme embeddings of each phoneme of each content token (Jia, para 0013, determining the respective grapheme token representing the respective word of sequence of words corresponding to the respective phoneme token may include determining that the respective grapheme token includes a corresponding word position embedding that matches the respective word position embedding of the respective phoneme token; [i.e., “grapheme token” as “input token” and “word position embedding of the respective phoneme token” as “content token”]).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Jia in the method of Elisha in view of Gupta because this would enable the search engine to return one or more search results that the device interprets to generate a response for the user (Jia, para 0036).
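
For illustration, the fuzzy search with a fuzziness parameter cited from Elisha (para 0059-0060) for the approximate matching analysis can be sketched as follows. This is a simplified stand-in, not Elisha's method: it uses a unit-cost edit distance over phoneme sequences in place of a trained confusion matrix, and the names phoneme_edit_distance and fuzzy_search, as well as the sample data, are hypothetical.

```python
def phoneme_edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        curr = [i]
        for j, pb in enumerate(b, 1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_search(query, indexed, fuzziness):
    """Return ids of entries within `fuzziness` edits of the query, closest first."""
    hits = []
    for doc_id, phonemes in indexed.items():
        distance = phoneme_edit_distance(query, phonemes)
        if distance <= fuzziness:
            hits.append((distance, doc_id))
    return [doc_id for _, doc_id in sorted(hits)]


# Hypothetical indexed phonetic transcriptions.
indexed = {
    "a": ["k", "ae", "t"],
    "b": ["k", "ae", "d"],
    "c": ["d", "ao", "g"],
}
results = fuzzy_search(["k", "ae", "t"], indexed, fuzziness=1)
```

Raising the fuzziness value admits transcriptions that differ from the searched phrase by more phonemes, which corresponds to the "degree of error (distance)" that the cited passage says the user allows.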

Regarding Claim 9, Elisha in view of Gupta, and further in view of Jia discloses the computing system of claim 8, wherein the approximate match analyzer (Elisha, para 0059-0060, Figure 3, elements 330, 355 and 360, … fuzzy search. Generally, confusion matrix, fuzziness parameters may be used to allow some form of error (or distance) between a searched phrase and the phonetic transcription that include the phrase or that include phrases which are close (or similar) to the searched phrase) is further configured to:
generate a score [[for]] of each content token based on per-phoneme similarity analysis with the input token, wherein the score represents a combination of measurements of similarity between phoneme embeddings corresponding to each phoneme of the input token compared to phoneme embeddings [[for]] of each corresponding phoneme of each content token (Elisha, para 0071, Figure 3, As shown by block 370, a search platform may return a list of documents wherein the list may be sorted or the documents may be ranked. For example, Solr provides ranking of results that may be used. In some embodiments, a phonetic ranking may be implemented such that a ranking of documents found is according to a match of their content with a phoneme. For example, as described, a textual term received from a user as a search key may be converted to a phoneme and the phoneme may be searched. A ranking unit may rank documents found based on a matching of the content in the documents to the phoneme used in the search; Elisha, para 0009, A system and method may include, in a set of search phoneme strings, at least one phoneme string based on a pre-configured distance from the textual search term. A system and method may statistically calculate a probability of a recognition error for a phoneme; and based on relating a fuzziness parameter value to the probability, select to include or exclude a phonetic transcription in a result of searching for an element in the set of phonetic transcriptions. A system and method may statistically calculate a probability of a recognition error for a phoneme; and based on relating a fuzziness parameter value to the probability, select to include the phoneme in the set of search phoneme strings).
Elisha in view of Gupta does not specifically disclose to compare phoneme embeddings corresponding to each phoneme [[for]] of the input token to phoneme embeddings [[for]] of each phoneme of each content token.
However, Jia, in the same field of endeavor, discloses to compare phoneme embeddings corresponding to each phoneme of the input token to phoneme embeddings of each phoneme of each content token (Jia, para 0013, determining the respective grapheme token representing the respective word of sequence of words corresponding to the respective phoneme token may include determining that the respective grapheme token includes a corresponding word position embedding that matches the respective word position embedding of the respective phoneme token; [i.e., “grapheme token” as “input token” and “word position embedding of the respective phoneme token” as “content token”]).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Jia in the method of Elisha in view of Gupta because this would enable the search engine to return one or more search results that the device interprets to generate a response for the user (Jia, para 0036).

Regarding Claim 10, Elisha in view of Gupta, and further in view of Jia discloses the computing system of claim 8, wherein the phonemic indexer is further configured to generate a phonemic index [[for]] of a first content token corresponding to the first orthography (Elisha, para 0049, A phoneme or phonetic lattice is known in the art, for example, a phonetic lattice may be a directed acyclic graph (e.g., as described in http://en.wikipedia.org/wiki/Deterministic_acyclic_finite_state_automaton), which represents, records and/or provides the probability of each phoneme to be output at a certain time. Accordingly, using a phoneme or phonetic lattice produced as described, sub-words, words and/or phrases may be assigned, or associated with, a certainty value, a probability value or a score or rank value. For example, a phonetic lattice may include all possible tracks of phonetic transcriptions for an audio file provided as input and, using the probabilities of words and/or phrases, a probability for each track may be calculated) and to add the phonemic index [[for]] of the first content token in the first orthography to an inverted index corresponding to the first orthography in the inverted index database (Elisha, para 0050, Figure 2, Indexing unit 220 may reverse-index a set of transcriptions. For example, indexing unit 220 may tokenize a string of phonemes into one or more phoneme tokens (terms), and may construct an inverse index that maps a set of documents (e.g., recorded conversations) to tokens. For example, to invert or reverse an indexing of a set of recordings, indexing unit 220 may use any parsing rules or criteria (321) when inverting an index of a set of transcriptions. A user may select a rule or criteria and indexing unit 220 may reverse-index a set of transcriptions based on a rule received from a user (e.g., stored as shown by block 321)).

Regarding Claim 13, Elisha in view of Gupta, and further in view of Jia discloses the computing system of claim 8, wherein the phonemic indexer is further configured to generate the phonemic index [[for]] of a phonemic variant of a first content token corresponding to the second orthography (Elisha, para 0049, A phoneme or phonetic lattice is known in the art, for example, a phonetic lattice may be a directed acyclic graph …, which represents, records and/or provides the probability of each phoneme to be output at a certain time. Accordingly, using a phoneme or phonetic lattice produced as described, sub-words, words and/or phrases may be assigned, or associated with, a certainty value, a probability value or a score or rank value. For example, a phonetic lattice may include all possible tracks of phonetic transcriptions for an audio file provided as input and, using the probabilities of words and/or phrases, a probability for each track may be calculated) and to add the phonemic index of the phonemic variant [[for]] of the first content token corresponding to the second orthography to an inverted index corresponding to the second orthography in the inverted index database (Elisha, para 0050, Figure 2, Indexing unit 220 may reverse-index a set of transcriptions. For example, indexing unit 220 may tokenize a string of phonemes into one or more phoneme tokens (terms), and may construct an inverse index that maps a set of documents (e.g., recorded conversations) to tokens. For example, to invert or reverse an indexing of a set of recordings, indexing unit 220 may use any parsing rules or criteria (321) when inverting an index of a set of transcriptions. A user may select a rule or criteria and indexing unit 220 may reverse-index a set of transcriptions based on a rule received from a user (e.g., stored as shown by block 321)).

Regarding Claim 14, Elisha in view of Gupta, and further in view of Jia discloses the computing system of claim 8, further comprising:
a phonetic converter executable by the one or more hardware processors and configured to convert the input token into a phonetic representation (Elisha, para 0057, For example, and as shown by block 345, a search phrase may be translated or converted into a set of phonemes… to convert a search phrase into a set of phoneme), wherein the phonemic indexer is further configured to transcribe the phonetic representation of the input token into the phonemic index of the input token, wherein the phonemic index includes phoneme embeddings corresponding to each phoneme of the input token (Elisha, para 58, a query constructor unit may parse the query phonetic sequence using the same method used for indexing as described herein (e.g., using single phonemes or K-phonemes); Elisha, para 0038-0039, A system and method according to some embodiments of the invention include transcribing a set of speech recordings to a set of phoneme strings. A system and method according to some embodiments of the invention may include storing or including phoneme strings in a set of phonetic transcriptions. For example and as shown, a system 200 may include a phonetic indexing unit 215. For example, phonetic indexing unit 215 may be a device similar to computing device 100, e.g., including a processor and memory as described. For example, in some embodiments of the invention, phonetic indexing unit 215 extracts audio content (e.g., speech recordings) from database 210, and/or transcribes the audio content to a string of phonemes. Based on a configuration parameter that includes or indicates acoustic and/or language properties or characteristics, phonetic indexing unit 215 may generate a string of phonemes that match, or best suit, the acoustic and/or language properties of the audio content. In some embodiments, phonetic indexing unit 215 may generate phonetic transcriptions by including phoneme strings in a set of phonetic transcriptions. 
For example, a set of phoneme strings identified by phonetic indexing unit 215, in a recording of a conversation, may be stored, or included, in a phonetic transcription, e.g., as a file on storage system 130. According to some embodiments of the invention, a system and method may include indexing phonetic transcriptions according to one or more phoneme such that a phoneme (or set of phonemes) can be used as a search key in order to search for the phoneme (or set of phonemes) in the indexed phonetic transcriptions. For example and as shown, a system 200 may include an indexing unit 220. According to some embodiments of the invention, indexing unit 220 receives phonetic transcriptions (e.g., produced by phonetic indexing unit 215) and inverse-indexes the phonetic transcriptions such that a phoneme or set of phonemes can be searched in the revers-indexed phonetic transcriptions using the phoneme (or using a set of phonemes) as a search key. It will be noted that a term used as a search key or search term may be or may include one or more phonemes).

	Regarding Claim 15, Elisha discloses one or more tangible processor-readable storage media embodied with instructions for executing on one or more processors and circuits of a computing device a process searching an inverted index database for an input token of a search query (Elisha, para 0031, machine-readable medium, stored thereon instructions, which may be used to program a computer, controller, or other programmable devices, to perform methods as disclosed herein; Elisha, para 0010, A system and method may transcribe a textual search term into a set of search phoneme strings and use the set of search phoneme strings to conduct a set of searches; Elisha, Figure 2, elements 225 and 230, para 0039, According to some embodiments of the invention, a system and method may include indexing phonetic transcriptions according to one or more phoneme such that a phoneme (or set of phonemes) can be used as a search key in order to search for the phoneme (or set of phonemes) in the indexed phonetic transcriptions. For example and as shown, a system 200 may include an indexing unit 220. According to some embodiments of the invention, indexing unit 220 receives phonetic transcriptions (e.g., produced by phonetic indexing unit 215) and inverse-indexes the phonetic transcriptions such that a phoneme or set of phonemes can be searched in the revers-indexed phonetic transcriptions using the phoneme (or using a set of phonemes) as a search key), the process comprising:
generating a phonemic index [[for]] of the input token of a first orthography of a first language (Elisha, Figure 3, element 340, para 0056-0057, As shown by block 340, a user may enter a phrase to be searched… A system and method may transcribe a textual search term into a set of search phoneme strings; [i.e., “phrase to be searched” or “textual search term” as “input token”; “transcribing a textual search term into a set of search phoneme strings” as “generating a phonemic index for the input token”]);
executing an approximate matching analysis on content tokens of the inverted index database based on the phonemic index [[for]] of the input token (Elisha, para 0053, Figure 2, Indexing unit 220 may create an inverted index or reverse-index in which terms are mapped to one or more documents, and the terms in the inverted index are sorted in ascending or descending lexicographical order, for example, an inverted index is a mapping of terms to a recording and/or to any object related to the recording (e.g., to a phonetic transcription of the recording); Elisha, para 0059-0060, Figure 3, elements 330, 355 and 360, As shown by block 355, a system and method may include … set of search phoneme strings. As shown by block 360, confusion matrix and fuzziness parameters may be used in order to expand a query or perform query permutation and fuzzy search… fuzzy search. Generally, confusion matrix, fuzziness parameters may be used to allow some form of error (or distance) between a searched phrase and the phonetic transcription that include the phrase or that include phrases which are close (or similar) to the searched phrase… For example, a confusion matrix may be trained or updated using transcriptions of relevant audio content, e.g., audio related to the same language, vocabulary or accent and as in the speech recordings in database 210; [i.e., “fuzzy search… fuzziness parameters may be used to allow some form of error (or distance)… phonetic transcription…that include phrases which are close (or similar) to the searched phrase” as “search which executes an approximate matching analysis on content tokens …on phonemic index for the input token”]);
returning one or more search results based on the approximate matching analysis (Elisha, para 0071, Figure 3, As shown by block 370, a search platform may return a list of documents wherein the list may be sorted or the documents may be ranked… In some embodiments, a phonetic ranking may be implemented such that a ranking of documents found is according to a match of their content with a phoneme. For example, as described, a textual term received from a user as a search key may be converted to a phoneme and the phoneme may be searched. A ranking unit may rank documents found based on a matching of the content in the documents to the phoneme used in the search; Elisha, para 0060, … A Fuzziness parameter may be a value that sets or determines the degree of error (distance) the user allows between the searched phrase and the indexed transcription).
Elisha does not specifically disclose wherein the inverted index database includes a first inverted index corresponding to phonemic variants of the content tokens and to the first orthography and a second inverted index corresponding to phonemic variants of the content tokens and to a second orthography of a second language.
However, Gupta, in the same field of endeavor, discloses wherein the inverted index database includes a first inverted index corresponding to phonemic variants of the content tokens and to the first orthography and a second inverted index corresponding to phonemic variants of the content tokens and to a second orthography of a second language (Gupta, page 682, col 2, 2nd para – page 683, 1st col, 3rd para, Now we describe the experimental set up for evaluating the effectiveness of the proposed method for retrieval in Mixed-Script space. 5.1 Dataset We used the FIRE 2013 shared task collection on Transliterated Search [26] for experiments and training. The dataset comprises of document collection, queryset (Q) and relevance judgments. The collection (D1) contains 62,888 documents containing song title and lyrics in Roman, Devanagari and mixed scripts. Statistics of the document collection is given in Table 2 (a). The Q contains 25 lyrics search queries for Bollywood songs in Roman script with mean query length of 4.5 words. Table 2 (b) lists a few examples of queries from Q.

[Image: media_image1.png]

The experimental setup is a standard adhoc retrieval setting. The document collection is first indexed to create an inverted index and the index lexicon is used as mining lexicon. Being this a lyrics retrieval set up, the sequential information among the terms is crucial for effectiveness evaluation, e.g. “love me baby” and “baby love me” are completely different songs. In order to capture the word-ordering we consider word 2-grams as a unit for indexing and retrieval. The non-trivial part of mixed-script IR is query-enrichment to handle the challenges described in Sec. 2. In order to enrich the query with equivalents, we find the equivalents of the query terms as described in Section 4.5 and the word 2-gram query is formulated as shown in [13]. We consider a variety of systems to be compared with the proposed method. The query formulation is similar for all the systems including the retrieval settings like inverted index, retrieval model and mining lexicon except the method of finding the equivalents… The problem of finding equivalents is formulated as searching across the views by learning hashing functions as presented in [20]… An inverted index of hashcodes is prepared for terms in mining lexicon. The equivalents for the query term are found from this index according to the score given by the graph matching algorithm (according to the cosine similarity in the common geometric space) of [30]).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Gupta in the method of Elisha because this would present an extensive empirical analysis of the proposed method along with the evaluation results in an ad-hoc retrieval setting of mixed-script IR where the proposed method achieves significantly better results (12% increase in MRR and 29% increase in MAP) compared to other state-of-the-art baselines (Gupta, Abstract).
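For illustration only, the word 2-gram indexing and query enrichment described in the quoted Gupta passage can be sketched as follows. This is a minimal sketch under stated assumptions, not Gupta's implementation: the function names, the toy documents, and the hand-written `equivalents` mapping are hypothetical, and the mapping stands in for the learned cross-script equivalents that Gupta obtains by other means.

```python
from collections import defaultdict


def word_bigrams(text):
    """Split text into lowercase words and return consecutive word pairs."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def build_bigram_index(docs):
    """Build an inverted index mapping each word 2-gram to the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in word_bigrams(text):
            index[gram].add(doc_id)
    return index


def search(index, query, equivalents=None):
    """Rank documents by the number of query 2-grams they contain.

    `equivalents` is a hypothetical mapping from a query term to spelling
    variants of that term; each query 2-gram is expanded with these variants.
    """
    equivalents = equivalents or {}
    hits = defaultdict(int)
    for first, second in word_bigrams(query):
        firsts = {first} | set(equivalents.get(first, ()))
        seconds = {second} | set(equivalents.get(second, ()))
        matched = set()
        for a in firsts:
            for b in seconds:
                matched |= index.get((a, b), set())
        for doc_id in matched:
            hits[doc_id] += 1
    return sorted(hits, key=lambda d: (-hits[d], d))


# Toy collection (hypothetical data, not from the FIRE 2013 dataset).
docs = {
    "d1": "love me baby",
    "d2": "baby love me",
    "d3": "pyar kiya to darna kya",
}
index = build_bigram_index(docs)
```

With this toy data, the query "love me baby" ranks `d1` ahead of `d2` because word order is preserved by the 2-grams, and the query "pyaar kiya" only reaches `d3` when the variant "pyar" is supplied through `equivalents`.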
	Elisha in view of Gupta do not disclose comparing phoneme embeddings corresponding to each phoneme of the input token to phoneme embeddings of each phoneme of each content token.
	However, Jia, in the same field of endeavor, discloses comparing phoneme embeddings corresponding to each phoneme of the input token to phoneme embeddings of each phoneme of each content token (Jia, para 0013, determining the respective grapheme token representing the respective word of sequence of words corresponding to the respective phoneme token may include determining that the respective grapheme token includes a corresponding word position embedding that matches the respective word position embedding of the respective phoneme token; [i.e., “grapheme token” as “input token” and “word position embedding of the respective phoneme token” as “content token”]).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Jia in the method of Elisha in view of Gupta because this would enable the search engine to return one or more search results that the device interprets to generate a response for the user (Jia, para 0036).
 
	Regarding Claim 16, Elisha in view of Gupta, and further in view of Jia disclose the one or more tangible processor-readable storage media of claim 15, wherein executing the approximate matching analysis (Elisha, 0059-0060, Figure 3, elements 330, 355 and 360, … fuzzy search. Generally, confusion matrix, fuzziness parameters may be used to allow some form of error (or distance) between a searched phrase and the phonetic transcription that include the phrase or that include phrases which are close (or similar) to the searched phrase) comprises:
generating a score for each content token based on per-phoneme similarity analysis with the input token, wherein the score represents a combination of measurements of similarity between phoneme embeddings corresponding to each phoneme of the input token compared to phoneme embeddings [[for]] of each corresponding phoneme of each content token (Elisha, para 0071, Figure 3, As shown by block 370, a search platform may return a list of documents wherein the list may be sorted or the documents may be ranked. For example, Solr provides ranking of results that may be used. In some embodiments, a phonetic ranking may be implemented such that a ranking of documents found is according to a match of their content with a phoneme. For example, as described, a textual term received from a user as a search key may be converted to a phoneme and the phoneme may be searched. A ranking unit may rank documents found based on a matching of the content in the documents to the phoneme used in the search; Elisha, para 0009, A system and method may include, in a set of search phoneme strings, at least one phoneme string based on a pre-configured distance from the textual search term. A system and method may statistically calculate a probability of a recognition error for a phoneme; and based on relating a fuzziness parameter value to the probability, select to include or exclude a phonetic transcription in a result of searching for an element in the set of phonetic transcriptions. A system and method may statistically calculate a probability of a recognition error for a phoneme; and based on relating a fuzziness parameter value to the probability, select to include the phoneme in the set of search phoneme strings).
Elisha in view of Gupta do not specifically disclose comparing phoneme embeddings corresponding to each phoneme [[for]] of the input token to phoneme embeddings [[for]] of each phoneme of each content token. 
However, Jia, in the same field of endeavor, discloses comparing phoneme embeddings corresponding to each phoneme for the input token to phoneme embeddings for each phoneme of each content token (Jia, para 0013,  determining the respective grapheme token representing the respective word of sequence of words corresponding to the respective phoneme token may include determining that the respective grapheme token includes a corresponding word position embedding that matches the respective word position embedding of the respective phoneme token; [i.e., “grapheme token” as “input token” and “word position embedding of the respective phoneme token” as “content token”]).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Jia in the method of Elisha in view of Gupta because this would enable the search engine to return one or more search results that the device interprets to generate a response for the user (Jia, para 0036).
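For illustration only, the claim 16 limitation of a score that combines per-phoneme similarity measurements between phoneme embeddings can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Elisha, Gupta, Jia, or the application: the embedding values in `PHONEME_EMBEDDINGS`, the position-by-position alignment, and the choice of cosine similarity averaged over the longer token are all hypothetical.

```python
import math

# Hypothetical 3-dimensional phoneme embeddings chosen so that similar
# sounds (k/g, t/d) point in similar directions.
PHONEME_EMBEDDINGS = {
    "k": (1.0, 0.0, 0.0),
    "g": (0.9, 0.1, 0.0),
    "ae": (0.0, 1.0, 0.0),
    "t": (0.0, 0.0, 1.0),
    "d": (0.1, 0.0, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def token_score(input_phonemes, content_phonemes, table=PHONEME_EMBEDDINGS):
    """Combine per-phoneme similarities into one score for a content token.

    Phonemes are compared position by position; dividing by the longer
    length penalizes tokens with different phoneme counts (an assumption).
    """
    longest = max(len(input_phonemes), len(content_phonemes))
    if longest == 0:
        return 0.0
    total = sum(
        cosine(table[p], table[q])
        for p, q in zip(input_phonemes, content_phonemes)
    )
    return total / longest


def rank_content_tokens(input_phonemes, content_tokens):
    """Return content-token names ordered from best to worst score."""
    return sorted(
        content_tokens,
        key=lambda name: -token_score(input_phonemes, content_tokens[name]),
    )
```

Under these assumed embeddings, an input token with phonemes `k ae t` scores a content token `g ae d` higher than a content token `t ae k`, because each aligned phoneme pair in the former is close in the embedding space.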

	Regarding Claim 17, Elisha in view of Gupta, and further in view of Jia disclose the one or more tangible processor-readable storage media of claim 15, wherein the process further comprises:
generating a phonemic index [[for]] of a first content token corresponding to the first orthography (Elisha, para 0049, A phoneme or phonetic lattice is known in the art, for example, a phonetic lattice may be a directed acyclic graph (e.g., as described in http://en.wikipedia.org/wiki/Deterministic_acyclic_finite_state_automaton), which represents, records and/or provides the probability of each phoneme to be output at a certain time. Accordingly, using a phoneme or phonetic lattice produced as described, sub-words, words and/or phrases may be assigned, or associated with, a certainty value, a probability value or a score or rank value. For example, a phonetic lattice may include all possible tracks of phonetic transcriptions for an audio file provided as input and, using the probabilities of words and/or phrases, a probability for each track may be calculated); and
adding the phonemic index [[for]] of the first content token in the first orthography to an inverted index corresponding to the first orthography in the inverted index database (Elisha, para 0050, Figure 2, Indexing unit 220 may reverse-index a set of transcriptions. For example, indexing unit 220 may tokenize a string of phonemes into one or more phoneme tokens (terms), and may construct an inverse index that maps a set of documents (e.g., recorded conversations) to tokens. For example, to invert or reverse an indexing of a set of recordings, indexing unit 220 may use any parsing rules or criteria (321) when inverting an index of a set of transcriptions. A user may select a rule or criteria and indexing unit 220 may reverse-index a set of transcriptions based on a rule received from a user (e.g., stored as shown by block 321)).

	Regarding Claim 20, Elisha in view of Gupta, and further in view of Jia disclose the one or more tangible processor-readable storage media of claim 15, wherein the process further comprises:
generating the phonemic index [[for]] of a phonemic variant of a first content token corresponding to the second orthography (Elisha, para 0049, A phoneme or phonetic lattice is known in the art, for example, a phonetic lattice may be a directed acyclic graph …, which represents, records and/or provides the probability of each phoneme to be output at a certain time. Accordingly, using a phoneme or phonetic lattice produced as described, sub-words, words and/or phrases may be assigned, or associated with, a certainty value, a probability value or a score or rank value. For example, a phonetic lattice may include all possible tracks of phonetic transcriptions for an audio file provided as input and, using the probabilities of words and/or phrases, a probability for each track may be calculated); and
adding the phonemic index of the phonemic variant [[for]] the first content token corresponding to the second orthography to an inverted index corresponding to the second orthography in the inverted index database (Elisha, para 0050, Figure 2, Indexing unit 220 may reverse-index a set of transcriptions. For example, indexing unit 220 may tokenize a string of phonemes into one or more phoneme tokens (terms), and may construct an inverse index that maps a set of documents (e.g., recorded conversations) to tokens. For example, to invert or reverse an indexing of a set of recordings, indexing unit 220 may use any parsing rules or criteria (321) when inverting an index of a set of transcriptions. A user may select a rule or criteria and indexing unit 220 may reverse-index a set of transcriptions based on a rule received from a user (e.g., stored as shown by block 321)).  
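For illustration only, the limitations of claims 17 and 20 (adding a phonemic index to an inverted index that corresponds to a given orthography within one inverted index database) can be sketched as follows. This is a minimal sketch under stated assumptions: the class name, method names, orthography labels, and phoneme sequences are hypothetical and are not drawn from Elisha or the application.

```python
from collections import defaultdict


class InvertedIndexDatabase:
    """Holds one inverted index per orthography (hypothetical structure).

    Each inverted index maps a phonemic index, stored as a tuple of
    phonemes, to the set of document identifiers containing that token.
    """

    def __init__(self):
        self._indexes = defaultdict(lambda: defaultdict(set))

    def add(self, orthography, phonemic_index, doc_id):
        """Add a phonemic index to the inverted index for one orthography."""
        self._indexes[orthography][tuple(phonemic_index)].add(doc_id)

    def lookup(self, orthography, phonemic_index):
        """Return the sorted document identifiers for a phonemic index."""
        index = self._indexes.get(orthography, {})
        return sorted(index.get(tuple(phonemic_index), set()))


db = InvertedIndexDatabase()
# Phonemic index of a content token in a first orthography (assumed label "latin").
db.add("latin", ["p", "y", "aa", "r"], "doc-1")
# Phonemic variant of the same content token for a second orthography
# (assumed label "devanagari").
db.add("devanagari", ["p", "y", "aa", "r"], "doc-1")
db.add("devanagari", ["p", "y", "a", "r"], "doc-2")
```

Keeping a separate index per orthography means a lookup can be restricted to the index matching the orthography of the query, which is the design choice this sketch assumes.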


Regarding Claim 21, Elisha in view of Gupta, and further in view of Jia disclose the method of claim 1.
Furthermore, Jia teaches:
wherein the phonemic index of the input token includes phoneme embeddings corresponding to each phoneme of the input token (Jia, para 0006, each token of the plurality of tokens of the input encoder embedding represents a combination of one of a grapheme token embedding or a phoneme token embedding, a segment embedding, a word position embedding, and/or a position embedding. In these examples, identifying the respective word of the sequence of words corresponding to the respective phoneme token may include identifying the respective word of the sequence of words corresponding to the respective phoneme token based on a respective word position embedding associated with the respective phoneme token. Here, determining the respective grapheme token representing the respective word of sequence of words corresponding to the respective phoneme token may include determining that the respective grapheme token includes a corresponding word position embedding that matches the respective word position embedding of the respective phoneme token).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Jia in the method of Elisha in view of Garg because this would enable the search engine to return one or more search results that the device interprets to generate a response for the user (Jia, para 0036).


Claims 4, 11 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Elisha et al. Pat App No. US 20160275945 A1 (Elisha) in view of Gupta, and further in view of Jia, and further in view of Robertson Pat App No. US 20110106792 A1 (Robertson).

	Regarding Claim 4, Elisha in view of Gupta, and further in view of Jia disclose the method of claim 
	Gupta further discloses:  
generating a phonemic variant of a first content token corresponding to the second orthography (Gupta, page 681, 4th para, Phonemes of the language can be captured by the character n-grams. Consider the feature set F = {f1, . . . , fK} containing character grams of scripts si for all i ∈ {1, ., r} and |F| = K…; Gupta, page 682, 2nd para, the size of phonemes (captured by the character n-grams) for a language is finite and fairly small (only for languages with finite set of alphabets e.g. English with alphabets a-z). Hence, enough evidence for all the phonemes is found even in a small to moderate size training data, which increases the suitability of our approach to the problem; Gupta, page 685, 3rd  para, A similar method that uses both stemming and grapheme-to-phoneme conversion is used by [25] to develop a proof-of-concept for a multilingual search engine for 10 Indian languages).
Elisha in view of Gupta and Jia does not specifically disclose using a neural phonemic translation machine learning model trained using phonemic index pairs corresponding to the first language and the second language. 
However, Robertson, in the same field of endeavor, discloses using a neural phonemic translation machine learning model trained using phonemic index pairs corresponding to the first language and the second language (Robertson, para 0036-0046, In a preferred embodiment, an approach developed by Kevin Lenzo and Vincent Pagel at Carnegie Melon University (CMU) department of linguistics is used, which uses a phonetic dictionary for direct look-up of phonetic transcriptions whenever possible, but falls back on a set of transcription rules to handle out of dictionary cases. The CMU pronouncing dictionary is publicly available and downloadable from the Internet and is a machine readable pronunciation dictionary for North American English that contains 125,000 words and names together with their phonetic transcriptions in the phoneme set shown in FIG. 1. For words that are not in this dictionary, phonetic transcription is handled by using the large number of transcriptions provided within the dictionary to train a decision tree that is capable of making an accurate determination of the likely pronunciation of any unincluded words. The process for training a decision tree in this manner is described in detail in Pagel V., Lenzo K., and Black A. W. (1998) "Letter to sound rules for accented lexicon compression." Proc. ICSLP, Sydney, Australia, the contents of which are incorporated herein by reference. The CMU pronouncing dictionary comes with a number of Perl scripts which can be used for constructing text-to-phoneme (TTP) decision trees using the Iterative Dichotomiser 3 (ID3) tree learning algorithm... In a preferred embodiment, once the decision tree has been constructed, each word from the original phonetic dictionary is run through the decision tree and compared with the predicted pronunciation within the dictionary. 
Any words that are correctly predicted by the tree can be eliminated from the dictionary, resulting in a smaller dictionary containing only those words transcribed incorrectly...FIG. 2 illustrates the CMU Dictionary 20, which is used to train the transcription decision tree 21. Once the decision tree is finalised, the words in the CMU Dictionary are passed through the decision tree. Those words which the decision tree correctly transcribes are eliminated from the reduced dictionary. Only those words in the CMU that are not correctly transcribed are retained to form the reduced dictionary 22. In use for transcription, a word 23 is input and the reduced dictionary 22 is first checked. If the input word is in the reduced dictionary, the corresponding phoneme string 24 is output. If the input word is not in the reduced dictionary, the decision tree 21 is then used to transcribe the input word into a string of phonemes 24. The original CMU pronouncing dictionary is 3,507 kb in size, whereas the corresponding decision tree and reduced dictionary occupy just 723 kb and 546 kb respectively. Eliminating words from the dictionary that are correctly transcribed by the decision tree therefore saves considerable memory resources. However, as an alternative, phonetic transcription can be carried out simply by using a phonetic dictionary such as the CMU dictionary. As a further alternative, phonetic transcription can be carried out solely using a decision tree, without reference to any form of dictionary. The choice of dictionary employed for lookup or decision tree training is typically determined by the language being used. Many machine-readable phonetic dictionaries are freely available for download on the internet. Alternatives include BOMP (German), Lexique (French) and the MBRDICO project which provides dictionary resources for several languages. 
The Unicode Unihan database provides detailed properties and pronunciations for the characters used in Chinese, Japanese and Korean orthography…In order to support phonetic indexing and retrieval of words and names from a database, the present invention incorporates a system for the generation of phonetic index keys. These keys are based on the phoneme transcriptions described above with reference to FIG. 2).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Robertson in the method of Elisha in view of Gupta and Jia because this would provide a system and method for ranking or scoring the degree of similarity between two words based on a comparison of phonemes (Robertson, para 0002).
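For illustration only, the transcription flow quoted from Robertson (check a reduced dictionary first, then fall back to a trained model, with the reduced dictionary retaining only words the model transcribes incorrectly) can be sketched as follows. This is a minimal sketch under stated assumptions: `fallback_transcribe` is a hypothetical one-phoneme-per-letter stand-in for Robertson's trained decision tree, not an ID3 implementation, and the toy dictionary and phoneme symbols are invented for the example.

```python
# Hypothetical letter-to-phoneme map used by the stand-in predictor.
LETTER_TO_PHONEME = {
    "a": "ae", "c": "k", "d": "d", "g": "g",
    "o": "o", "s": "s", "t": "t", "w": "w",
}


def fallback_transcribe(word):
    """Stand-in for a trained transcription model: one phoneme per letter."""
    return [LETTER_TO_PHONEME.get(ch, ch) for ch in word.lower()]


def build_reduced_dictionary(full_dictionary, predictor):
    """Keep only the words that the predictor transcribes incorrectly."""
    return {
        word: phonemes
        for word, phonemes in full_dictionary.items()
        if predictor(word) != phonemes
    }


def transcribe(word, reduced_dictionary, predictor):
    """Check the reduced dictionary first, then fall back to the predictor."""
    key = word.lower()
    if key in reduced_dictionary:
        return reduced_dictionary[key]
    return predictor(key)


# Toy pronouncing dictionary (hypothetical entries and phoneme symbols).
FULL_DICTIONARY = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "o", "g"],
    "was": ["w", "o", "z"],
}
REDUCED = build_reduced_dictionary(FULL_DICTIONARY, fallback_transcribe)
```

In this toy setup only "was" survives in the reduced dictionary, because the stand-in predictor already transcribes "cat" and "dog" correctly; an unseen word such as "tag" is handled by the predictor alone.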

Regarding Claim 11, Elisha in view of Gupta, and further in view of Jia disclose the computing system of claim 8.
	Gupta further discloses:
to generate a phonemic variant of a first content token corresponding to the second orthography (Gupta, page 681, 4th para, Phonemes of the language can be captured by the character n-grams. Consider the feature set F = {f1, . . . , fK} containing character grams of scripts si for all i ∈ {1, ., r} and |F| = K…; Gupta, page 682, 2nd para, the size of phonemes (captured by the character n-grams) for a language is finite and fairly small (only for languages with finite set of alphabets e.g. English with alphabets a-z). Hence, enough evidence for all the phonemes is found even in a small to moderate size training data, which increases the suitability of our approach to the problem; Gupta, page 685, 3rd  para, A similar method that uses both stemming and grapheme-to-phoneme conversion is used by [25] to develop a proof-of-concept for a multilingual search engine for 10 Indian languages). 
Elisha in view of Gupta and Jia does not specifically disclose a neural phonemic translation machine learning model executable by the one or more hardware processors and configured …wherein the neural phonemic translation machine learning model is trained using phonemic index pairs corresponding to the first language and the second language. 
However, Robertson, in the same field of endeavor, discloses a neural phonemic translation machine learning model executable by the one or more hardware processors and configured …wherein the neural phonemic translation machine learning model is trained using phonemic index pairs corresponding to the first language and the second language ( Robertson, para 0055, These elements are implemented as software modules running on one or more computer processors that are coupled to the name database 40 and key index 41; Robertson, para 0036-0046, In a preferred embodiment, an approach developed by Kevin Lenzo and Vincent Pagel at Carnegie Melon University (CMU) department of linguistics is used, which uses a phonetic dictionary for direct look-up of phonetic transcriptions whenever possible, but falls back on a set of transcription rules to handle out of dictionary cases. The CMU pronouncing dictionary is publicly available and downloadable from the Internet and is a machine readable pronunciation dictionary for North American English that contains 125,000 words and names together with their phonetic transcriptions in the phoneme set shown in FIG. 1. For words that are not in this dictionary, phonetic transcription is handled by using the large number of transcriptions provided within the dictionary to train a decision tree that is capable of making an accurate determination of the likely pronunciation of any unincluded words. The process for training a decision tree in this manner is described in detail in Pagel V., Lenzo K., and Black A. W. (1998) "Letter to sound rules for accented lexicon compression." Proc. ICSLP, Sydney, Australia, the contents of which are incorporated herein by reference. The CMU pronouncing dictionary comes with a number of Perl scripts which can be used for constructing text-to-phoneme (TTP) decision trees using the Iterative Dichotomiser 3 (ID3) tree learning algorithm... 
In a preferred embodiment, once the decision tree has been constructed, each word from the original phonetic dictionary is run through the decision tree and compared with the predicted pronunciation within the dictionary. Any words that are correctly predicted by the tree can be eliminated from the dictionary, resulting in a smaller dictionary containing only those words transcribed incorrectly...FIG. 2 illustrates the CMU Dictionary 20, which is used to train the transcription decision tree 21. Once the decision tree is finalised, the words in the CMU Dictionary are passed through the decision tree. Those words which the decision tree correctly transcribes are eliminated from the reduced dictionary. Only those words in the CMU that are not correctly transcribed are retained to form the reduced dictionary 22. In use for transcription, a word 23 is input and the reduced dictionary 22 is first checked. If the input word is in the reduced dictionary, the corresponding phoneme string 24 is output. If the input word is not in the reduced dictionary, the decision tree 21 is then used to transcribe the input word into a string of phonemes 24. The original CMU pronouncing dictionary is 3,507 kb in size, whereas the corresponding decision tree and reduced dictionary occupy just 723 kb and 546 kb respectively. Eliminating words from the dictionary that are correctly transcribed by the decision tree therefore saves considerable memory resources. However, as an alternative, phonetic transcription can be carried out simply by using a phonetic dictionary such as the CMU dictionary. As a further alternative, phonetic transcription can be carried out solely using a decision tree, without reference to any form of dictionary. The choice of dictionary employed for lookup or decision tree training is typically determined by the language being used. Many machine-readable phonetic dictionaries are freely available for download on the internet. 
Alternatives include BOMP (German), Lexique (French) and the MBRDICO project which provides dictionary resources for several languages. The Unicode Unihan database provides detailed properties and pronunciations for the characters used in Chinese, Japanese and Korean orthography…In order to support phonetic indexing and retrieval of words and names from a database, the present invention incorporates a system for the generation of phonetic index keys. These keys are based on the phoneme transcriptions described above with reference to FIG. 2).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Robertson in the method of Elisha in view of Gupta and Jia because this would provide a system and method for ranking or scoring the degree of similarity between two words based on a comparison of phonemes (Robertson, para 0002).

	Regarding Claim 18, Elisha in view of Gupta, and further in view of Jia disclose the one or more tangible processor-readable storage media of claim 15.
	Gupta further discloses:
generating a phonemic variant of a first content token corresponding to the second orthography (Gupta, page 681, 4th para, Phonemes of the language can be captured by the character n-grams. Consider the feature set F = {f1, . . . , fK} containing character grams of scripts si for all i ∈ {1, ., r} and |F| = K…; Gupta, page 682, 2nd para, the size of phonemes (captured by the character n-grams) for a language is finite and fairly small (only for languages with finite set of alphabets e.g. English with alphabets a-z). Hence, enough evidence for all the phonemes is found even in a small to moderate size training data, which increases the suitability of our approach to the problem; Gupta, page 685, 3rd  para, A similar method that uses both stemming and grapheme-to-phoneme conversion is used by [25] to develop a proof-of-concept for a multilingual search engine for 10 Indian languages). 
Elisha in view of Gupta and Jia does not specifically disclose using a neural phonemic translation machine learning model trained using phonemic index pairs corresponding to the first language and the second language 
However, Robertson, in the same field of endeavor, discloses using a neural phonemic translation machine learning model trained using phonemic index pairs corresponding to the first language and the second language (Robertson, para 0036-0046, In a preferred embodiment, an approach developed by Kevin Lenzo and Vincent Pagel at Carnegie Melon University (CMU) department of linguistics is used, which uses a phonetic dictionary for direct look-up of phonetic transcriptions whenever possible, but falls back on a set of transcription rules to handle out of dictionary cases. The CMU pronouncing dictionary is publicly available and downloadable from the Internet and is a machine readable pronunciation dictionary for North American English that contains 125,000 words and names together with their phonetic transcriptions in the phoneme set shown in FIG. 1. For words that are not in this dictionary, phonetic transcription is handled by using the large number of transcriptions provided within the dictionary to train a decision tree that is capable of making an accurate determination of the likely pronunciation of any unincluded words. The process for training a decision tree in this manner is described in detail in Pagel V., Lenzo K., and Black A. W. (1998) "Letter to sound rules for accented lexicon compression." Proc. ICSLP, Sydney, Australia, the contents of which are incorporated herein by reference. The CMU pronouncing dictionary comes with a number of Perl scripts which can be used for constructing text-to-phoneme (TTP) decision trees using the Iterative Dichotomiser 3 (ID3) tree learning algorithm... In a preferred embodiment, once the decision tree has been constructed, each word from the original phonetic dictionary is run through the decision tree and compared with the predicted pronunciation within the dictionary. 
Any words that are correctly predicted by the tree can be eliminated from the dictionary, resulting in a smaller dictionary containing only those words transcribed incorrectly...FIG. 2 illustrates the CMU Dictionary 20, which is used to train the transcription decision tree 21. Once the decision tree is finalised, the words in the CMU Dictionary are passed through the decision tree. Those words which the decision tree correctly transcribes are eliminated from the reduced dictionary. Only those words in the CMU that are not correctly transcribed are retained to form the reduced dictionary 22. In use for transcription, a word 23 is input and the reduced dictionary 22 is first checked. If the input word is in the reduced dictionary, the corresponding phoneme string 24 is output. If the input word is not in the reduced dictionary, the decision tree 21 is then used to transcribe the input word into a string of phonemes 24. The original CMU pronouncing dictionary is 3,507 kb in size, whereas the corresponding decision tree and reduced dictionary occupy just 723 kb and 546 kb respectively. Eliminating words from the dictionary that are correctly transcribed by the decision tree therefore saves considerable memory resources. However, as an alternative, phonetic transcription can be carried out simply by using a phonetic dictionary such as the CMU dictionary. As a further alternative, phonetic transcription can be carried out solely using a decision tree, without reference to any form of dictionary. The choice of dictionary employed for lookup or decision tree training is typically determined by the language being used. Many machine-readable phonetic dictionaries are freely available for download on the internet. Alternatives include BOMP (German), Lexique (French) and the MBRDICO project which provides dictionary resources for several languages. 
The Unicode Unihan database provides detailed properties and pronunciations for the characters used in Chinese, Japanese and Korean orthography…In order to support phonetic indexing and retrieval of words and names from a database, the present invention incorporates a system for the generation of phonetic index keys. These keys are based on the phoneme transcriptions described above with reference to FIG. 2).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Robertson in the method of Elisha in view of Gupta and Jia because this would provide a system and method for ranking or scoring the degree of similarity between two words based on a comparison of phonemes (Robertson, para 0002).

Claims 5, 12 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Elisha in view of Gupta, and further in view of Jia, further in view of Robertson, and further in view of Lawrence Pat App No. GB 2343037 A.

	Regarding Claim 5, Elisha in view of Gupta, and further in view of Jia and Robertson disclose the method of claim 4.
	Elisha in view of Gupta, Jia and Robertson do not specifically disclose biasing the phonemic variant of the first content token toward pronunciation of the second orthography.
However, Lawrence, in the same field of endeavor, discloses biasing the phonemic variant of the first content token toward pronunciation of the second orthography (Lawrence, 3rd page, 10th para – 4th page, 3rd para, Using the sorted weightings table the most likely pronunciation is constructed by using the first phonemic variant for each cluster. For "woz" this is "w o z". A search of the dictionary, using the pronunciation as the key, finds that the orthography pronounced "w o z" is "was". This spelling is saved in the list of possible words presented to the end user. The sorted extract of the weightings table is used to find the next most likely pronunciation. The second entry contains "oh" for the second cluster. The pronunciation of "w oh z" is spelt "woes". The third entry contains "E" (schwa). There is no entry in the dictionary for the pronunciation "w E z". When the seventh entry in the table is reached it contains the second phonemic variant for "w"… when the twelfth entry is reached it contains the second phonemic variant for "z". The spelling generator constructs a possible pronunciation using entry 4 for "w", entry 1 for "o" and entry 12 for "z" thus creating "w o s". There is no entry in the dictionary for "w o s". The next entries are generated by using "s" for "z" and all the entries up to entry 12. When entry 14 is reached it contains the third phonemic variant for "w". The spelling generator uses "v" for the pronunciation of "w" and then uses each variant for the second and third clusters in the order they appear in the table; [“weightings” as “biases”]).
	Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Lawrence in the method of Elisha in view of Gupta, Jia and Robertson because this would enable searching for possible spellings in the dictionary such that the spellings from the pronunciations with the greatest (heavier) weightings are selected before those with lesser (lighter) weightings (Lawrence, 4th page, 4th para).
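The weighting-ordered dictionary lookup described in the Lawrence citation can be illustrated with a minimal sketch. It is not Lawrence's actual procedure: the numeric weights, the toy dictionary, and the function name `candidate_spellings` are all assumptions made for illustration, and the sketch ranks whole pronunciations by the product of their variant weights rather than walking a sorted weightings table entry by entry.

```python
from itertools import product

# Hypothetical weighted phonemic variants for each letter cluster of "woz".
# The weights are invented for illustration; they are not taken from Lawrence.
VARIANTS = [
    [("w", 0.9), ("v", 0.1)],
    [("o", 0.6), ("oh", 0.3), ("E", 0.1)],
    [("z", 0.7), ("s", 0.3)],
]

# Toy dictionary keyed by pronunciation (assumed entries).
DICTIONARY = {"w o z": "was", "w oh z": "woes"}


def candidate_spellings(variants, dictionary):
    """Return dictionary spellings, heaviest-weighted pronunciation first."""
    scored = []
    for combo in product(*variants):
        phonemes = " ".join(p for p, _ in combo)
        weight = 1.0
        for _, w in combo:
            weight *= w  # combine per-cluster weights into one score
        scored.append((weight, phonemes))
    # Try heavier pronunciations before lighter ones.
    scored.sort(key=lambda item: item[0], reverse=True)
    # Keep only pronunciations that have a dictionary entry.
    return [dictionary[p] for _, p in scored if p in dictionary]


spellings = candidate_spellings(VARIANTS, DICTIONARY)
print(spellings)  # ['was', 'woes']
```

Under these assumed weights, "w o z" outranks "w oh z", so "was" is offered before "woes"; pronunciations with no dictionary entry, such as "w E z" or "w o s", are skipped.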

	Regarding Claim 12, Elisha in view of Gupta, and further in view of Jia and Robertson disclose the computing system of claim 11.
	Elisha in view of Gupta and Jia and Robertson do not specifically disclose wherein the phonemic indexer is further configured to bias the phonemic variant of the first content token toward pronunciation of the second orthography.
	However, Lawrence, in the same field of endeavor, discloses wherein the phonemic indexer is further configured to bias the phonemic variant of the first content token toward pronunciation of the second orthography (Lawrence, 3rd page, 10th para – 4th page, 3rd para, Using the sorted weightings table the most likely pronunciation is constructed by using the first phonemic variant for each cluster. For "woz" this is "w o z". A search of the dictionary, using the pronunciation as the key, finds that the orthography pronounced "w o z" is "was". This spelling is saved in the list of possible words presented to the end user. The sorted extract of the weightings table is used to find the next most likely pronunciation. The second entry contains "oh" for the second cluster. The pronunciation of "w oh z" is spelt "woes". The third entry contains "E" (schwa). There is no entry in the dictionary for the pronunciation "w E z". When the seventh entry in the table is reached it contains the second phonemic variant for "w"… when the twelfth entry is reached it contains the second phonemic variant for "z". The spelling generator constructs a possible pronunciation using entry 4 for "w", entry 1 for "o" and entry 12 for "z", thus creating "w o s". There is no entry in the dictionary for "w o s". The next entries are generated by using "s" for "z" and all the entries up to entry 12. When entry 14 is reached it contains the third phonemic variant for "w". The spelling generator uses "v" for the pronunciation of "w" and then uses each variant for the second and third clusters in the order they appear in the table; ["weightings" as "biases"]).
	Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Lawrence in the method of Elisha in view of Gupta, Jia and Robertson because this would enable searching for possible spellings in the dictionary such that the spellings from the pronunciations with the greatest (heavier) weightings are selected before those with lesser (lighter) weightings (Lawrence, 4th page, 4th para).

	Regarding Claim 19, Elisha in view of Gupta, and further in view of Jia and Robertson disclose the one or more tangible processor-readable storage media of claim 18.
	Elisha in view of Gupta, Jia and Robertson do not specifically disclose biasing the phonemic variant of the first content token toward pronunciation of the second orthography.
However, Lawrence, in the same field of endeavor, discloses biasing the phonemic variant of the first content token toward pronunciation of the second orthography (Lawrence, 3rd page, 10th para – 4th page, 3rd para, Using the sorted weightings table the most likely pronunciation is constructed by using the first phonemic variant for each cluster. For "woz" this is "w o z". A search of the dictionary, using the pronunciation as the key, finds that the orthography pronounced "w o z" is "was". This spelling is saved in the list of possible words presented to the end user. The sorted extract of the weightings table is used to find the next most likely pronunciation. The second entry contains "oh" for the second cluster. The pronunciation of "w oh z" is spelt "woes". The third entry contains "E" (schwa). There is no entry in the dictionary for the pronunciation "w E z". When the seventh entry in the table is reached it contains the second phonemic variant for "w"… when the twelfth entry is reached it contains the second phonemic variant for "z". The spelling generator constructs a possible pronunciation using entry 4 for "w", entry 1 for "o" and entry 12 for "z", thus creating "w o s". There is no entry in the dictionary for "w o s". The next entries are generated by using "s" for "z" and all the entries up to entry 12. When entry 14 is reached it contains the third phonemic variant for "w". The spelling generator uses "v" for the pronunciation of "w" and then uses each variant for the second and third clusters in the order they appear in the table; ["weightings" as "biases"]).
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Lawrence in the method of Elisha in view of Gupta, Jia and Robertson because this would enable searching for possible spellings in the dictionary such that the spellings from the pronunciations with the greatest (heavier) weightings are selected before those with lesser (lighter) weightings (Lawrence, 4th page, 4th para).


Claim 22 is rejected under 35 U.S.C. 103 as being unpatentable over Elisha in view of Gupta, and further in view of Jia, further in view of Lee et al. 2021, "Phonetic Variation Modeling and a Language Model Adaptation for Korean English Code-Switching Speech Recognition" Applied Sciences 11, no. 6 (Lee).


Regarding Claim 22, Elisha in view of Gupta, and further in view of Jia disclose the method of claim 1.
Elisha in view of Gupta and Jia do not specifically disclose wherein each phonemic variant is generated by a neural phonemic translation machine learning model trained using phonemic index pairs corresponding to the first language and the second language.
However, Lee, in the same field of endeavor, discloses wherein each phonemic variant is generated by a neural phonemic translation machine learning model trained using phonemic index pairs corresponding to the first language and the second language (Lee, 2nd page, Figure 1. Figure caption, an application of the proposed method: English–Korean automatic speech translator; Lee, 4th page, 1st para, Unlike other languages, Konglish seems to be more severe in its phonetic variations. We should consider the phonetic variations between English pronounced by a Korean who has difficulty speaking English and English pronounced by a Korean who speaks English at a native-like level; Lee, 9th page, 5th para, The base LM was trained based on recurrent neural network (RNN) LM using Korean and English text corpora. The dev. Set consisted of 168 Economy sentences and 1895 lecture sentences. Other parameters and hardware settings are described in Table 4. These are based on the hyperparameters of Librispeech, with some values adjusted; Recently, bidirectional encoder representations from transformers (BERT) [26] represented the semantic relationships of words in an embedding space well… This idea can apply to English–Korean automatic speech translator in Figure 1).  
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate the method of Lee in the method of Elisha in view of Gupta and Jia because this would enable the application of the English–Korean automatic speech translator of Figure 1 and would improve the recognition rate between English and Korean via the model with the LM domain adaptation (Lee, 8th page, 4th para).


Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MULUGETA T. DUGDA whose telephone number is (703)756-1106. The examiner can normally be reached Mon - Fri, 4:30am - 7:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras D. Shah can be reached at 571-270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MULUGETA TUJI DUGDA/Examiner, Art Unit 2653
/DOUGLAS GODBOLD/Primary Examiner, Art Unit 2655
Prosecution Timeline

May 25, 2023: Application Filed
Jun 25, 2025: Non-Final Rejection mailed (§101, §103)
Sep 04, 2025: Interview Requested
Sep 12, 2025: Examiner Interview Summary
Sep 12, 2025: Applicant Interview (Telephonic)
Sep 25, 2025: Response Filed
Dec 16, 2025: Final Rejection mailed (§101, §103)
Feb 17, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

18/041,756 | Patent 12620387 | VOICE GENERATION METHOD AND APPARATUS, DEVICE, AND COMPUTER READABLE MEDIUM | 3y 2m to grant | Granted May 05, 2026
17/912,112 | Patent 12597424 | METHOD AND APPARATUS FOR DETERMINING SKILL FIELD OF DIALOGUE TEXT | 3y 6m to grant | Granted Apr 07, 2026
17/912,912 | Patent 12592244 | REDUCED-BANDWIDTH SPEECH ENHANCEMENT WITH BANDWIDTH EXTENSION | 3y 6m to grant | Granted Mar 31, 2026
17/662,896 | Patent 12579366 | DEVELOPMENT PLATFORM FOR FACILITATING THE OPTIMIZATION OF NATURAL-LANGUAGE-UNDERSTANDING SYSTEMS | 3y 10m to grant | Granted Mar 17, 2026
18/015,732 | Patent 12573417 | A COMPUTER-IMPLEMENTED METHOD OF PROVIDING DATA FOR AN AUTOMATED BABY CRY ASSESSMENT | 3y 1m to grant | Granted Mar 10, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 82%
With Interview (+19.3%): 99%
Median Time to Grant: 2y 11m (~0m remaining)
PTA Risk: Moderate
Based on 50 resolved cases by this examiner. Grant probability derived from career allowance rate.