DETAILED ACTION
Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Elections/Restrictions
Applicant's election with traverse of claims 21-40 in the reply filed on 1/27/2026 is acknowledged. Claims 86-129 have been withdrawn from further consideration pursuant to 37 CFR 1.142(b), as being drawn to the nonelected group.
Applicant argued, on pages 21-23, that the restriction requirement is improper under MPEP §§ 802.01, 803, and 806.05(c). Applicant argued that all claims are directed to a single general inventive concept, such as the use of domain-adapted embedding vectors to improve semantic processing in domain-specific databases. Examiner respectfully disagrees. Inventions 1, 2 and 3 all use embedding vectors, but they perform different processes to achieve different goals. As the applicant stated, invention 1 relates to the generation of a domain-adapted embedding vector, invention 2 relates to building and updating a domain-specific dictionary of embedding vectors, and invention 3 relates to applying embedding vectors to semantic queries against a domain-specific database. The drawings and the paragraphs of the specification are likewise separated into these three inventions. For example, Figs. 2A, 2B and specification para.[0045]-[0065] illustrate invention 2, under the heading “The domain dictionary building system”. Figs. 3A-4B and specification para.[0066]-[0082] illustrate invention 1, under the heading “The Domain-Adapted Embedding Vector Generator”. Figs. 5A, 5B and specification para.[0083]-[0108] illustrate invention 3, under the heading “Using Domain-Adapted Embedding Vectors in Semantic Search”. All three use embedding vectors, but the inventions are different.
Applicant further argued that Invention 1 and Invention 2 were both classified under G06F16/3344 in the previous restriction requirement, evidencing shared subject matter. G06F16/3344 relates to “using natural language analysis”, which is a very broad classification and does not make the inventions the same.
Applicant further argued that the inventions are linked under MPEP § 806.05(c), which states that restriction is improper where claims are directed to a combination and subcombination. Examiner respectfully disagrees. Applicant amended claims 21 and 86 after the restriction requirement to relate inventions 1 and 2 as combination and subcombination. Because the amendment was made after the restriction requirement, the restriction issued in the previous Office action is proper.
Applicant also argued that requiring restriction would result in duplicative examination of the same subject matter across multiple divisional applications, contrary to MPEP § 803, which discourages restriction when inventions are not "independent and distinct." However, Inventions 1, 2 and 3 are distinct under MPEP § 806.05(j) because they have a materially different design, mode of operation, function or effect, they do not overlap in scope, and they are not obvious variants. There would also be a serious search and/or examination burden if restriction were not required. The requirement is therefore still deemed proper and is made FINAL.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 21-40 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent claim 21 recites “receiving an input phrase for generating a domain-adapted embedding vector for the phrase”; “scanning the input phrase to identify one or more domain-specific terms from the input phrase that are included in a domain-specific dictionary”; “obtaining, from the domain-specific dictionary, one or more domain-adapted embedding vectors for each domain-specific term of the one or more domain-specific terms from the domain-specific dictionary”; “generating a generic embedding vector for the input phrase using a large language model, wherein the generic embedding vector is configured for storage in a domain-specific dictionary and subsequent use in semantic query processing”; and “combining the generic embedding vector with the one or more domain-adapted embedding vectors to provide the domain-adapted embedding vector for the input phrase”. The limitations above, as drafted, recite a process that, under its broadest reasonable interpretation, covers a mental process, as it could be performed in the human mind or with the aid of pen and paper.
The limitations of "receiving ...", "scanning ...", "obtaining ...", "generating ...", and "combining ...", as drafted, cover mental activities. More specifically, a human can receive an input phrase or sentence; can determine the specific domain or area for that phrase or sentence; can generate an embedding vector, which can be a number, a format, or any particular way of representing the phrase in a certain domain; can identify further similar terms; can locate similar embedding vectors in a database or dictionary, which can be any type of written documentation; can generate a generic vector by using a knowledge repository, which can be a human or a book, together with the previously identified similar vectors; can save it for further use; and can combine the generic vector and the previously identified similar vectors to provide a domain-specific vector for the input. The above steps, as drafted, recite a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. Nothing in the claim elements precludes the steps from practically being performed in the human mind. Additionally, the mere nominal recitation of a generic computer appliance does not take the claim limitations out of the mental processes grouping. Thus, the claim recites a mental process.
The claim recites the additional element of a “large language model” for performing the method, which is recited at a high level of generality and as performing generic computer functions routinely used in computer applications. Throughout the specification, the LLM or large language model is described as a generic component, which is not sufficient to amount to significantly more than the judicial exception. This is no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
Thus, taken alone, the additional elements do not amount to significantly more than the above identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Claim 21 is therefore not drawn to eligible subject matter, as it is directed to an abstract idea without significantly more.
Independent claims 30, 37 recite “receiving an input phrase for generating a domain-adapted embedding vector for the phrase”; “scanning the input phrase to identify one or more domain-specific terms from the input phrase that are included in a domain-specific dictionary”; “obtaining, from the domain-specific dictionary, one or more domain-adapted embedding vectors for each domain-specific term of the one or more domain-specific terms from the domain-specific dictionary”; “generating a generic embedding vector for the input phrase using a large language model”; and “combining the generic embedding vector with the one or more domain-adapted embedding vectors to provide the domain-adapted embedding vector for the input phrase”. The limitations above, as drafted, recite a process that, under its broadest reasonable interpretation, covers a mental process, as it could be performed in the human mind or with the aid of pen and paper.
The limitations of "receiving ...", "scanning ...", "obtaining ...", "generating ...", and "combining ...", as drafted, cover mental activities. More specifically, a human can receive an input phrase or sentence; can determine the specific domain or area for that phrase or sentence; can generate an embedding vector, which can be a number, a format, or any particular way of representing the phrase in a certain domain; can identify further similar terms; can locate similar embedding vectors in a database or dictionary, which can be any type of written documentation; can generate a generic vector by using a knowledge repository, which can be a human or a book, together with the previously identified similar vectors; and can combine the generic vector and the previously identified similar vectors to provide a domain-specific vector for the input. The above steps, as drafted, recite a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind. Nothing in the claim elements precludes the steps from practically being performed in the human mind. Additionally, the mere nominal recitation of a generic computer appliance does not take the claim limitations out of the mental processes grouping. Thus, the claims recite a mental process.
The claims recite the additional elements of a “non-transitory computer-readable storage medium” and a “large language model”. Claim 37 recites the additional elements of “a computing device” and “a computer-readable storage device” for performing the method. All of these are recited at a high level of generality and as performing generic computer functions routinely used in computer applications. Throughout the specification, the LLM or large language model is described as a generic component, which is not sufficient to amount to significantly more than the judicial exception. This is no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Thus, taken alone, the additional elements do not amount to significantly more than the above identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Claims 30, 37 are therefore not drawn to eligible subject matter, as they are directed to an abstract idea without significantly more.
Claims 22, 31, 38 recite “wherein scanning the input phrase comprises: tokenizing the input phrase in sequences of tokens; and determining whether the sequences of tokens match domain-specific terms in the domain specific dictionary”. Breaking the input phrase down into token sequences and finding matches based on those tokens could be performed in the human mind or with the aid of pen and paper. Claims 22, 31, 38 do not recite any additional elements, and therefore do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.
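For illustration only, the tokenizing and matching steps quoted from claims 22, 31, 38 can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names `tokenize` and `find_dictionary_terms`, the regular-expression tokenizer, the `max_len` window, and the longest-match-first policy are all hypothetical choices made for the example.

```python
import re


def tokenize(phrase):
    """Lowercase the phrase and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", phrase.lower())


def find_dictionary_terms(phrase, dictionary_terms, max_len=3):
    """Return dictionary terms matched by contiguous token sequences.

    Assumption: at each position the longest matching sequence (up to
    max_len tokens) wins, and matched tokens are not reused.
    """
    tokens = tokenize(phrase)
    # Normalize dictionary terms the same way as the input phrase.
    term_set = {" ".join(tokenize(term)) for term in dictionary_terms}
    matches = []
    i = 0
    while i < len(tokens):
        matched = False
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in term_set:
                matches.append(candidate)
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return matches


# Hypothetical example data, reusing the phrase cited from Ling.
found = find_dictionary_terms(
    "new diagnoses of prostate cancer", ["prostate cancer", "cancer"]
)
```

In this sketch the two-token sequence "prostate cancer" is matched before the single token "cancer" is considered, so only the longer term is reported.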
Claims 23, 32, 39 recite “wherein the generic embedding vector is a non-domain adapted embedding vector generated for the input phrase as a whole”. Determining that the generic embedding vector is a non-domain-adapted vector is an evaluation or observation that could be performed in the human mind or with the aid of pen and paper. Claims 23, 32, 39 do not recite any additional elements, and therefore do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.
Claims 24, 33, 40 recite “wherein generating the generic embedding vector comprises: updating the input phrase by removing the domain-specific term or substituting the domain-specific term for generic replacements corresponding to a generic definition or description of the respective domain-specific term; and generating the generic embedding vector based on the updated input phrase”. Updating the input phrase by removing or substituting the domain-specific term could be performed in the human mind or with the aid of pen and paper. Claims 24, 33, 40 do not recite any additional elements, and therefore do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.
Claims 25, 34 recite “wherein combining the generic embedding vector with the one or more domain-adapted embedding vectors comprises: performing a mean vector calculation of the generic embedding vector with the one or more domain-adapted embedding vectors to provide the domain-adapted embedding vector”. Performing a calculation that combines two vectors could be performed with the aid of pen and paper. Claims 25, 34 do not recite any additional elements, and therefore do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.
Claims 26, 35 recite “wherein the mean vector calculation includes applying of weight factors to respective vectors based on frequency of occurrence within the phrase”. Defining how the calculation is to be performed could be done in the human mind or with the aid of pen and paper. Claims 26, 35 do not recite any additional elements, and therefore do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.
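For illustration only, the mean vector calculation of claims 25, 34 and the frequency-based weighting of claims 26, 35 can be sketched as a weighted average. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function name `weighted_mean_vector`, the toy two-dimensional vectors, and the choice to give the generic vector a weight of 1 and each term vector a weight equal to its occurrence count are hypothetical.

```python
def weighted_mean_vector(vectors, weights):
    """Return the weighted mean of equal-length vectors (lists of floats)."""
    if not vectors or len(vectors) != len(weights):
        raise ValueError("need one weight per vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Component-wise weighted sum divided by the total weight.
    return [
        sum(w * v[k] for v, w in zip(vectors, weights)) / total
        for k in range(dim)
    ]


# Hypothetical toy data: one generic vector and one domain-adapted term vector.
generic_vec = [2.0, 0.0]
term_vec = [0.0, 4.0]

# Assumption: the generic vector gets weight 1, and the term's weight is the
# number of times the term occurs in the phrase (3 in this example).
combined = weighted_mean_vector([generic_vec, term_vec], [1, 3])
```

With equal weights the function reduces to the plain mean of claims 25, 34; unequal weights shift the result toward the more frequent term's vector.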
Claims 27, 36 recite “building the domain-specific dictionary comprising: identifying a list of domain-specific terms; for each domain-specific term, gathering textual content comprising a description or a definition of the respective domain-specific term; and calculating, using a pre-trained language model, a domain-adapted embedding vector based on the textual content gathered for the domain-specific term; and storing the domain-specific term and the domain-adapted embedding vector into the domain-specific dictionary”. Determining how to construct a dictionary by compiling a list, gathering specific documentation, calculating a vector, and saving the results in the dictionary could be performed with the aid of pen and paper. Claims 27, 36 do not recite any additional elements, and therefore do not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claims, as drafted, are not patent eligible.
Claim 28 recites “providing the domain-adapted embedding vector of the input phrase to a semantic search engine for executing of a semantic search over a vector search database to retrieve content related to the input phrase using the domain-adapted embedding vector of the input phrase”. Performing a search based on a query could be performed in the human mind or with the aid of pen and paper. The claim recites the additional element of a “search engine”. The specification, para.[0083], [0085], describes the server as performing generic computer functions, which is not sufficient to amount to significantly more than the judicial exception. Claim 28, as drafted, is not patent eligible.
Claim 29 recites “providing the domain-adapted embedding vector for storage into a domain-specific vector search database, the domain-specific vector search database being provided for use for semantic search execution by an embedding vector similarity search”. Determining that a search can be performed by similarity search is an observation or evaluation that could be performed in the human mind or with the aid of pen and paper. Claim 29 does not recite any additional elements, and therefore does not include additional elements that are sufficient to amount to significantly more than the judicial exception. The claim, as drafted, is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 21, 22, 27, 28, 29, 30, 31, 36, 37, 38 are rejected under 35 U.S.C. 103 as being unpatentable over Ling et al. (US 20210232768 A1), hereinafter referenced as Ling, in view of Batina et al. (US 20240289361 A1), hereinafter referenced as Batina.
Regarding Claim 21, Ling teaches a computer implemented method, the method comprising:
receiving an input phrase for generating a domain-adapted embedding vector for the phrase (Ling: Para.[0070], [0072], Fig. 2 illustrates a method for lexicon embedding and extra tagging embedding generation. Receiving the phrase “prostate cancer” in a sentence "... new diagnoses of prostate cancer ..." to generate lexicon embedding);
scanning the input phrase to identify one or more domain-specific terms from the input phrase that are included in a domain-specific dictionary (Ling: Para.[0072], Fig. 2, the phrase "prostate cancer" is mapped in TRIE dictionary 220, which is built from domain specific vocabulary database 210, and the dictionary 220 is queried when the input sentence with the input phrase is received);
obtaining, from the domain-specific dictionary, one or more domain-adapted embedding vectors for each domain-specific term of the one or more domain-specific terms from the domain specific dictionary (Ling: Para.[0072], Fig. 2, based on any matching results of the input phrase, the query provides a tagging sequence as output. The tagging results 235 are further used to generate the lexicon embedding vector 160. This is accomplished by creating an entry for the tagged phrase, "prostate cancer" in this example, in the lexicon embedding vector matrix 160);
generating a generic embedding vector for the input phrase [using a large language model] (Ling: Para.[0073], generating the extra tagging embedding (generic embedding) may utilize a clinical NLP engine 250 instead of using a vocabulary database. For each input sentence 200, the clinical NLP engine 250 is queried 260, and the tagging sequence is output. The tagging results 270 are further used to generate the extra tagging embedding vector 150 (generic embedding vector));
and combining the generic embedding vector with the one or more domain-adapted embedding vectors to provide the domain-adapted embedding vector for the input phrase (Ling: Para.[0075], [0076], Fig. 3, the lexicon embedding vector 160 (domain adapted embedding vector) and the extra tagging embedding vector 150 (generic embedding vector), in combination, may be called domain knowledge embedding to provide the domain-adapted embedding vector for the input phrase “prostate cancer”).
Ling, while teaching the method of claim 21, fails to explicitly teach generating a generic embedding vector for the input phrase using a large language model, wherein the generic embedding vector is configured for storage in a domain-specific dictionary and subsequent use in semantic query processing.
However, Batina teaches generating a generic embedding vector for the input phrase using a large language model (Batina: Para.[0064], Fig. 6, a large language model generates embedding vector 60, which is a learned numerical representation of a token that captures some semantic meaning of the text segment represented by the token 56),
wherein the generic embedding vector is configured for storage in a domain-specific dictionary and subsequent use in semantic query processing (Batina: Para.[0077]-[0082], Fig. 1, system 100 includes a generative AI model 112, a search engine 114, an embeddings module 116, and a media database 130. Embeddings module 116 creates and stores the vector representation of data. The search engine 114 is configured to perform a vector search based on a search query).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Batina’s teaching of search engine technology, and more particularly of search techniques that leverage large language models (LLMs), into Ling’s system and method for a machine learning model with evolving domain-specific lexicon features for text annotation, because using an LLM to process user inputs and to perform vector search by generating data that complements or enhances the user input would improve the search technology (Batina, Para.[0033],[0034]).
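For illustration only, the kind of dictionary lookup relied upon in the mapping above (a trie-based dictionary that is queried with an input sentence and outputs a tagging sequence) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Ling's implementation: the class names `TrieNode` and `TermTrie`, the function `tag_sentence`, whitespace tokenization, and the B/I/O tag labels are all hypothetical choices made for the example.

```python
class TrieNode:
    """One node of a token-level trie."""

    def __init__(self):
        self.children = {}    # token -> TrieNode
        self.is_term = False  # True if a dictionary term ends here


class TermTrie:
    """Trie keyed on whitespace-separated tokens of dictionary terms."""

    def __init__(self, terms=()):
        self.root = TrieNode()
        for term in terms:
            self.add(term)

    def add(self, term):
        node = self.root
        for token in term.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.is_term = True

    def longest_match(self, tokens, start):
        """Return the exclusive end index of the longest term at `start`, or None."""
        node = self.root
        end = None
        for i in range(start, len(tokens)):
            node = node.children.get(tokens[i])
            if node is None:
                break
            if node.is_term:
                end = i + 1
        return end


def tag_sentence(sentence, trie):
    """Tag each token as B (term start), I (inside term), or O (outside)."""
    tokens = sentence.lower().split()
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        end = trie.longest_match(tokens, i)
        if end is None:
            i += 1
            continue
        tags[i] = "B"
        for j in range(i + 1, end):
            tags[j] = "I"
        i = end
    return list(zip(tokens, tags))


# Hypothetical vocabulary and sentence, reusing the cited example phrase.
trie = TermTrie(["prostate cancer", "cancer"])
tagged = tag_sentence("new diagnoses of prostate cancer", trie)
```

In this sketch the tokens "prostate" and "cancer" are tagged B and I respectively, and every other token is tagged O.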
Regarding Claim 30, Ling teaches a non-transitory computer-readable storage medium coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations (Ling: Para.[0083]-[0088], non-transitory computer-readable storage medium may store instructions for execution by the processor), the operations comprising:
receiving an input phrase for generating a domain-adapted embedding vector for the phrase (Ling: Para.[0070], [0072], Fig. 2 illustrates a method for lexicon embedding and extra tagging embedding generation. Receiving the phrase “prostate cancer” in a sentence "... new diagnoses of prostate cancer ..." to generate lexicon embedding);
scanning the input phrase to identify one or more domain-specific terms from the input phrase that are included in a domain-specific dictionary (Ling: Para.[0072], Fig. 2, the phrase "prostate cancer" is mapped in TRIE dictionary 220, which is built from domain specific vocabulary database 210, and the dictionary 220 is queried when the input sentence with the input phrase is received);
obtaining, from the domain-specific dictionary, one or more domain-adapted embedding vectors for each domain-specific term of the one or more domain-specific terms from the domain specific dictionary (Ling: Para.[0072], Fig. 2, based on any matching results of the input phrase, the query provides a tagging sequence as output. The tagging results 235 are further used to generate the lexicon embedding vector 160. This is accomplished by creating an entry for the tagged phrase, "prostate cancer" in this example, in the lexicon embedding vector matrix 160);
generating a generic embedding vector for the input phrase [using a large language model] (Ling: Para.[0073], generating the extra tagging embedding (generic embedding) may utilize a clinical NLP engine 250 instead of using a vocabulary database. For each input sentence 200, the clinical NLP engine 250 is queried 260, and the tagging sequence is output. The tagging results 270 are further used to generate the extra tagging embedding vector 150 (generic embedding vector));
and combining the generic embedding vector with the one or more domain-adapted embedding vectors to provide the domain-adapted embedding vector for the input phrase (Ling: Para.[0075], [0076], Fig. 3, the lexicon embedding vector 160 (domain adapted embedding vector) and the extra tagging embedding vector 150 (generic embedding vector), in combination, may be called domain knowledge embedding to provide the domain-adapted embedding vector for the input phrase “prostate cancer”).
Ling, while teaching the non-transitory computer-readable storage medium of claim 30, fails to explicitly teach generating a generic embedding vector for the input phrase using a large language model.
However, Batina teaches generating a generic embedding vector for the input phrase using a large language model (Batina: Para.[0064], Fig. 6, a large language model generates embedding vector 60, which is a learned numerical representation of a token that captures some semantic meaning of the text segment represented by the token 56).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Batina’s teaching of search engine technology, and more particularly of search techniques that leverage large language models (LLMs), into Ling’s system and method for a machine learning model with evolving domain-specific lexicon features for text annotation, because using an LLM to process user inputs and to perform vector search by generating data that complements or enhances the user input would improve the search technology (Batina, Para.[0033],[0034]).
Regarding Claim 37, Ling teaches a system, comprising: a computing device; and a computer-readable storage device coupled to the computing device and having instructions stored thereon which, when executed by the computing device, cause the computing device to perform operations (Ling: Para.[0083]-[0088], non-transitory computer-readable storage medium may store instructions for execution by the processor, which may be implemented on multiprocessor computer systems, distributed computer systems, and cloud computing systems), the operations comprising:
receiving an input phrase for generating a domain-adapted embedding vector for the phrase (Ling: Para.[0070], [0072], Fig. 2 illustrates a method for lexicon embedding and extra tagging embedding generation. Receiving the phrase “prostate cancer” in a sentence "... new diagnoses of prostate cancer ..." to generate lexicon embedding);
scanning the input phrase to identify one or more domain-specific terms from the input phrase that are included in a domain-specific dictionary (Ling: Para.[0072], Fig. 2, the phrase "prostate cancer" is mapped in TRIE dictionary 220, which is built from domain specific vocabulary database 210, and the dictionary 220 is queried when the input sentence with the input phrase is received);
obtaining, from the domain-specific dictionary, one or more domain-adapted embedding vectors for each domain-specific term of the one or more domain-specific terms from the domain specific dictionary (Ling: Para.[0072], Fig. 2, based on any matching results of the input phrase, the query provides a tagging sequence as output. The tagging results 235 are further used to generate the lexicon embedding vector 160. This is accomplished by creating an entry for the tagged phrase, "prostate cancer" in this example, in the lexicon embedding vector matrix 160);
generating a generic embedding vector for the input phrase [using a large language model] (Ling: Para.[0073], generating the extra tagging embedding (generic embedding) may utilize a clinical NLP engine 250 instead of using a vocabulary database. For each input sentence 200, the clinical NLP engine 250 is queried 260, and the tagging sequence is output. The tagging results 270 are further used to generate the extra tagging embedding vector 150 (generic embedding vector));
and combining the generic embedding vector with the one or more domain-adapted embedding vectors to provide the domain-adapted embedding vector for the input phrase (Ling: Para.[0075], [0076], Fig. 3, the lexicon embedding vector 160 (domain adapted embedding vector) and the extra tagging embedding vector 150 (generic embedding vector), in combination, may be called domain knowledge embedding to provide the domain-adapted embedding vector for the input phrase “prostate cancer”).
Ling, while teaching the system of claim 37, fails to explicitly teach generating a generic embedding vector for the input phrase using a large language model.
However, Batina teaches generating a generic embedding vector for the input phrase using a large language model (Batina: Para.[0064], Fig. 6, a large language model generates embedding vector 60, which is a learned numerical representation of a token that captures some semantic meaning of the text segment represented by the token 56).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Batina’s teaching of search engine technology, and more particularly of search techniques that leverage large language models (LLMs), into Ling’s system and method for a machine learning model with evolving domain-specific lexicon features for text annotation, because using an LLM to process user inputs and to perform vector search by generating data that complements or enhances the user input would improve the search technology (Batina, Para.[0033],[0034]).
Regarding Claim 22, Ling in view of Batina teaches the method of claim 21. Ling further teaches, wherein scanning the input phrase comprises: tokenizing the input phrase in sequences of tokens (Ling: Para.[0063], [0064], Fig. 1, the system receives the input as a sequence of vectors (x1, x2, …, xn) (tokens));
and determining whether the sequences of tokens match domain-specific terms in the domain specific dictionary (Ling: Para.[0072], Fig. 2, an input sentence 200 is received and the dictionary 220 is queried 230. Based on any matching results, the query provides a tagging sequence as output).
Claim 31 is a non-transitory computer-readable storage medium claim that performs the steps of method claim 22 above. Claim 31 is therefore similar in scope and content to claim 22 and is rejected under a similar rationale as presented against claim 22 above.
Claim 38 is a system claim that performs the steps of method claim 22 above. Claim 38 is therefore similar in scope and content to claim 22 and is rejected under a similar rationale as presented against claim 22 above.
Regarding Claim 27, Ling in view of Batina teaches the method of claim 21. Ling further teaches building the domain-specific dictionary, comprising: identifying a list of domain-specific terms (Ling: Para.[0052],[0053], illustrates training a model on a well-labeled dataset while remaining able to apply the trained model to a new unlabeled dataset without losing important domain-specific features for the new dataset. These embodiments train an LSTM-CRF model for disorder annotation based on well-labeled scientific article text data. The LSTM-CRF model further encodes domain-specific lexicon features from a general dictionary. Additionally, the LSTM-CRF model encodes evolving feedback from the unlabeled corpus. Thus, even though the LSTM-CRF model is trained on one specific dataset, the LSTM-CRF model may be applied to a different dataset with evolving lexicon features);
for each domain-specific term, gathering textual content comprising a description or a definition of the respective domain-specific term ( Ling: Para.[0053],[0059]-[0061], well-labeled scientific article text data, hybrid clinical NLP engine, disease vocabulary may be used for each domain specific term to improve the neural network based method for disorder annotation).
and calculating, [using a pre-trained language model], a domain-adapted embedding vector based on the textual content gathered for the domain-specific term ( Ling: Para.[0072], Fig. 2, based on any matching results of the input phrase, the query provides a tagging sequence as output. The tagging results 235 ( from the textual documents) are further used to generate the lexicon embedding vector 160. This is accomplished by creating an entry for the tagged phrase, "prostate cancer" in this example, in the lexicon embedding vector matrix 160);
and storing the domain-specific term [and the domain-adapted embedding vector] into the domain-specific dictionary (Ling: Para. [0072], Dictionary 220 is built with domain specific vocabulary database 210, and new entries are added to, entries are deleted from, or entries are updated in the dictionary).
Batina further teaches storing [the domain-specific term and] the domain-adapted embedding vector into the domain-specific dictionary (Batina: Para. [0077]-[0082], Fig. 1, system 100 includes a generative AI model 112, a search engine 114, an embeddings module 116, and a media database 130. Embeddings module 116 creates/stores the vector representation of data);
and calculating, using a pre-trained language model, a domain-adapted embedding vector based on the textual content gathered for the domain-specific term (Batina: Para. [0064], [0068], Fig. 6, generating an embedding vector by using a pre-trained large language model, which has been fine-tuned with training datasets based on text-based chats).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Batina’s teaching of search engine technology and, more particularly, of search techniques that leverage large language models (LLMs), into the system and method of a machine learning model with evolving domain-specific lexicon features for text annotation, taught by Ling, because the use of an LLM in processing user inputs and performing vector search by generating data that complements/enhances user input would improve the search technology (Batina, Para. [0033], [0034]).
Claim 36 is a non-transitory computer-readable storage medium claim performing the steps of method claim 27 above. As such, claim 36 is similar in scope and content to claim 27 and is rejected under a similar rationale as presented against claim 27 above.
Regarding Claim 28, Ling in view of Batina teaches the method of claim 21. Batina further teaches, comprising: providing the domain-adapted embedding vector of the input phrase to a semantic search engine for executing of a semantic search over a vector search database to retrieve content related to the input phrase using the domain-adapted embedding vector of the input phrase (Batina: Para. [0081], [0082], the search engine 114 may perform a search of the relevant search space using the query data. The search may be a keyword search, a vector similarity search, or a hybrid search. The search engine 114 is configured to perform vector searches. A vector search uses vector embeddings for representing and searching content. Para. [0064], the embedding 60 represents the text segment corresponding to the token 56 in a way such that embeddings corresponding to semantically-related text are closer to each other in a vector space than embeddings corresponding to semantically-unrelated text).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Batina’s teaching of search engine technology and, more particularly, of search techniques that leverage large language models (LLMs), into the system and method of a machine learning model with evolving domain-specific lexicon features for text annotation, taught by Ling, because the use of an LLM in processing user inputs and performing vector search by generating data that complements/enhances user input would improve the search technology (Batina, Para. [0033], [0034]).
Regarding Claim 29, Ling in view of Batina teaches the method of claim 21. Batina further teaches, comprising: providing the domain-adapted embedding vector for storage into a domain-specific vector search database, the domain-specific vector search database being provided for use for semantic search execution by an embedding vector similarity search (Batina: Para. [0077]-[0082], Fig. 1, system 100 includes a generative AI model 112, a search engine 114, an embeddings module 116, and a media database 130. Embeddings module 116 creates/stores the vector representation of data. The search engine 114 is configured to perform vector search based on a search query. The search may be a keyword search, a vector similarity search, or a hybrid search. The search space may comprise, for example, private or public repositories of data, document libraries, etc., or an embedding space corresponding to such data sources).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Batina’s teaching of search engine technology and, more particularly, of search techniques that leverage large language models (LLMs), into the system and method of a machine learning model with evolving domain-specific lexicon features for text annotation, taught by Ling, because the use of an LLM in processing user inputs and performing vector search by generating data that complements/enhances user input would improve the search technology (Batina, Para. [0033], [0034]).
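For illustration only, the embedding vector similarity search referenced in claims 28 and 29 can be sketched as a nearest-neighbor lookup over stored vectors. This is a minimal sketch under stated assumptions: cosine similarity is one common similarity measure and is not stated to be the measure used by the claims or by Batina, and the names `cosine_similarity`, `vector_search`, and the in-memory `index` are hypothetical stand-ins for a vector search database.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def vector_search(query: List[float], index: Dict[str, List[float]], top_k: int = 1) -> List[Tuple[str, float]]:
    """Return the top_k (id, score) pairs most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical two-dimensional embeddings stored under document ids
index = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
for doc_id, score in vector_search([1.0, 0.1], index, top_k=2):
    print(doc_id, round(score, 3))
```

Semantically related content is expected to lie closer to the query vector, consistent with the description cited from Batina, Para. [0064].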
Claims 23-26, 32-35, 39 and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Ling et al. (US 20210232768 A1), hereinafter referenced as Ling, in view of Batina et al. (US 20240289361 A1), hereinafter referenced as Batina, further in view of Yu et al. (US 11030999 B1), hereinafter referenced as Yu.
Regarding Claim 23, Ling in view of Batina teaches the method of claim 21. Ling in view of Batina fails to explicitly teach the claimed, wherein the generic embedding vector is a non-domain adapted embedding vector generated for the input phrase as a whole.
However, Yu does teach the claimed, wherein the generic embedding vector is a non-domain adapted embedding vector generated for the input phrase as a whole (Yu: Column 4, lines 58-63, "global word embedding data" (generic embedding) may refer to one or more word embeddings that are not configured with respect to any particular skill. That is, global word embedding data may represent relationships between words based on how the words are used in natural language that is not specific to any particular skill).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Yu’s teaching of the generation and use of word embeddings as part of natural language understanding (NLU) processing performed by a natural language processing system into the system and method taught by Ling in view of Batina, because the generation of word embeddings using text corpuses including text (representing spoken user inputs) output from ASR processing and/or text corresponding to typed natural language inputs would improve the accuracy of human-computer interactions (Yu, Column 3, lines 6-51).
Claim 32 is a non-transitory computer-readable storage medium claim performing the steps of method claim 23 above. As such, claim 32 is similar in scope and content to claim 23 and is rejected under a similar rationale as presented against claim 23 above.
Claim 39 is a system claim performing the steps of method claim 23 above. As such, claim 39 is similar in scope and content to claim 23 and is rejected under a similar rationale as presented against claim 23 above.
Regarding Claim 24, Ling in view of Batina teaches the method of claim 21. Ling in view of Batina fails to explicitly teach the claimed, wherein generating the generic embedding vector comprises: updating the input phrase by removing the domain-specific term or substituting the domain-specific term for generic replacements corresponding to a generic definition or description of the respective domain-specific term; and generating the generic embedding vector based on the updated input phrase.
However, Yu does teach the claimed, wherein generating the generic embedding vector comprises: updating the input phrase by removing the domain-specific term or substituting the domain-specific term for generic replacements corresponding to a generic definition or description of the respective domain-specific term (Yu: Column 21, lines 57-67, Fig. 9, the NLU component 760 may include a light slot filler component 952 that can take text from slots represented in the NLU hypotheses output by the pruning component 950 and alter it to make the text more easily processed by downstream components. Light slot filler component 952 can replace words with other words or values that may be more easily understood by downstream components);
and generating the generic embedding vector based on the updated input phrase ( Yu: Column 3, lines 6-11, 64-67, Figs. 1A, 1B shows how the system 100 may generate embedding vectors from any sources ( text or audio)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Yu’s teaching of the generation and use of word embeddings as part of natural language understanding (NLU) processing performed by a natural language processing system into the system and method taught by Ling in view of Batina, because the generation of word embeddings using text corpuses including text (representing spoken user inputs) output from ASR processing and/or text corresponding to typed natural language inputs would improve the accuracy of human-computer interactions (Yu, Column 3, lines 6-51).
Claim 33 is a non-transitory computer-readable storage medium claim performing the steps of method claim 24 above. As such, claim 33 is similar in scope and content to claim 24 and is rejected under a similar rationale as presented against claim 24 above.
Claim 40 is a system claim performing the steps of method claim 24 above. As such, claim 40 is similar in scope and content to claim 24 and is rejected under a similar rationale as presented against claim 24 above.
Regarding Claim 25, Ling in view of Batina teaches the method of claim 21. Ling in view of Batina fails to explicitly teach the claimed, wherein combining the generic embedding vector with the one or more domain-adapted embedding vectors comprises: performing a mean vector calculation of the generic embedding vector with the one or more domain-adapted embedding vectors to provide the domain-adapted embedding vector.
However, Yu does teach the claimed, wherein combining the generic embedding vector with the one or more domain-adapted embedding vectors comprises: performing a mean vector calculation of the generic embedding vector with the one or more domain-adapted embedding vectors to provide the domain-adapted embedding vector ( Yu: Column 4, lines 52-63, column 13, lines 29-54, Fig. 2, word embedding component 210 may perform one or more of the embedding techniques/functions to generate phrase embedding data. Each word may be associated with a respective value and a respective weight. The weight of a word may be configured based on how frequent the word is used in natural language. Global word embedding data ( generic embedding) can be weighted to transform the global word embedding data into word embedding data specific to a skill ( domain adapted embedding)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Yu’s teaching of the generation and use of word embeddings as part of natural language understanding (NLU) processing performed by a natural language processing system into the system and method taught by Ling in view of Batina, because the generation of word embeddings using text corpuses including text (representing spoken user inputs) output from ASR processing and/or text corresponding to typed natural language inputs would improve the accuracy of human-computer interactions (Yu, Column 3, lines 6-51).
Claim 34 is a non-transitory computer-readable storage medium claim performing the steps of method claim 25 above. As such, claim 34 is similar in scope and content to claim 25 and is rejected under a similar rationale as presented against claim 25 above.
Regarding Claim 26, Ling in view of Batina, further in view of Yu, teaches the method of claim 25. Yu further teaches, wherein the mean vector calculation includes applying of weight factors to respective vectors based on frequency of occurrence within the phrase (Yu: Column 13, lines 29-54, Fig. 2, word embedding component 210 may perform one or more of the embedding techniques/functions to generate phrase embedding data. Each word may be associated with a respective value and a respective weight. The weight of a word may be configured based on how frequent the word is used in natural language. The more frequent the word is used, the smaller the word's weight, and vice versa. As an example, the word "the" may be associated with a relatively small weight).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Yu’s teaching of the generation and use of word embeddings as part of natural language understanding (NLU) processing performed by a natural language processing system into the system and method taught by Ling in view of Batina, because the generation of word embeddings using text corpuses including text (representing spoken user inputs) output from ASR processing and/or text corresponding to typed natural language inputs would improve the accuracy of human-computer interactions (Yu, Column 3, lines 6-51).
Claim 35 is a non-transitory computer-readable storage medium claim performing the steps of method claim 26 above. As such, claim 35 is similar in scope and content to claim 26 and is rejected under a similar rationale as presented against claim 26 above.
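For illustration only, the mean vector calculation of claim 25 and the weighted variant of claim 26 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `weighted_mean`, the example vectors, and the example weights are hypothetical, and the way weights would be derived from frequency of occurrence is not specified here. With no weights supplied, the function reduces to the plain element-wise mean.

```python
from typing import List, Optional


def weighted_mean(vectors: List[List[float]], weights: Optional[List[float]] = None) -> List[float]:
    """Element-wise weighted mean of equal-length vectors.

    With weights=None every vector counts equally (plain mean).
    """
    if not vectors:
        raise ValueError("at least one vector is required")
    if weights is None:
        weights = [1.0] * len(vectors)
    if len(weights) != len(vectors):
        raise ValueError("one weight per vector is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(vectors[0])
    return [
        sum(w * v[i] for w, v in zip(weights, vectors)) / total
        for i in range(dim)
    ]


# Hypothetical generic and domain-adapted embedding vectors
generic_vec = [2.0, 0.0]
domain_vec = [0.0, 4.0]

print(weighted_mean([generic_vec, domain_vec]))              # plain mean
print(weighted_mean([generic_vec, domain_vec], [3.0, 1.0]))  # weighted mean
```

In the passage cited from Yu (Column 13, lines 29-54), more frequently used words receive smaller weights; the weights passed to this sketch would be chosen accordingly by the caller.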
Conclusion
The prior art listed below is made of record and not relied upon, but is considered pertinent to applicant's disclosure.
Alshinnawi et al. (US 9424253 B2) teaches a method, system and computer program product for the domain specific normalization of a corpus of text. In an embodiment of the invention, a method for domain specific normalization of a corpus of text is provided, including an industrial, organization, demographic or geographic domain. The method includes loading a corpus of text in memory of a computer and determining a domain for the corpus of text. The method also includes retrieving a lexicon of replacement words for the determined domain. Finally, the method includes text simplifying the corpus of text using the retrieved lexicon. In one aspect of the embodiment, the domain is determined through inference based upon words already present in the corpus of text. In another aspect of the embodiment, the domain is determined based upon meta-data provided with the corpus of text.
Osuala et al. (US 11983208 B2) teaches a method, computer system, and a computer program product for searching. The method may include receiving a word and a context of the word. The context may include additional words. A first word embedding may be generated by inputting a sequence into a word embedding model that resultantly outputs the first word embedding. The sequence may include the word and the context that are concatenated to each other in the sequence. The first word embedding may be compared with other word embeddings. The other word embeddings may have been generated by inputting respective text portions of other texts into the word embedding model. A candidate match of the other texts may be presented. A respective word embedding of the candidate match may be, of the other word embeddings, most similar to the first word embedding according to the comparing.
Acharya et al. (US 10872601 B1) teaches a natural language understanding (NLU) system that uses a reduced dimensionality of word embedding features to configure compressed NLU models that use reduced computing resources for NLU tasks. A modified NLU model may include a compressed vocabulary data structure of word embedding data vectors that include a set of values corresponding to a reduced dimensionality of the original word embedding features, resulting in a smaller sized vocabulary data structure and reduced size of the vocabulary data structure. Further components of the modified NLU model perform matrix operations to expand the dimensionality of the reduced word embedding data vectors up to the expected dimensionality of later layers of the NLU model. Additional training and reweighting can adjust for potential losses in performance resulting from reductions in the word embedding features. Thus the modified NLU model can achieve similar performance to an original NLU model with reductions in use of computing resources.
Li et al. (CN 109284397 A ) teaches an invention which is suitable for the technical field of natural language processing, claims a field dictionary construction method, device, equipment and storage medium, the method comprises: respectively performing a word vector model training for selecting the general corpus and the domain corpus. obtaining the corresponding common word vector space model and the domain word vector space model, calculating the word semantic similarity seed word vector of universal word vector space model and the domain in the word vector space model corresponding to the universal word vector and domain word vector field with the initial seed dictionary. according to the calculated word semantic similarity, selecting the corresponding universal word vector or field word vector by expanding the initial seed dictionary field to obtain the corresponding domain dictionary, by discovering new words into word vocabulary algorithm for screening in the field dictionary so as to finish the construction of field dictionary, so as to expand the vocabulary dictionary of field, and improves the accuracy of the field words to the dictionary, so as to improve the accuracy of field dictionary.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NADIRA SULTANA, whose telephone number is (571) 272-4048. The examiner can normally be reached M-F, 7:30 am-5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras D. Shah, can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NADIRA SULTANA/
Examiner, Art Unit 2653