Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The Amendment filed February 3, 2026, has been entered. Claims 1-20 are pending in the application.
Response to Arguments
Applicant’s arguments, filed February 3, 2026, with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Specification
The disclosure is objected to because it contains an embedded hyperlink and/or other form of browser-executable code in paragraph 0074. Applicant is required to delete the embedded hyperlink and/or other form of browser-executable code; references to websites should be limited to the top-level domain name without any prefix such as http:// or other browser-executable code. See MPEP § 608.01.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4, 5, 11, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Yao et al. ("Clarifying Ambiguous Keywords with Personal Word Embeddings for Personalized Search"), hereinafter Yao, in view of Mikolov et al. (US Patent No. 9,037,464), hereinafter Mikolov, and further in view of Huang et al. ("Improving Word Representations via Global Context and Multiple Word Prototypes"), hereinafter Huang.
Regarding claim 1, Yao discloses a computer-implemented method comprising:
creating, [by one or more processors], a basic corpus for a first user using a first set of data sources, wherein the basic corpus includes one or more basic words and one or more vectors of the one or more basic words (Section 3.1, lines 1-8, "In this article, we propose to train personal word embeddings that mainly contain the user interested meanings to disambiguate the query keywords and achieve search results personalization. According to existing works [5, 17, 18], user interests are mainly reflected by the user’s issued queries and clicked documents under each query. Therefore, the simplest approach to obtain personal word representations containing user interests is to train a personal language model on the user’s corpus that consists of the user-issued queries and clicked documents [3, 35]. We implement this approach by training a personal Word2vec model [31] for each user with her own query log as one of our baselines, introduced as PPWE in Section 4.2."; Section 3.1.1, lines 1-10, "In the pre-training model, there is a personal word embedding layer. In this layer, we keep personal word embedding matrices for each individual user and a shared global embedding matrix. We use two signs to identify a word, i.e., the word and the corresponding user ID, so the same word of different users are identified as different words. For example, the word “Apple” in the word embedding matrix of the user ui should be represented as “Apple + ui,” while “Apple + uj” is for the user uj. In this layer, the personal word embeddings are only pre-trained with the corresponding user’s search log data. 
Thus, the well-trained embedding of a specific word is not a general representation of various meanings of this word in the overall logs, but mainly the personalized meaning that the user already knows and can reflect the user interests."; Personal word embeddings that mainly contain the user interested meanings read on a basic corpus for a first user, the user’s search log data reads on a first set of data sources, words from the user’s search log data read on one or more basic words, and personal word embeddings read on one or more vectors of the one or more basic words.);
extracting, [by the one or more processors], a set of text from a second set of data sources associated with the first user (Section 3.3, lines 1-6, "The personal word embeddings pre-trained from the user’s current query log have contained the user interests reflected in the search history. In real-world application scenarios, users will continuously issue new queries and their interests are dynamically changing in the streaming setting. To ensure that our personal word embeddings contain the latest user interests, we should fine-tune the personal word embeddings according to the newly issued queries along with the search process, keeping the ranking model fixed."; Newly issued queries read on a set of text from a second set of data sources associated with the first user.).
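For illustration only (this sketch is not part of the record of this action, and every function, variable, and data value in it is hypothetical), the claimed steps of creating a basic corpus of words and vectors from a first set of data sources and extracting text from a second set of data sources might be outlined as:

```python
import numpy as np

def create_basic_corpus(first_source_texts, dim=8, seed=0):
    """Build a per-user "basic corpus": a mapping from each basic word to a vector.

    The random vectors below are stand-ins for trained personal embeddings
    (e.g., a per-user Word2vec model as in Yao).
    """
    rng = np.random.default_rng(seed)
    corpus = {}
    for text in first_source_texts:
        for word in text.lower().split():
            if word not in corpus:
                corpus[word] = rng.normal(size=dim)
    return corpus

def extract_text(second_source_records):
    # Tokenize text drawn from a second set of data sources for the same user.
    return " ".join(second_source_records).lower().split()

basic = create_basic_corpus(["apple stock price", "apple pie recipe"])
tokens = extract_text(["apple earnings report", "new phone release"])
# Words in the extracted text but not in the basic corpus are "unknown words".
unknown = [w for w in tokens if w not in basic]
```

The split between known and unknown tokens at the end is the branch point for the claimed "responsive to finding an unknown word" and "responsive to finding a known word" limitations discussed below.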
Yao does not specifically disclose: one or more processors; responsive to finding an unknown word included in the set of text extracted, updating, by the one or more processors, the basic corpus, wherein the basic corpus is updated by replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus created and registering the unknown word in a first personal corpus.
Mikolov teaches:
one or more processors (Column 10, lines 63-66, "Computers suitable for the execution of a computer program include, by way of example, can be based on general or special purpose microprocessors or both, or any other kind of central processing unit.");
responsive to finding an unknown word included in the set of text extracted, updating, by the one or more processors, the basic corpus, wherein the basic corpus is updated by replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus created and registering the unknown word in a first personal corpus (Column 4, lines 11-20, "This specification generally describes systems that can be used to generate numeric representations of words in a high-dimensional space. The numeric representations are continuous high-dimensional representations, i.e., words are represented by floating point numbers in a high-dimensional space, e.g., as high-dimensional vectors of floating point numbers. The systems can be trained so that positions of the representations in the high-dimensional space generated by the systems reflect semantic and syntactic similarities between the words they represent."; Column 5, lines 22-54, "FIG. 2 is a flow diagram of an example process 200 for predicting a word based on surrounding words. For convenience, the process 200 will be described as being performed by a system of one or more computers located in one or more locations. For example, a word prediction system, e.g., the word prediction system 200 of FIG. 2, appropriately programmed, can perform the process 200. The system obtains a set of input words (step 202). The set of input words are words from a sequence of words that includes an unknown word whose value is to be predicted. That is, if the sequence includes an unknown word at position t, the set of input words may be the words at position t-N, . . . , t-1, t+1, . . . , and t+N in the sequence. In some implementations, the input words are tokenized before being received by the system, e.g., so that known compounds are treated as a single word by the system. The system processes the words using an embedding function (step 204) to generate a numeric representation of the words. 
For example, the embedding function may be a combining embedding function. A combining embedding function maps each word in the sequence of words to a respective continuous high-dimensional representation, e.g., to a respective high-dimensional vector of floating point numbers, based on current parameter values of the embedding function, e.g., as stored in a lookup table, and then merges the respective floating point vectors into a single merged vector. The combining embedding function can merge the respective floating point vectors using a linear function, e.g., a sum, average, or weighted linear combination of the respective floating point vectors, or using a nonlinear function, e.g., a component-wise maximum or a norm-constrained linear combination, for example."; A sequence including an unknown word at position t reads on finding an unknown word included in the set of text extracted, and predicting the value of a vector representation of an unknown word by mapping each word in a sequence of words surrounding the unknown word to a vector representation and using a combining embedding function to merge the vectors using an average, reads on replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus and registering the unknown word.).
Mikolov is considered to be analogous art to the claimed invention because it is in the same field of vector word representations. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yao to incorporate the teachings of Mikolov, i.e., to predict the value of a vector representation of an unknown word by mapping each word in a sequence of words surrounding the unknown word to a vector representation and merging the vectors with a combining embedding function using an average, as taught by Mikolov, in order to update the personal word embeddings disclosed by Yao. Doing so would allow the positions of the representations in the high-dimensional space to reflect semantic and syntactic similarities between the words they represent (Mikolov; Column 4, lines 11-20).
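For illustration only (not part of the record; all names and data values are hypothetical), the averaging merge described in Mikolov's combining embedding function, as mapped here onto replacing an unknown word's vector with the average vector of the basic words and registering the word in a personal corpus, might be sketched as:

```python
import numpy as np

def average_vector(basic_corpus):
    # Mean of the vectors of the one or more basic words in the basic corpus.
    return np.mean(np.stack(list(basic_corpus.values())), axis=0)

def register_unknown(basic_corpus, personal_corpus, unknown_word):
    # Replace the unknown word's vector with the average vector of the basic
    # words, update the basic corpus, and register the word in the first
    # personal corpus.
    vec = average_vector(basic_corpus)
    basic_corpus[unknown_word] = vec
    personal_corpus[unknown_word] = vec
    return vec

basic = {"apple": np.array([1.0, 0.0]), "stock": np.array([0.0, 1.0])}
personal = {}
v = register_unknown(basic, personal, "earnings")
# v is [0.5, 0.5], the mean of the two basic word vectors
```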
Yao in view of Mikolov does not specifically disclose: responsive to finding a known word included in the set of text extracted, determining, by the one or more processors, a distance between a vector of the known word and the average vector of the one or more basic words in order to register the known word in the first personal corpus.
Huang teaches:
responsive to finding a known word included in the set of text extracted, determining, by the one or more processors, a distance between a vector of the known word and the average vector of the one or more basic words in order to register the known word in the first personal corpus (Section 1, lines 1-5, “Vector-space models (VSM) represent word meanings with vectors that capture semantic and syntactic information of words. These representations can be used to induce similarity measures by computing distances between the vectors”; Section 3, lines 24-40, “In order to learn multiple prototypes, we first gather the fixed-sized context windows of all occurrences of a word (we use 5 words before and after the word occurrence). Each context is represented by a weighted average of the context words’ vectors, where again, we use idf-weighting as the weighting function, similar to the document context representation described in Section 2.2. We then use spherical k-means to cluster these context representations, which has been shown to model semantic relations well (Dhillon and Modha, 2001). Finally, each word occurrence in the corpus is re-labeled to its associated cluster and is used to train the word representation for that cluster. 
Similarity between a pair of words (w,w’) using the multi-prototype approach can be computed with or without context”; Section 3, lines 44-49, “p(c,w, i) is the likelihood that word w is in its cluster i given context c, μi(w) is the vector representing the i-th cluster centroid of w, and d(v, v’) is a function computing similarity between two vectors, which can be any of the distance functions presented by Curran (2004).”; Occurrences of a word read on finding a known word, a context represented by a weighted average of the context words’ vectors reads on the average vector of the one or more basic words, computing the similarity between the vector of the word and the vector of the context cluster using a distance function reads on determining a distance between a vector of the known word and the average vector of the one or more basic words, and learning multiple prototypes reads on registering the known word.).
Huang is considered to be analogous art to the claimed invention because it is in the same field of vector word representations. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yao in view of Mikolov to incorporate the teachings of Huang, i.e., to learn multiple prototypes of a polysemous word by taking fixed-sized context windows of all occurrences of the word and computing the similarity between the vector of the word and the vector of the context cluster using a distance function, where a context is represented by a weighted average of the context words’ vectors, as taught by Huang, in order to update the personal word embeddings disclosed by Yao. Doing so would allow for learning word embeddings that better capture the semantics of words by incorporating both local and global document context and by accounting for homonymy and polysemy through learning multiple embeddings per word (Huang; Abstract, lines 9-20).
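For illustration only (not part of the record; the helper names, the threshold, and the choice of cosine distance as the distance function are assumptions), determining a distance between a known word's vector and the average vector of the basic words, in the manner of the distance functions referenced by Huang, might be sketched as:

```python
import numpy as np

def cosine_distance(u, v):
    # One of the possible distance functions between two vectors.
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def maybe_register_known(basic_corpus, personal_corpus, word, threshold=0.5):
    # Distance between the known word's vector and the average vector of the
    # one or more basic words; the word is registered in the first personal
    # corpus only if it is close enough to the user's basic vocabulary.
    avg = np.mean(np.stack(list(basic_corpus.values())), axis=0)
    d = cosine_distance(basic_corpus[word], avg)
    if d <= threshold:
        personal_corpus[word] = basic_corpus[word]
    return d

basic = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}
personal = {}
d = maybe_register_known(basic, personal, "a")
```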
Regarding claim 4, Yao in view of Mikolov and Huang discloses the computer-implemented method as claimed in claim 1.
Yao further discloses:
wherein the second set of data sources includes at least one of a group of historical information acquired from a user computing device of the first user and a group of information input into the user computing device by the first user (Section 3.3, lines 1-6, "The personal word embeddings pre-trained from the user’s current query log have contained the user interests reflected in the search history. In real-world application scenarios, users will continuously issue new queries and their interests are dynamically changing in the streaming setting. To ensure that our personal word embeddings contain the latest user interests, we should fine-tune the personal word embeddings according to the newly issued queries along with the search process, keeping the ranking model fixed."; Newly issued queries by a user read on a group of information input into the user computing device by the first user.).
Regarding claim 5, Yao in view of Mikolov and Huang discloses the computer-implemented method as claimed in claim 4.
Yao further discloses:
wherein the group of historical information acquired from the user computing device of the first user and the group of information input into the user computing device by the first user includes at least one of a web browsing history of the user computing device, an email history of the user computing device, a chat history of the user computing device, and a text message history of the user computing device (Section 3.3, lines 1-6, "The personal word embeddings pre-trained from the user’s current query log have contained the user interests reflected in the search history. In real-world application scenarios, users will continuously issue new queries and their interests are dynamically changing in the streaming setting. To ensure that our personal word embeddings contain the latest user interests, we should fine-tune the personal word embeddings according to the newly issued queries along with the search process, keeping the ranking model fixed."; Newly issued queries by a user read on a web browsing history of the user computing device.).
Regarding claim 11, Yao discloses a computer program product comprising:
program instructions to create a basic corpus for a first user using a first set of data sources, wherein the basic corpus includes one or more basic words and one or more vectors of the one or more basic words (Section 3.1, lines 1-8, "In this article, we propose to train personal word embeddings that mainly contain the user interested meanings to disambiguate the query keywords and achieve search results personalization. According to existing works [5, 17, 18], user interests are mainly reflected by the user’s issued queries and clicked documents under each query. Therefore, the simplest approach to obtain personal word representations containing user interests is to train a personal language model on the user’s corpus that consists of the user-issued queries and clicked documents [3, 35]. We implement this approach by training a personal Word2vec model [31] for each user with her own query log as one of our baselines, introduced as PPWE in Section 4.2."; Section 3.1.1, lines 1-10, "In the pre-training model, there is a personal word embedding layer. In this layer, we keep personal word embedding matrices for each individual user and a shared global embedding matrix. We use two signs to identify a word, i.e., the word and the corresponding user ID, so the same word of different users are identified as different words. For example, the word “Apple” in the word embedding matrix of the user ui should be represented as “Apple + ui,” while “Apple + uj” is for the user uj. In this layer, the personal word embeddings are only pre-trained with the corresponding user’s search log data. 
Thus, the well-trained embedding of a specific word is not a general representation of various meanings of this word in the overall logs, but mainly the personalized meaning that the user already knows and can reflect the user interests."; Personal word embeddings that mainly contain the user interested meanings read on a basic corpus for a first user, the user’s search log data reads on a first set of data sources, words from the user’s search log data read on one or more basic words, and personal word embeddings read on one or more vectors of the one or more basic words.);
program instructions to extract a set of text from a second set of data sources associated with the first user (Section 3.3, lines 1-6, "The personal word embeddings pre-trained from the user’s current query log have contained the user interests reflected in the search history. In real-world application scenarios, users will continuously issue new queries and their interests are dynamically changing in the streaming setting. To ensure that our personal word embeddings contain the latest user interests, we should fine-tune the personal word embeddings according to the newly issued queries along with the search process, keeping the ranking model fixed."; Newly issued queries read on a set of text from a second set of data sources associated with the first user.).
Yao does not specifically disclose: one or more computer readable storage media and program instructions stored on the one or more computer readable storage media; responsive to finding an unknown word included in the set of text extracted, program instructions to update the basic corpus, wherein the basic corpus is updated by replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus created and registering the unknown word in a first personal corpus.
Mikolov teaches:
one or more computer readable storage media and program instructions stored on the one or more computer readable storage media (Column 10, line 63 – Column 11, line 4, "Computers suitable for the execution of a computer program include, by way of example, can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.”);
responsive to finding an unknown word included in the set of text extracted, program instructions to update the basic corpus, wherein the basic corpus is updated by replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus created and registering the unknown word in a first personal corpus (Column 4, lines 11-20, "This specification generally describes systems that can be used to generate numeric representations of words in a high-dimensional space. The numeric representations are continuous high-dimensional representations, i.e., words are represented by floating point numbers in a high-dimensional space, e.g., as high-dimensional vectors of floating point numbers. The systems can be trained so that positions of the representations in the high-dimensional space generated by the systems reflect semantic and syntactic similarities between the words they represent."; Column 5, lines 22-54, "FIG. 2 is a flow diagram of an example process 200 for predicting a word based on surrounding words. For convenience, the process 200 will be described as being performed by a system of one or more computers located in one or more locations. For example, a word prediction system, e.g., the word prediction system 200 of FIG. 2, appropriately programmed, can perform the process 200. The system obtains a set of input words (step 202). The set of input words are words from a sequence of words that includes an unknown word whose value is to be predicted. That is, if the sequence includes an unknown word at position t, the set of input words may be the words at position t-N, . . . , t-1, t+1, . . . , and t+N in the sequence. In some implementations, the input words are tokenized before being received by the system, e.g., so that known compounds are treated as a single word by the system. The system processes the words using an embedding function (step 204) to generate a numeric representation of the words. 
For example, the embedding function may be a combining embedding function. A combining embedding function maps each word in the sequence of words to a respective continuous high-dimensional representation, e.g., to a respective high-dimensional vector of floating point numbers, based on current parameter values of the embedding function, e.g., as stored in a lookup table, and then merges the respective floating point vectors into a single merged vector. The combining embedding function can merge the respective floating point vectors using a linear function, e.g., a sum, average, or weighted linear combination of the respective floating point vectors, or using a nonlinear function, e.g., a component-wise maximum or a norm-constrained linear combination, for example."; A sequence including an unknown word at position t reads on finding an unknown word included in the set of text extracted, and predicting the value of a vector representation of an unknown word by mapping each word in a sequence of words surrounding the unknown word to a vector representation and using a combining embedding function to merge the vectors using an average, reads on replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus and registering the unknown word.).
Mikolov is considered to be analogous art to the claimed invention because it is in the same field of vector word representations. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yao to incorporate the teachings of Mikolov, i.e., to predict the value of a vector representation of an unknown word by mapping each word in a sequence of words surrounding the unknown word to a vector representation and merging the vectors with a combining embedding function using an average, as taught by Mikolov, in order to update the personal word embeddings disclosed by Yao. Doing so would allow the positions of the representations in the high-dimensional space to reflect semantic and syntactic similarities between the words they represent (Mikolov; Column 4, lines 11-20).
Yao in view of Mikolov does not specifically disclose: responsive to finding a known word included in the set of text extracted, program instructions to determine a distance between a vector of the known word and the average vector of the one or more basic words in order to register the known word in the first personal corpus.
Huang teaches:
responsive to finding a known word included in the set of text extracted, program instructions to determine a distance between a vector of the known word and the average vector of the one or more basic words in order to register the known word in the first personal corpus (Section 1, lines 1-5, “Vector-space models (VSM) represent word meanings with vectors that capture semantic and syntactic information of words. These representations can be used to induce similarity measures by computing distances between the vectors”; Section 3, lines 24-40, “In order to learn multiple prototypes, we first gather the fixed-sized context windows of all occurrences of a word (we use 5 words before and after the word occurrence). Each context is represented by a weighted average of the context words’ vectors, where again, we use idf-weighting as the weighting function, similar to the document context representation described in Section 2.2. We then use spherical k-means to cluster these context representations, which has been shown to model semantic relations well (Dhillon and Modha, 2001). Finally, each word occurrence in the corpus is re-labeled to its associated cluster and is used to train the word representation for that cluster. 
Similarity between a pair of words (w,w’) using the multi-prototype approach can be computed with or without context”; Section 3, lines 44-49, “p(c,w, i) is the likelihood that word w is in its cluster i given context c, μi(w) is the vector representing the i-th cluster centroid of w, and d(v, v’) is a function computing similarity between two vectors, which can be any of the distance functions presented by Curran (2004).”; Occurrences of a word read on finding a known word, a context represented by a weighted average of the context words’ vectors reads on the average vector of the one or more basic words, computing the similarity between the vector of the word and the vector of the context cluster using a distance function reads on determining a distance between a vector of the known word and the average vector of the one or more basic words, and learning multiple prototypes reads on registering the known word.).
Huang is considered to be analogous art to the claimed invention because it is in the same field of vector word representations. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yao in view of Mikolov to incorporate the teachings of Huang, i.e., to learn multiple prototypes of a polysemous word by taking fixed-sized context windows of all occurrences of the word and computing the similarity between the vector of the word and the vector of the context cluster using a distance function, where a context is represented by a weighted average of the context words’ vectors, as taught by Huang, in order to update the personal word embeddings disclosed by Yao. Doing so would allow for learning word embeddings that better capture the semantics of words by incorporating both local and global document context and by accounting for homonymy and polysemy through learning multiple embeddings per word (Huang; Abstract, lines 9-20).
Regarding claim 16, Yao discloses a computer system comprising:
program instructions to create a basic corpus for a first user using a first set of data sources, wherein the basic corpus includes one or more basic words and one or more vectors of the one or more basic words (Section 3.1, lines 1-8, "In this article, we propose to train personal word embeddings that mainly contain the user interested meanings to disambiguate the query keywords and achieve search results personalization. According to existing works [5, 17, 18], user interests are mainly reflected by the user’s issued queries and clicked documents under each query. Therefore, the simplest approach to obtain personal word representations containing user interests is to train a personal language model on the user’s corpus that consists of the user-issued queries and clicked documents [3, 35]. We implement this approach by training a personal Word2vec model [31] for each user with her own query log as one of our baselines, introduced as PPWE in Section 4.2."; Section 3.1.1, lines 1-10, "In the pre-training model, there is a personal word embedding layer. In this layer, we keep personal word embedding matrices for each individual user and a shared global embedding matrix. We use two signs to identify a word, i.e., the word and the corresponding user ID, so the same word of different users are identified as different words. For example, the word “Apple” in the word embedding matrix of the user ui should be represented as “Apple + ui,” while “Apple + uj” is for the user uj. In this layer, the personal word embeddings are only pre-trained with the corresponding user’s search log data. 
Thus, the well-trained embedding of a specific word is not a general representation of various meanings of this word in the overall logs, but mainly the personalized meaning that the user already knows and can reflect the user interests."; Personal word embeddings that mainly contain the user interested meanings read on a basic corpus for a first user, the user’s search log data reads on a first set of data sources, words from the user’s search log data read on one or more basic words, and personal word embeddings read on one or more vectors of the one or more basic words.);
program instructions to extract a set of text from a second set of data sources associated with the first user (Section 3.3, lines 1-6, "The personal word embeddings pre-trained from the user’s current query log have contained the user interests reflected in the search history. In real-world application scenarios, users will continuously issue new queries and their interests are dynamically changing in the streaming setting. To ensure that our personal word embeddings contain the latest user interests, we should fine-tune the personal word embeddings according to the newly issued queries along with the search process, keeping the ranking model fixed."; Newly issued queries read on a set of text from a second set of data sources associated with the first user.).
Yao does not specifically disclose: one or more computer processors; one or more computer readable storage media; program instructions collectively stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors; responsive to finding an unknown word included in the set of text extracted, program instructions to update the basic corpus, wherein the basic corpus is updated by replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus created and registering the unknown word in a first personal corpus.
Mikolov teaches:
one or more computer processors; one or more computer readable storage media; program instructions collectively stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors (Column 10, line 63 - Column 11, line 4, "Computers suitable for the execution of a computer program include, by way of example, can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.”);
responsive to finding an unknown word included in the set of text extracted, program instructions to update the basic corpus, wherein the basic corpus is updated by replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus created and registering the unknown word in a first personal corpus (Column 4, lines 11-20, "This specification generally describes systems that can be used to generate numeric representations of words in a high-dimensional space. The numeric representations are continuous high-dimensional representations, i.e., words are represented by floating point numbers in a high-dimensional space, e.g., as high-dimensional vectors of floating point numbers. The systems can be trained so that positions of the representations in the high-dimensional space generated by the systems reflect semantic and syntactic similarities between the words they represent."; Column 5, lines 22-54, "FIG. 2 is a flow diagram of an example process 200 for predicting a word based on surrounding words. For convenience, the process 200 will be described as being performed by a system of one or more computers located in one or more locations. For example, a word prediction system, e.g., the word prediction system 200 of FIG. 2, appropriately programmed, can perform the process 200. The system obtains a set of input words (step 202). The set of input words are words from a sequence of words that includes an unknown word whose value is to be predicted. That is, if the sequence includes an unknown word at position t, the set of input words may be the words at position t-N, . . . , t-1, t+1, . . . , and t+N in the sequence. In some implementations, the input words are tokenized before being received by the system, e.g., so that known compounds are treated as a single word by the system. The system processes the words using an embedding function (step 204) to generate a numeric representation of the words. 
For example, the embedding function may be a combining embedding function. A combining embedding function maps each word in the sequence of words to a respective continuous high-dimensional representation, e.g., to a respective high-dimensional vector of floating point numbers, based on current parameter values of the embedding function, e.g., as stored in a lookup table, and then merges the respective floating point vectors into a single merged vector. The combining embedding function can merge the respective floating point vectors using a linear function, e.g., a sum, average, or weighted linear combination of the respective floating point vectors, or using a nonlinear function, e.g., a component-wise maximum or a norm-constrained linear combination, for example."; A sequence including an unknown word at position t reads on finding an unknown word included in the set of text extracted, and predicting the value of a vector representation of an unknown word by mapping each word in a sequence of words surrounding the unknown word to a vector representation and using a combining embedding function to merge the vectors using an average, reads on replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus and registering the unknown word.).
Mikolov is considered to be analogous to the claimed invention because it is in the same field of vector word representations. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yao to incorporate the teachings of Mikolov to predict the value of a vector representation of an unknown word by mapping each word in a sequence of words surrounding the unknown word to a vector representation and using a combining embedding function to merge the vectors using an average, as taught by Mikolov, to update personal word embeddings as disclosed by Yao. Doing so would allow for the positions of representations in high-dimensional space to reflect semantic and syntactic similarities between the words they represent (Mikolov; Column 4, lines 11-20).
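By way of illustration only, the averaging operation relied upon above (replacing an unknown word's vector with the average of the vectors of the surrounding known words) may be sketched as follows. This hypothetical Python sketch is the examiner's illustration; the function and variable names do not appear in Yao or Mikolov.

```python
import numpy as np

def impute_unknown_vector(basic_corpus, context_words):
    """Replace an unknown word's vector with the average of the
    vectors of the known (basic) words surrounding it, per the
    combining embedding function described above."""
    known = [basic_corpus[w] for w in context_words if w in basic_corpus]
    if not known:
        raise ValueError("no known context words to average")
    return np.mean(known, axis=0)

# Hypothetical basic corpus mapping words to vectors.
basic_corpus = {"dog": np.array([1.0, 0.0]), "cat": np.array([0.0, 1.0])}
vec = impute_unknown_vector(basic_corpus, ["dog", "cat"])
# vec is the average of the two context vectors, i.e. [0.5, 0.5]
```

The merged vector is then registered for the unknown word without altering the vectors of the basic words themselves.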
Yao in view of Mikolov does not specifically disclose: responsive to finding a known word included in the set of text extracted, program instructions to determine a distance between a vector of the known word and the average vector of the one or more basic words in order to register the known word in the first personal corpus.
Huang teaches:
responsive to finding a known word included in the set of text extracted, program instructions to determine a distance between a vector of the known word and the average vector of the one or more basic words in order to register the known word in the first personal corpus (Section 1, lines 1-5, “Vector-space models (VSM) represent word meanings with vectors that capture semantic and syntactic information of words. These representations can be used to induce similarity measures by computing distances between the vectors”; Section 3, lines 24-40, “In order to learn multiple prototypes, we first gather the fixed-sized context windows of all occurrences of a word (we use 5 words before and after the word occurrence). Each context is represented by a weighted average of the context words’ vectors, where again, we use idf-weighting as the weighting function, similar to the document context representation described in Section 2.2. We then use spherical k-means to cluster these context representations, which has been shown to model semantic relations well (Dhillon and Modha, 2001). Finally, each word occurrence in the corpus is re-labeled to its associated cluster and is used to train the word representation for that cluster. 
Similarity between a pair of words (w,w’) using the multi-prototype approach can be computed with or without context”; Section 3, lines 44-49, “p(c,w, i) is the likelihood that word w is in its cluster i given context c, μi(w) is the vector representing the i-th cluster centroid of w, and d(v, v’) is a function computing similarity between two vectors, which can be any of the distance functions presented by Curran (2004).”); Occurrences of a word reads on finding a known word, a context represented by a weighted average of the vectors of the context words read on the average vector of the one or more basic words, computing the similarity between the vector of the word and the vector of the context cluster using a distance function reads on determining a distance between a vector of the known word and the average vector of the one or more basic words, and learning multiple prototypes reads on registering the known word.).
Huang is considered to be analogous to the claimed invention because it is in the same field of vector word representations. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yao in view of Mikolov to incorporate the teachings of Huang to learn multiple prototypes of a polysemous word by taking fixed-sized context windows of all occurrences of a word and computing the similarity between the vector of the word and the vector of the context cluster using a distance function, where a context is represented by a weighted average of the vectors of the context words, as taught by Huang, to update personal word embeddings as disclosed by Yao. Doing so would allow for learning word embeddings that better capture the semantics of words by incorporating both local and global document context and accounting for homonymy and polysemy by learning multiple embeddings per word (Huang; Abstract, lines 9-20).
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Yao in view of Mikolov and Huang, and further in view of Wadhwa et al. (US Patent No. 11,461,822), hereinafter Wadhwa.
Regarding claim 2, Yao in view of Mikolov and Huang discloses the computer-implemented method as claimed in claim 1, but does not specifically disclose: wherein creating the basic corpus for the first user using the first set of data sources further comprises: tagging, by the one or more processors, each basic word of the one or more basic words with a flag.
Wadhwa teaches:
tagging, by the one or more processors, each basic word of the one or more basic words with a flag (Column 11, lines 4-16, "Review customization computing device 102 may also generate item cluster data 380 identifying and characterizing clustered word embeddings of reviews for a plurality of items. For example, review customization computing device 102 may apply a dependency parser, or a part-of-speech tagger, to identify (e.g., tag) words or phrases of the reviews. Review customization device 102 may then generate aspects identifying portions of the reviews (e.g., words). Review customization computing device 102 may then generate word embeddings for the generated aspects by applying to the identified portions of the reviews a neural network model that is trained to reconstruct linguistic contexts of words, for example."; Applying a part-of-speech tagger to identify words or phrases reads on tagging each basic word with a flag.).
Wadhwa is considered to be analogous to the claimed invention because it is in the same field of vector word representations. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yao in view of Mikolov and Huang to incorporate the teachings of Wadhwa to apply a part-of-speech tagger to identify words or phrases. Doing so would allow for generating word embeddings for review customization (Wadhwa; Column 11, lines 4-19).
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Yao in view of Mikolov and Huang, and further in view of Soper et al. ("When Polysemy Matters: Modeling Semantic Categorization with Word Embeddings"), hereinafter Soper.
Regarding claim 3, Yao in view of Mikolov and Huang discloses the computer-implemented method as claimed in claim 1, but does not specifically disclose: wherein creating the basic corpus for the first user using the first set of data sources further comprises: separating, by the one or more processors, a first basic word from the basic corpus if the basic word is polysemous; and clustering, by the one or more processors, the first basic word with a second basic word based on a degree of similarity.
Soper teaches:
separating, by the one or more processors, a first basic word from the basic corpus if the basic word is polysemous (Section 1, lines 19-25, "Our particular interest is on the role of polysemy in semantic categorization. Because words generally have multiple distinct senses, categorization decisions will depend on which sense of a word is being considered. Representing the distinct senses of polysemous words, then, should be important to modeling how humans categorize words."; Determining which sense of a word is being considered and representing the distinct senses of polysemous words read on separating a first basic word from the basic corpus if the basic word is polysemous.);
and clustering, by the one or more processors, the first basic word with a second basic word based on a degree of similarity (Section 2, lines 13-18, "In the present paper, we are interested in semantic category induction. Instead of grouping instances of a word into distinct senses, or documents into topics, the goal of semantic categorization is to group unique words into semantically related clusters."; Section 5.2, lines 43-49, "MPro BERT, by contrast, puts freeze in two clusters: one related to cooking (as in the ground truth) and another cluster with words like stop, delay, arrest and restrict, which seems to correspond to the figurative sense of freeze. Thus factoring out different senses allows MPro BERT to give a more accurate and reasonable categorization."; Grouping unique words into semantically related clusters, where different senses of polysemous words are put in different clusters, reads on clustering the first basic word with a second basic word based on a degree of similarity.).
Soper is considered to be analogous to the claimed invention because it is in the same field of vector word representations. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yao in view of Mikolov and Huang to incorporate the teachings of Soper to determine which sense of a word is being considered, represent the distinct senses of polysemous words, and group unique words into semantically related clusters, where different senses of polysemous words are put in different clusters. Doing so would significantly improve the predictions of embedding models by accounting for polysemy (Soper; Section 7, lines 11-14).
Claims 6 – 7, 12 – 13 and 17 – 18 are rejected under 35 U.S.C. 103 as being unpatentable over Yao in view of Mikolov and Huang, and further in view of Kosaka (US Patent Application Publication No. 2023/0083959).
Regarding claim 6, Yao in view of Mikolov and Huang discloses the computer-implemented method as claimed in claim 1.
Yao further discloses:
wherein extracting the set of text from the second set of data sources associated with the first user further comprises: creating, by the one or more processors, a first word group from the set of text (Section 3.1, lines 1-3, "In this article, we propose to train personal word embeddings that mainly contain the user interested meanings to disambiguate the query keywords and achieve search results personalization."; Section 3.3, lines 1-6, "The personal word embeddings pre-trained from the user’s current query log have contained the user interests reflected in the search history. In real-world application scenarios, users will continuously issue new queries and their interests are dynamically changing in the streaming setting. To ensure that our personal word embeddings contain the latest user interests, we should fine-tune the personal word embeddings according to the newly issued queries along with the search process, keeping the ranking model fixed."; Query keywords from newly issued queries read on a first word group from the set of text.).
Yao in view of Mikolov and Huang does not specifically disclose: wherein extracting the set of text from the second set of data sources associated with the first user further comprises: dividing, by the one or more processors, the set of text into one or more words using morphological analysis.
Kosaka teaches:
dividing, by the one or more processors, the set of text into one or more words using morphological analysis (Paragraph 0078, lines 15-20, "Alternatively, for example, the character string obtaining unit 32 may execute the OCR process in units of divided words by dividing a text included in a document image of a non-fixed form document into words by using a well-known morphological analysis method.").
Kosaka is considered to be analogous to the claimed invention because it is in the same field of extracting words from text. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yao in view of Mikolov and Huang to incorporate the teachings of Kosaka to divide a text from a document into words by using morphological analysis. Doing so would allow for extracting text from a document image obtained by using an image scanner device (Kosaka; Paragraph 0002, lines 1-9).
Regarding claim 7, Yao in view of Mikolov, Huang, and Kosaka discloses the computer-implemented method as claimed in claim 6.
Yao further discloses:
subsequent to extracting the set of text from the second set of data sources associated with the first user, processing, by the one or more processors, a known word from the first word group created (Section 3.1, lines 1-3, "In this article, we propose to train personal word embeddings that mainly contain the user interested meanings to disambiguate the query keywords and achieve search results personalization."; Section 3.3, lines 1-6, "The personal word embeddings pre-trained from the user’s current query log have contained the user interests reflected in the search history. In real-world application scenarios, users will continuously issue new queries and their interests are dynamically changing in the streaming setting. To ensure that our personal word embeddings contain the latest user interests, we should fine-tune the personal word embeddings according to the newly issued queries along with the search process, keeping the ranking model fixed."; Training personal word embeddings for query keywords from newly issued queries reads on processing a known word from the first word group created.).
Mikolov further teaches:
subsequent to extracting the set of text from the second set of data sources associated with the first user, processing, by the one or more processors, the unknown word from the first word group created (Column 4, lines 11-20, "This specification generally describes systems that can be used to generate numeric representations of words in a high-dimensional space. The numeric representations are continuous high-dimensional representations, i.e., words are represented by floating point numbers in a high-dimensional space, e.g., as high-dimensional vectors of floating point numbers. The systems can be trained so that positions of the representations in the high-dimensional space generated by the systems reflect semantic and syntactic similarities between the words they represent."; Column 5, lines 22-54, "FIG. 2 is a flow diagram of an example process 200 for predicting a word based on surrounding words. For convenience, the process 200 will be described as being performed by a system of one or more computers located in one or more locations. For example, a word prediction system, e.g., the word prediction system 200 of FIG. 2, appropriately programmed, can perform the process 200. The system obtains a set of input words (step 202). The set of input words are words from a sequence of words that includes an unknown word whose value is to be predicted. That is, if the sequence includes an unknown word at position t, the set of input words may be the words at position t-N, . . . , t-1, t+1, . . . , and t+N in the sequence. In some implementations, the input words are tokenized before being received by the system, e.g., so that known compounds are treated as a single word by the system. The system processes the words using an embedding function (step 204) to generate a numeric representation of the words. For example, the embedding function may be a combining embedding function. 
A combining embedding function maps each word in the sequence of words to a respective continuous high-dimensional representation, e.g., to a respective high-dimensional vector of floating point numbers, based on current parameter values of the embedding function, e.g., as stored in a lookup table, and then merges the respective floating point vectors into a single merged vector. The combining embedding function can merge the respective floating point vectors using a linear function, e.g., a sum, average, or weighted linear combination of the respective floating point vectors, or using a nonlinear function, e.g., a component-wise maximum or a norm-constrained linear combination, for example."; Predicting the value of a vector representation of an unknown word reads on processing the unknown word from the first word group created.).
Mikolov is considered to be analogous to the claimed invention because it is in the same field of vector word representations. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yao in view of Mikolov, Huang, and Kosaka to further incorporate the teachings of Mikolov to predict the value of a vector representation of an unknown word. Doing so would allow for the positions of representations in high-dimensional space to reflect semantic and syntactic similarities between the words they represent (Mikolov; Column 4, lines 11-20).
Regarding claim 12, arguments analogous to claim 6 are applicable.
Regarding claim 13, arguments analogous to claim 7 are applicable.
Regarding claim 17, arguments analogous to claim 6 are applicable.
Regarding claim 18, arguments analogous to claim 7 are applicable.
Claims 8, 14 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Yao in view of Mikolov, Huang, and Kosaka, and further in view of Lin et al. (US Patent No. 10,846,319), hereinafter Lin.
Regarding claim 8, Yao in view of Mikolov, Huang, and Kosaka discloses the computer-implemented method as claimed in claim 7, but does not specifically disclose: wherein processing the unknown word from the first word group created further comprises: extracting, by the one or more processors, a third basic word from the first word group created; classifying, by the one or more processors, the third basic word into a basic word group; and calculating, by the one or more processors, the average vector for the basic word group.
Lin teaches:
extracting, by the one or more processors, a third basic word from the first word group created; classifying, by the one or more processors, the third basic word into a basic word group; and calculating, by the one or more processors, the average vector for the basic word group (Column 2, line 55 - Column 3, line 2, "Accordingly, techniques and systems for online dictionary extension of word vectors are described that are configured to provide online extension of existing word vector dictionaries. These techniques support extending a word vector dictionary to include a new word without retraining the word vector embedding or altering the previously computed word vectors. To do so, the dictionary extension system receives an existing word vector dictionary and a new word not included in the word vector dictionary. Association and co-occurrence information is then estimated by the dictionary extension system for the new word with respect to the existing words in the word vector dictionary. This is done by estimating co-occurrence information for a large word set based on the existing word vector dictionary and sparse co-occurrence information over a small word set."; Column 8, line 66 - Column 9, line 19, "A new word vector is approximated for the input word based on word vectors from the set of word vectors that are associated with the one or more words (block 408). Continuing the on-going example, both ‘dog’ and ‘hound’ are associated with word vectors in the set of word vectors. Based on the relationship between ‘wolf’ and each of ‘dog’ and ‘hound’, a word vector for ‘wolf’ is approximated based on the word vector for ‘dog’ and the word vector for ‘hound’. This can be performed, for instance, by the vector approximation module 116 as described in greater detail with respect to FIG. 3. The approximation is performed based on the existing word vectors but without altering the existing word vectors. 
For example, the word vectors for ‘dog’ and ‘hound’ are unchanged by the approximation, and new word vectors for ‘dog’ and ‘hound’ are not created. Instead, the existing word vectors are leveraged to approximate a ‘close-enough’ word vector for the input word. In the on-going example, ‘wolf’ is determined to be similar to both ‘dog’ and ‘hound’, and an approximate word vector for ‘wolf’ is created such as by creating a weighted average of the word vectors for ‘dog’ and ‘hound’."; Receiving a new word not included in a word vector dictionary reads on extracting a third basic word from the first word group created, approximating a new word vector for the input word based on word vectors from a set of word vectors that are associated with the word reads on classifying the third basic word into a basic word group, and creating a weighted average of the associated words reads on calculating the average vector for the basic word group.).
Lin is considered to be analogous to the claimed invention because it is in the same field of vector word representations. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Yao in view of Mikolov, Huang, and Kosaka to incorporate the teachings of Lin to receive a new word not included in a word vector dictionary, approximate a new word vector for the input word based on word vectors from a set of word vectors that are associated with the word, and create a weighted average of the associated words. Doing so would allow for a system utilizing word vectors to incorporate new words in a computationally cheap and efficient manner (Lin; Column 1, line 58 - Column 2, line 3).
Regarding claim 14, arguments analogous to claim 8 are applicable.
Regarding claim 19, arguments analogous to claim 8 are applicable.
Allowable Subject Matter
Claims 9 – 10, 15 and 20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The primary reason claim 9 would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims is the inclusion of the limitations “responsive to determining the distance does exceed a first threshold, registering, by the one or more processors, the known word in the first personal corpus as a polysemous word” and “responsive to determining the distance does not exceed the first threshold, updating, by the one or more processors, the vector of the known word by replacing the vector of the known word with an average of the vector of the known word and the average vector for the basic word group”, in combination with the limitations to create a basic corpus for a user using a first set of data sources, where the basic corpus includes one or more basic words and one or more vectors of the one or more basic words, extract a set of text from a second set of data sources associated with the user, update the basic corpus in response to finding an unknown word included in the set of text extracted, where the basic corpus is updated by replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus created and registering the unknown word in a first personal corpus, and determine a distance between a vector of the known word and the average vector of the one or more basic words in order to register the known word in the first personal corpus in response to finding a known word included in the set of text extracted.
The primary reason claim 10 would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims is the inclusion of the limitations “obtaining, by the one or more processors, a plurality of unique words other than the basic words from the second set of data sources associated with the first user”, “determining, by the one or more processors, among the plurality of unique words, one or more common words are included in a second personal corpus of the second user”, “extracting, by the one or more processors, a second word group and a third word group having a vector close to a common word of the one or more common words included in the first personal corpus of the first user and the second personal corpus of the second user, respectively” and “responsive to the similarity between the second word group and the third word group not exceeding a second threshold, sending, by the one or more processors, a notification to the first user, or sending, by the one or more processors, a word that has the vector close to the common word and is selected from the first word group to the second user together with a set of textual information”, in combination with the limitations to create a basic corpus for a user using a first set of data sources, where the basic corpus includes one or more basic words and one or more vectors of the one or more basic words, extract a set of text from a second set of data sources associated with the user, update the basic corpus in response to finding an unknown word included in the set of text extracted, where the basic corpus is updated by replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus created and registering the unknown word in a first personal corpus, and determine a distance between a vector of the known word and the average vector of the one or more basic words in order to register the known word in the first personal corpus in response to finding a known word included in the set of text extracted.
The primary reason claim 15 would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims is the inclusion of the limitations “responsive to determining the distance does exceed a first threshold, program instructions to register the known word in the first personal corpus as a polysemous word” and “responsive to determining the distance does not exceed the first threshold, program instructions to update the vector of the known word by replacing the vector of the known word with an average of the vector of the known word and the average vector for the basic word group”, in combination with the limitations to create a basic corpus for a user using a first set of data sources, where the basic corpus includes one or more basic words and one or more vectors of the one or more basic words, extract a set of text from a second set of data sources associated with the user, update the basic corpus in response to finding an unknown word included in the set of text extracted, where the basic corpus is updated by replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus created and registering the unknown word in a first personal corpus, and determine a distance between a vector of the known word and the average vector of the one or more basic words in order to register the known word in the first personal corpus in response to finding a known word included in the set of text extracted.
The primary reason claim 20 would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims is the inclusion of the limitations “responsive to determining the distance does exceed a first threshold, program instructions to register the known word in the first personal corpus as a polysemous word” and “responsive to determining the distance does not exceed the first threshold, program instructions to update the vector of the known word by replacing the vector of the known word with an average of the vector of the known word and the average vector for the basic word group”, in combination with the limitations to create a basic corpus for a user using a first set of data sources, where the basic corpus includes one or more basic words and one or more vectors of the one or more basic words, extract a set of text from a second set of data sources associated with the user, update the basic corpus in response to finding an unknown word included in the set of text extracted, where the basic corpus is updated by replacing a vector of the unknown word with an average vector of the one or more basic words in the basic corpus created and registering the unknown word in a first personal corpus, and determine a distance between a vector of the known word and the average vector of the one or more basic words in order to register the known word in the first personal corpus in response to finding a known word included in the set of text extracted.
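By way of illustration only, the threshold-based processing recited in claims 9, 15 and 20 (registering the known word as polysemous when the distance exceeds a first threshold, and otherwise replacing its vector with the average of its vector and the average vector) may be sketched as follows. This hypothetical Python sketch is the examiner's illustration of the claimed logic; the data structures and names are assumptions and do not appear in the application as filed.

```python
import numpy as np

def process_known_word(word, basic_corpus, personal_corpus, avg_vec, threshold):
    """If the distance between the known word's vector and the average
    vector exceeds the first threshold, register the word in the
    personal corpus as polysemous; otherwise update its vector to the
    average of its current vector and the average vector."""
    v = basic_corpus[word]
    dist = float(np.linalg.norm(v - avg_vec))
    if dist > threshold:
        personal_corpus[word] = {"vector": v, "polysemous": True}
    else:
        basic_corpus[word] = (v + avg_vec) / 2.0
    return dist

# Hypothetical corpora and average vector.
basic_corpus = {"bank": np.array([1.0, 0.0])}
personal_corpus = {}
avg_vec = np.array([0.0, 1.0])
d = process_known_word("bank", basic_corpus, personal_corpus, avg_vec, threshold=1.0)
# distance is sqrt(2) > 1.0, so "bank" is registered as polysemous
```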
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JAMES BOGGS/Examiner, Art Unit 2657
/DANIEL C WASHBURN/Supervisory Patent Examiner, Art Unit 2657