DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
1. Claims 1-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claims 1, 14, and 15, “A method”, “An apparatus”, and “a computer program product” are recited in claims 1, 14, and 15 respectively. Claims 1 and 14 are directed to one of the four statutory categories of invention (process and machine; Step 1: YES), while claim 15 is directed to a non-statutory category of invention (see rejection of claim 15 for being directed to non-statutory subject matter below). Claims 1, 14, and 15, under their broadest reasonable interpretation, recite mental processes or mathematical concepts which fall into the category of abstract idea (Step 2A Prong 1: YES).
The following limitations, under their broadest reasonable interpretation, recite mental processes or mathematical concepts:
obtaining a target sentence: a person observes and reads a particular sentence
generating an initial target sentence representation of the target sentence…: a person can write down initial information about a target sentence using pen and paper
…pretrained through a contrastive context prediction mechanism: training utilizing a contrastive context prediction mechanism amounts to a mathematical calculation
generating a target sentence representation of the target sentence for cross-lingual retrieval based on the initial target sentence representation through cross-lingual calibration: a person writes down a representation using the initial target sentence representation that has been calibrated (e.g., written in a particular way) for purposes of cross-lingual retrieval
Claims 1, 14, and 15 do not contain any additional elements which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). The only additional limitations are “generating…through an encoder, the encoder pretrained…” (claims 1, 14, and 15), “An apparatus…comprising: at least one processor; and a memory storing computer-executable instructions that, when executed, cause the at least one processor to” (claim 14), and “A computer program product…comprising a computer program that is executed by at least one processor for” (claim 15). These limitations are recited at a high level of generality and amount to mere instructions to implement the judicial exception using a generic computer. Even when viewed in combination, mere instructions to implement the judicial exception using a generic computer do not integrate the judicial exception into a practical application as they do not impose any meaningful limits on practicing the abstract idea. Therefore, claims 1, 14, and 15 are directed to abstract ideas.
Claims 1, 14, and 15 do not contain any additional elements which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the only additional limitations amount to mere instructions to implement the judicial exception using a generic computer. Even when viewed in combination, mere instructions to implement the judicial exception using a generic computer do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claims 1, 14, and 15 are not patent eligible.
Regarding claims 2-13, “The method” is recited, which is directed to one of the four statutory categories of invention (process; Step 1: YES). However, the claims, under their broadest reasonable interpretation, recite further mental processes or mathematical concepts which fall into the category of abstract idea (Step 2A Prong 1: YES).
The following limitations, under their broadest reasonable interpretation, recite further mental processes or mathematical concepts:
Claim 2:
wherein the target sentence is a sentence in a first language, and the target sentence representation is suitable for performing a cross-lingual retrieval task across the first language and a second language: a person writes down information about the target sentence in the first language using pen and paper, the information being written in a way to enable retrieving cross-lingually in a second language
Claim 3:
pretraining…through the contrastive context prediction mechanism with a training dataset: pretraining utilizing a contrastive context prediction mechanism amounts to mathematical calculations
wherein the training dataset is obtained through: obtaining a plurality of sentence pairs, each sentence pair including two sentences located in the same context window; and combining the plurality of sentence pairs into the training dataset: a person writes down multiple sentence pairs, each pair having sentences located in the same context window, and combines them into a single collection
Claim 4:
wherein the two sentences are two sentences in the same language: a person writes down sentence pairs in which both sentences are in the same language (e.g., both in English).
Claim 5:
identifying a plurality of center sentences in at least one document; for each center sentence in the plurality of center sentences, determining a context window centered on the center sentence in the at least one document, extracting a context sentence from the context window, and combining the center sentence and the context sentence into a sentence pair corresponding to the center sentence; and obtaining the plurality of sentence pairs corresponding to the plurality of center sentences: a person reads a document and picks center sentences, then selects a context sentence from a context window around each center sentence (e.g., a sentence close to the center sentence), and writes down each pair to form the plurality of sentence pairs
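For illustration only, the pair-construction steps recited in claim 5 can be sketched in Python as below. The sketch is not taken from the application or from any cited reference; the function name `build_sentence_pairs`, the symmetric window of `window` sentences on each side, and the random choice of one context sentence per center sentence are all assumptions.

```python
import random
from typing import List, Tuple


def build_sentence_pairs(document: List[str], window: int = 2,
                         seed: int = 0) -> List[Tuple[str, str]]:
    """Hypothetical sketch of claim 5: pair each center sentence with one
    context sentence drawn from a window centered on it."""
    rng = random.Random(seed)  # seeded so the sampling is reproducible
    pairs = []
    for i, center in enumerate(document):  # every sentence is treated as a center sentence
        # Context window: up to `window` sentences on each side, clipped to the document.
        lo, hi = max(0, i - window), min(len(document), i + window + 1)
        candidates = [j for j in range(lo, hi) if j != i]
        if not candidates:  # a one-sentence document has no context sentence
            continue
        context = document[rng.choice(candidates)]
        pairs.append((center, context))  # sentence pair corresponding to this center sentence
    return pairs
```

For example, a six-sentence document with `window=2` yields six pairs, each joining a sentence with a neighbor at most two positions away.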
Claim 6:
for each sentence pair in the plurality of sentence pairs, generating a sub-contrastive prediction loss corresponding to the sentence pair based on the contrastive context prediction mechanism; generating a contrastive prediction loss corresponding to the training dataset based on a plurality of sub-contrastive prediction loss corresponding to the plurality of sentence pairs; and optimizing the encoder through at least minimizing the contrastive prediction loss: generating sub-contrastive and contrastive prediction losses and minimizing the contrastive prediction loss amounts to mathematical calculations.
Claim 7:
predicting an initial center sentence representation of the center sentence…predicting an initial context sentence representation of the context sentence through…generating a center sentence representation of the center sentence based on the initial center sentence representation… generating a context sentence representation of the context sentence based on the initial context sentence representation…generating the sub-contrastive prediction loss based at least on the center sentence representation and the context sentence representation: the predicting and generating steps can be performed by a person (a person can use pen and paper to write down initial center and context sentence representations, and then center and context sentence representations derived from those initial representations). Generating the sub-contrastive prediction loss amounts to mathematical calculations.
Claim 7 contains the additional limitations “through the encoder”, “through a first projection head”, and “through a second projection head”, which amount to mere instructions to implement the judicial exception using a generic computer.
Claim 8:
Claim 8 recites “wherein the first projection head includes at least a first batch normalization layer, the second projection head includes at least a second batch normalization layer, and the first batch normalization layer and the second batch normalization layer are in different batch normalization modes at the same time”, which amounts to mere instructions to implement the judicial exception using a generic computer.
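The claim does not define the “different batch normalization modes.” Purely as an illustration, and assuming the modes are the conventional training mode (normalize with the current batch statistics) and inference mode (normalize with stored running statistics), two normalization layers held in different modes at the same time could behave as in the following Python sketch. The class name `BatchNorm1D` and its parameters are hypothetical and do not come from the application.

```python
import math
from typing import List


class BatchNorm1D:
    """Hypothetical single-feature batch normalization layer.

    mode == "train": normalize with the current batch mean/variance and
    update the running estimates; mode == "eval": normalize with the
    stored running estimates instead.
    """

    def __init__(self, mode: str = "train", momentum: float = 0.1, eps: float = 1e-5):
        self.mode = mode
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0

    def __call__(self, batch: List[float]) -> List[float]:
        if self.mode == "train":
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)  # population variance
            # Exponential moving averages, used later when the layer is in "eval" mode.
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]


# First head normalizes with batch statistics, second with running statistics.
first_bn = BatchNorm1D(mode="train")
second_bn = BatchNorm1D(mode="eval")
batch = [1.0, 2.0, 3.0]
out_first = first_bn(batch)    # zero-mean output
out_second = second_bn(batch)  # close to the input, since running stats start at (0, 1)
```

The same input batch therefore produces different outputs from the two layers, which is one possible reading of “different modes at the same time.”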
Claim 9:
wherein the center sentence and the context sentence are sentences in a third language, a previous representation set corresponding to a previous training dataset is stored…and the generating the sub-contrastive prediction losses comprises: extracting a language-specific representation set for the third language from the previous representation set; and generating the sub-contrastive prediction loss based at least on the center sentence representation, the context sentence representation, and the language-specific representation set: a person can write down a language-specific representation set for the third language, taken from a previous representation set that was stored on paper. Generating the sub-contrastive prediction loss amounts to mathematical calculations.
Claim 9 contains the additional limitation “stored in a memory bank”, which amounts to mere instructions to implement the judicial exception using a generic computer.
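As a hedged illustration of claim 9, the Python sketch below computes a sub-contrastive loss in which the negatives are drawn only from stored representations tagged with the same language as the sentence pair. The memory-bank layout (a list of language-tagged vectors), the cosine similarity, the temperature value, and the InfoNCE form of the loss are assumptions made for illustration; none of them is taken from the application.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def language_specific_loss(center: Vector, context: Vector,
                           memory_bank: List[Tuple[str, Vector]],
                           language: str, temperature: float = 0.1) -> float:
    """Hypothetical InfoNCE-style sub-loss for one center/context pair.

    memory_bank holds (language_tag, representation) entries kept from a
    previous training batch; only entries tagged `language` serve as negatives.
    """
    negatives = [rep for tag, rep in memory_bank if tag == language]  # language-specific set
    logits = [cosine(center, context) / temperature]  # index 0 is the positive pair
    logits += [cosine(center, rep) / temperature for rep in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]  # -log softmax probability of the positive
```

With no stored entries for the pair's language the loss is zero, and a stored negative that is more similar to the center sentence raises the loss.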
Claim 10:
generating the target sentence representation through performing, on the initial sentence representation, at least one of shifting, scaling, and rotating: performing shifting, scaling, and rotating of sentence representations amounts to mathematical calculations
Claim 11:
wherein the target sentence is a sentence in a first language, and the shifting comprises: subtracting a predetermined mean from a current sentence representation, the predetermined mean computed based on a set of representations corresponding to a set of sentences in the first language: subtracting a mean from a sentence representation amounts to a mathematical calculation
Claim 12:
wherein the target sentence is a sentence in a first language, and the scaling comprises: dividing a current sentence representation by a predetermined variance, the predetermined variance computed based on a set of representations corresponding to a set of sentences in the first language: dividing a sentence representation by a variance amounts to a mathematical calculation
Claim 13:
wherein the target sentence is a sentence in a first language, and the target sentence representation is to be used for performing a cross-lingual retrieval task across the first language and a second language, and the rotating comprises: rotating a current sentence representation based on a predetermined rotation matrix between the first language and the second language: rotating a sentence representation according to a rotation matrix amounts to a mathematical calculation.
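For illustration only, the shifting, scaling, and rotating operations of claims 10-13 can be expressed as ordinary vector arithmetic, as in the Python sketch below. The per-dimension statistics, the shift-then-scale-then-rotate order, and all function names are assumptions; the application, not this sketch, controls what the claimed calibration actually is.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def language_mean(reps: List[Vector]) -> Vector:
    """Per-dimension mean over the representations of one language."""
    n = len(reps)
    return [sum(col) / n for col in zip(*reps)]


def language_variance(reps: List[Vector], mean: Vector) -> Vector:
    """Per-dimension (population) variance over the same representations."""
    n = len(reps)
    return [sum((x - m) ** 2 for x in col) / n for col, m in zip(zip(*reps), mean)]


def shift(v: Vector, mean: Vector) -> Vector:
    # Claim 11: subtract the predetermined mean.
    return [x - m for x, m in zip(v, mean)]


def scale(v: Vector, var: Vector) -> Vector:
    # Claim 12: divide by the predetermined variance, as the claim recites;
    # zero-variance dimensions would need a guard in practice.
    return [x / s for x, s in zip(v, var)]


def rotate(v: Vector, rotation: Matrix) -> Vector:
    # Claim 13: matrix-vector product with a predetermined rotation matrix.
    return [sum(r * x for r, x in zip(row, v)) for row in rotation]


def calibrate(v: Vector, mean: Vector, var: Vector, rotation: Matrix) -> Vector:
    return rotate(scale(shift(v, mean), var), rotation)
```

For example, with first-language representations `[[1, 2], [3, 4]]` (mean `[2, 3]`, variance `[1, 1]`) and a 90-degree rotation matrix `[[0, -1], [1, 0]]`, the vector `[3, 4]` calibrates to `[-1, 1]`.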
Claims 2-13 do not contain any additional limitations which integrate the judicial exception into a practical application (Step 2A Prong 2: NO). As discussed above, the only additional limitations are mere instructions to implement the judicial exception using a generic computer, which even when viewed in combination, do not integrate the judicial exception into a practical application as they do not impose any meaningful limits on practicing the abstract idea. Therefore, claims 2-13 are directed to abstract ideas.
Claims 2-13 do not contain any additional limitations which amount to significantly more than the judicial exception (Step 2B: NO). As discussed above, the only additional limitations are mere instructions to implement the judicial exception using a generic computer, which even when viewed in combination, do not amount to significantly more than the judicial exception as they do not provide an inventive concept. Therefore, claims 2-13 are not patent eligible.
2. Claim 15 is rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim does not fall within at least one of the four categories of patent-eligible subject matter because it recites a “computer program product”. The applicant’s specification does not provide a special definition for a “computer program product”; thus, under its plain meaning, the term encompasses data signals per se as one potential form of the product. As such, the claim encompasses non-statutory subject matter.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
3. Claims 1-3, 6, and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Niu et al. (US 2023/0153542 A1, hereinafter Niu) in view of Fu et al. (NPL ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional GANs, hereinafter Fu).
Regarding claim 1, Niu discloses A method for sentence representation generation for cross-lingual retrieval, comprising: obtaining a target sentence (para. 0042 “In some examples, the cross-lingual transfer module 430, may receive an input 440, e.g., such as an input text in a source language and/or target language, via a data interface 415.”); generating an initial target sentence representation of the target sentence through an encoder (Fig. 4 “Cross-Lingual Transfer Module 230”; para. 0042 “The cross-lingual transfer module 430 may generate an output 450 such as an alignment with a sentence in the target language corresponding to the input 440.”), the encoder pretrained through a contrastive context prediction mechanism (para. 0029 “To address this challenge, a contrastive learning approach can be adopted and the aligner model 110 is trained on a classification task with in-batch negatives.”; para. 0035-0036 “At step 308, a loss objective is computed based on computed pairwise token-level similarities associated with the positive input pair and the plurality of negative input pairs. For example, the similarity scores are used as logits and pair each positive logit with all negative ones. These logits are then used to compute a contrastive loss among the positive pairs and the negative pairs. [0036] At step 310, the pretrained multi-lingual model is updated based on the loss objective.”); generating a target sentence representation of the target sentence for cross-lingual retrieval… (para. 0047 “In some embodiments, the aligner model may be tested via cross-lingual sentence retrieval tasks, which retrieve a matching sentence in the target language from a collection of sentences.”).
Niu does not specifically disclose [generating a target sentence representation of the target sentence for cross-lingual retrieval] based on the initial target sentence representation through cross-lingual calibration.
Fu teaches generating a target sentence representation based on the initial target sentence representation through cross-lingual calibration (Fig. 2; pg. 4 section 3.4 “With regard to obtaining the sentence representations, we adopt pre-trained word vectors…we adopt simple (weighted) averages of word vectors, which are surprisingly powerful, although our method could also be applied to other sentence embedding methods. Given source sentence embeddings {x} and target sentence embeddings {y} acquired as above, we can train the generators Gx and Gy through the joint loss function in Equation 5. Subsequently, we evaluate the obtained transformation via standard sentence retrieval task. For each source sentence embedding, we compute its k neural neighbors in terms of the distance function fd among all target embeddings. The corresponding k target sentences are regarded as the candidate set of mapping results.”).
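The retrieval step in the quoted passage of Fu (ranking all target embeddings by a distance function and keeping the k closest as the candidate set) can be sketched in Python as follows. Fu's distance function fd is not reproduced here; cosine distance is substituted as an assumption, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; stands in for Fu's distance function fd (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def k_nearest_targets(source: Vector, targets: List[Vector], k: int) -> List[int]:
    """Indices of the k target embeddings closest to `source` (the candidate set)."""
    ranked = sorted(range(len(targets)), key=lambda j: cosine_distance(source, targets[j]))
    return ranked[:k]
```

For a source embedding `[1, 0]` and targets `[[0, 1], [1, 0.1], [-1, 0], [1, 1]]`, the two closest targets are at indices 1 and 3.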
Niu and Fu are considered to be analogous to the claimed invention as they both are in the same field of cross-lingual embedding. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu to incorporate the teachings of Fu in order to generate a target sentence representation of the target sentence for cross-lingual retrieval specifically based on the initial target sentence representation through cross-lingual calibration. Doing so would improve performance on retrieval tasks (Fu, pg. 8, Conclusion).
Regarding claim 2, Niu in view of Fu discloses wherein the target sentence is in a first language (Niu, para. 0042 “In some examples, the cross-lingual transfer module 430, may receive an input 440, e.g., such as an input text in a source language and/or target language, via a data interface 415.”), and the target sentence representation is suitable for performing a cross-lingual retrieval task across the first language and a second language (Fu, pg. 4 section 3.4 “Given source sentence embeddings {x} and target sentence embeddings {y} acquired as above, we can train the generators Gx and Gy through the joint loss function in Equation 5. Subsequently, we evaluate the obtained transformation via standard sentence retrieval task. For each source sentence embedding, we compute its k neural neighbors in terms of the distance function fd among all target embeddings. The corresponding k target sentences are regarded as the candidate set of mapping results.”; pg. 4, section 4.1 “We focus on German and English as well as Spanish and English translation retrieval.”).
Niu and Fu are considered to be analogous to the claimed invention as they both are in the same field of cross-lingual embedding. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu to incorporate the teachings of Fu in order to specifically have the target sentence representation be suitable for performing a cross-lingual retrieval task across the first and second languages, for the same reasons given with respect to claim 1.
Regarding claim 3, Niu in view of Fu discloses pretraining the encoder through the contrastive context prediction mechanism with a training dataset (Niu, para. 0029 “To address this challenge, a contrastive learning approach can be adopted and the aligner model 110 is trained on a classification task with in-batch negatives.”; para. 0030-0031 “Fig. 3 is a simplified logic flow diagram illustrating a method of training an aligner model…At step 302, a training dataset is received…”), wherein the training dataset is obtained through: obtaining a plurality of sentence pairs, each sentence pair including two sentences located in the same context window (Niu, para. 0032 “In one embodiment, the training data set may be (1) an English-centered dataset such as OPUS-100; (2) a non-English-centered language dataset, e.g., the v2021-08-07 Tatoeba Challenge. OPUS-100 is English-centered, meaning that all training pairs include English on either the source or target side. The corpus covers 100 languages (including English). The languages for training are selected based on the volume of parallel data available in OPUS. The OPUS collection is comprised of multiple corpora, ranging from movie subtitles to GNOME documentation to the Bible. OPUS-100 contains approximately 55 M sentence pairs. For example, 99 language pairs are chosen for training the aligner model, 44 of which are chosen from 1M sentence pairs of training data, 73 chosen from at least 100 k, and 95 chosen from at least 10 k. Following OPUS-100's choice, the training data for each language pair in New-Tatoeba is capped at 1 M to make it easier to compare with OPUS-trained models.”); and combining the plurality of sentence pairs into the training dataset (sentence pairs selected to make up training data: Niu, para. 0032 “In one embodiment, the training data set may be (1) an English-centered dataset such as OPUS-100; (2) a non-English-centered language dataset, e.g., the v2021-08-07 Tatoeba Challenge. OPUS-100 is English-centered, meaning that all training pairs include English on either the source or target side. The corpus covers 100 languages (including English). The languages for training are selected based on the volume of parallel data available in OPUS. The OPUS collection is comprised of multiple corpora, ranging from movie subtitles to GNOME documentation to the Bible. OPUS-100 contains approximately 55 M sentence pairs. For example, 99 language pairs are chosen for training the aligner model, 44 of which are chosen from 1M sentence pairs of training data, 73 chosen from at least 100 k, and 95 chosen from at least 10 k. Following OPUS-100's choice, the training data for each language pair in New-Tatoeba is capped at 1 M to make it easier to compare with OPUS-trained models.”).
Regarding claim 6, Niu in view of Fu discloses for each sentence pair in the plurality of sentence pairs, generating a sub-contrastive prediction loss corresponding to the sentence pair based on the contrastive context prediction mechanism (Niu, para. 0029 “To address this challenge, a contrastive learning approach can be adopted and the aligner model 110 is trained on a classification task with in-batch negatives. For example, for the batch of sentences S… in a source language, and a batch of sentences T… in a target language, where S_i is aligned with T_i for each i, a pairwise semantic similarity between S and T is computed to obtain N similarities for the positive alignments, and N^2-N similarities for the negative ones (in total N^2 similarities computed). During training, these similarity scores are used as logits and pair each positive logit with all negative ones. These logits are then used to compute the contrastive loss 120…”); generating a contrastive prediction loss corresponding to the training dataset based on a plurality of sub-contrastive prediction loss corresponding to the plurality of sentence pairs (Niu, para. 0029 “To address this challenge, a contrastive learning approach can be adopted and the aligner model 110 is trained on a classification task with in-batch negatives. For example, for the batch of sentences S… in a source language, and a batch of sentences T… in a target language, where S_i is aligned with T_i for each i, a pairwise semantic similarity between S and T is computed to obtain N similarities for the positive alignments, and N^2-N similarities for the negative ones (in total N^2 similarities computed). During training, these similarity scores are used as logits and pair each positive logit with all negative ones. These logits are then used to compute the contrastive loss 120…”); and optimizing the encoder through at least minimizing the contrastive prediction loss (Niu, para. 0036 “At step 310, the pretrained multi-lingual model is updated based on the loss objective.”; para. 0029 “These logits are then used to compute the contrastive loss 120, which is then used to update the aligner model 110 via backpropagation path 125.”).
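For illustration, the in-batch contrastive objective described in the quoted passage of Niu para. 0029 (N positive similarities and N^2-N negative similarities, used as logits) can be sketched in Python as below. Niu computes the pairwise similarity as a token-level, BERTScore-style similarity (para. 0034); this sketch substitutes cosine similarity between whole-sentence vectors, and the temperature and function names are assumptions. The per-pair terms can be read as sub-contrastive losses, and their mean as the batch-level contrastive loss.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def in_batch_contrastive_loss(S: List[Vector], T: List[Vector],
                              temperature: float = 0.05) -> Tuple[float, List[float]]:
    """S[i] is aligned with T[i]; every T[j] with j != i is an in-batch negative.

    Computes N * N similarities (N positives, N^2 - N negatives), one
    cross-entropy sub-loss per aligned pair, and their mean as the batch loss.
    """
    n = len(S)
    sub_losses = []
    for i in range(n):
        # Row i holds one positive logit (j == i) and n - 1 negative logits.
        logits = [cosine(S[i], T[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        sub_losses.append(log_sum_exp - logits[i])  # -log p(T[i] | S[i])
    return sum(sub_losses) / n, sub_losses
```

Minimizing the returned mean by gradient descent would correspond to the backpropagation update Niu describes; the gradient step itself is omitted from this sketch.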
Regarding claim 14, claim 14 is an apparatus claim with limitations similar to those recited in method claim 1, and is thus rejected under similar rationale.
Additionally, Niu teaches An apparatus for sentence representation generation for cross-lingual retrieval, comprising: at least one processor; and a memory storing computer-executable instructions that, when executed, cause the at least one processor to (para. 0006 “According to a third aspect of the disclosure, an electronic device is provided and includes: at least one processor; and a memory communicatively connected to the at least one processor; in which the memory is stored with instructions executable by the at least one processor, and when the instructions are performed by the at least one processor, the at least one processor is caused to perform the method for generating a cross-lingual textual semantic model according to the first aspect, or the method for determining a textual semantic according to the second aspect.”).
Regarding claim 15, claim 15 is a computer program product claim with limitations similar to those recited in method claim 1, and is thus rejected under similar rationale.
Additionally, Niu discloses A computer program product for sentence representation generation for cross-lingual retrieval, comprising a computer program that is executed by at least one processor for (para. 0007 “According to a fourth aspect of disclosure, a non-transitory computer-readable storage medium stored with computer instructions is provided, in which the computer instructions are configured to cause a computer to perform the method for generating a cross-lingual textual semantic model according to the first aspect or perform the method for determining a textual semantic according to the second aspect.”).
4. Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Niu in view of Fu, and further in view of Goswami et al. (NPL Cross-Lingual Sentence Embedding using Multi-Task Learning, hereinafter Goswami).
Regarding claim 4, Niu in view of Fu does not specifically disclose wherein the two sentences are two sentences in the same language.
Goswami teaches wherein the two sentences are two sentences in the same language (pg. 3 section 3. “In this section, we describe the components and working of the proposed Dual Encoder with Anchor Model (DuEAM) architecture for multilingual sentence embeddings, trained using an unsupervised multi-task joint loss function…”; pg. 5 section 5 “Weakly Supervised Data”: “This training data contains both monolingual and cross-lingual sentence pairs, where the monolingual sentence pairs are same as those of XNLI (without annotated labels)…”).
Niu, Fu, and Goswami are considered to be analogous to the claimed invention as they are in the same field of natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu in view of Fu to incorporate the teachings of Goswami in order to have the two sentences be sentences in the same language. Doing so would be beneficial, as this would allow the model to learn semantic textual similarity between monolingual datasets which is a major task for sentence embedding (pg. 5, section 6.1, 1st para.).
5. Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Niu in view of Fu, and further in view of Kim (NPL [Part 1] Predicting on Text Pairs with Transformers: Cross-Encoding with BERT).
Regarding claim 5, Niu in view of Fu does not specifically disclose identifying a plurality of center sentences in at least one document; for each center sentence in the plurality of center sentences, determining a context window centered on the center sentence in the at least one document, extracting a context sentence from the context window, and combining the center sentence and the context sentence into a sentence pair corresponding to the center sentence; and obtaining the plurality of sentence pairs corresponding to the plurality of center sentences.
Kim teaches identifying a plurality of center sentences in at least one document (pg. 4 4th para. “…one of the main ways it trained itself was through looking at sentence pairs…From their massive text data-set”; see figure on pg. 4, where “Sentence 1” reads on a center sentence in a document (data-set corpus)); for each center sentence in the plurality of center sentences, determining a context window centered on the center sentence in the at least one document (pg. 4, 5th para. “…they made pairs on sentences that were next to each other in the corpus…”; the context window is the sentence adjacent to the first sentence), extracting a context sentence from the context window (pg. 4, 5th para. “…they made pairs on sentences that were next to each other in the corpus…”; the context window is the sentence adjacent to the first sentence; see figure on pg. 4, “Sentence 2”), and combining the center sentence and the context sentence into a sentence pair corresponding to the center sentence (pg. 4, 5th para. “From their massive text data-set, they made these pairs of sentences that were next to each other in the corpus…”); and obtaining the plurality of sentence pairs corresponding to the plurality of center sentences (pg. 4, 5th para. “From their massive text data-set, they made these pairs of sentences that were next to each other in the corpus…”; see pseudo-code, in which the sentence pairs are used for training the model).
Niu, Fu, and Kim are considered to be analogous to the claimed invention as they are all in the same field of natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu in view of Fu to incorporate the teachings of Kim in order to identify a plurality of center sentences in at least one document, determine a context window centered on the center sentence, extract a context sentence from the context window, and combine the center and context sentences to obtain the plurality of sentence pairs. Doing so would enable the model to learn the notion of text sequences (Kim, pg. 4).
6. Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Niu in view of Fu, and further in view of Chen et al. (NPL “A Simple Framework for Contrastive Learning of Visual Representations”, hereinafter Chen).
Regarding claim 7, Niu in view of Fu discloses wherein the sentence pair includes a center sentence and a context sentence (Niu, para. 0032 “In one embodiment, the training data set may be (1) an English-centered dataset such as OPUS-100; (2) a non-English-centered language dataset, e.g., the v2021-08-07 Tatoeba Challenge. OPUS-100 is English-centered, meaning that all training pairs include English on either the source or target side. The corpus covers 100 languages (including English). The languages for training are selected based on the volume of parallel data available in OPUS. The OPUS collection is comprised of multiple corpora, ranging from movie subtitles to GNOME documentation to the Bible. OPUS-100 contains approximately 55 M sentence pairs. For example, 99 language pairs are chosen for training the aligner model, 44 of which are chosen from 1M sentence pairs of training data, 73 chosen from at least 100 k, and 95 chosen from at least 10 k. Following OPUS-100's choice, the training data for each language pair in New-Tatoeba is capped at 1 M to make it easier to compare with OPUS-trained models.”), and the generating a sub-contrastive prediction loss corresponding to the sentence pair based on the contrastive context prediction mechanism comprises: predicting an initial center sentence representation of the center sentence through the encoder (Niu, para. 0029 “To address this challenge, a contrastive learning approach can be adopted and the aligner model 110 is trained on a classification task with in-batch negatives. For example, for the batch of sentences S… in a source language, and a batch of sentences T… in a target language, where S_i is aligned with T_i for each i, a pairwise semantic similarity between S and T is computed to obtain N similarities for the positive alignments, and N^2-N similarities for the negative ones (in total N^2 similarities computed). During training, these similarity scores are used as logits and pair each positive logit with all negative ones. These logits are then used to compute the contrastive loss 120…”; para. 0034 “At step 306, a pretrained multi-lingual model may be used to compute a pairwise token-level similarity between the two sentences within each positive input pair or negative input pair. For example, the pairwise token-level similarity between two sentences may be computed as the BERT score described in relation to FIG. 2.”; this similarity computation requires a first sentence (S_i, Fig. 2) to be embedded via the encoder 105 to obtain a contextual embedding); predicting an initial context sentence representation of the context sentence through the encoder (Niu, para. 0029 “To address this challenge, a contrastive learning approach can be adopted and the aligner model 110 is trained on a classification task with in-batch negatives. For example, for the batch of sentences S… in a source language, and a batch of sentences T… in a target language, where S_i is aligned with T_i for each i, a pairwise semantic similarity between S and T is computed to obtain N similarities for the positive alignments, and N^2-N similarities for the negative ones (in total N^2 similarities computed). During training, these similarity scores are used as logits and pair each positive logit with all negative ones. These logits are then used to compute the contrastive loss 120…”; para. 0034 “At step 306, a pretrained multi-lingual model may be used to compute a pairwise token-level similarity between the two sentences within each positive input pair or negative input pair. For example, the pairwise token-level similarity between two sentences may be computed as the BERT score described in relation to FIG. 2.”; this similarity computation requires a second sentence (T_j, Fig. 2) to be embedded via the encoder 105 to obtain a contextual embedding); generating a center sentence representation of the center sentence based on the initial center sentence representation… (Fu teaches generating a first sentence representation utilizing an initial first sentence representation: Fig. 2 caption: “It learns two generators Gx and Gy to approximate the joint distribution of vectors from both language. Gx projects sentence embeddings x from language X to Y…”; see Gx(X) in Fig. 2); generating a context sentence representation of the context sentence based on the initial context sentence representation… (Fu teaches generating a second sentence representation utilizing an initial second sentence representation: Fig. 2 caption: “It learns two generators Gx and Gy to approximate the joint distribution of vectors from both language... while, conversely, Gy projects sentence embeddings y from language Y to X…”; see Gy(Y) in Fig. 2); and generating the sub-contrastive prediction loss based at least on the center sentence … and the context sentence… (Niu, para. 0029 “To address this challenge, a contrastive learning approach can be adopted and the aligner model 110 is trained on a classification task with in-batch negatives.”; para. 0035-0036 “At step 308, a loss objective is computed based on computed pairwise token-level similarities associated with the positive input pair and the plurality of negative input pairs. For example, the similarity scores are used as logits and pair each positive logit with all negative ones. These logits are then used to compute a contrastive loss among the positive pairs and the negative pairs. [0036] At step 310, the pretrained multi-lingual model is updated based on the loss objective.”).
Niu and Fu are considered to be analogous to the claimed invention as they are both in the same field of natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu to incorporate the teachings of Fu in order to generate a center sentence representation of the center sentence based on the initial center sentence representation, and to generate a context sentence representation of the context sentence based on the initial context sentence representation. It would have been obvious to combine Niu and Fu for the same rationale as given for claim 1.
Niu in view of Fu discloses generating a sub-contrastive loss using the initial center and context sentence representations (representations that have not been passed through projection heads; see above claim mapping). However, Niu in view of Fu does not specifically disclose [generating a center sentence representation…] through a first projection head; [generating a context sentence representation…] through a second projection head; and [generating the sub-contrastive prediction loss based at least on the] first representation and the second representation.
Chen teaches generating a first representation through a first projection head (Fig. 2, hi passed through projection head g(.) to obtain zi; pg. 2 section 2.1, 3rd bullet pt: “A small neural network projection head g(.) that maps representations to the space where contrastive loss is applied.”; 4th bullet pt: “A contrastive loss function for a contrastive prediction task…Given a set {xk} including a positive pair of examples…”) and generating a second representation through a second projection head (Fig. 2, hj passed through projection head g(.) to obtain zj) and generating the sub-contrastive prediction loss based at least on the first representation and the second representation (pg. 2, last bullet pt. “A contrastive loss function defined for a contrastive prediction task. Given a set…including a positive pair of examples…the contrastive prediction task aims to identify xj…for a given xi”; Equation 1 utilizes first and second representations (zi and zj) for a sub-contrastive loss).
Niu, Fu, and Chen are considered to be analogous to the claimed invention as Niu and Fu are in the same field of natural language processing and Chen is in the same field of contrastive learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu in view of Fu to incorporate the teachings of Chen in order to specifically obtain a center sentence representation and a context sentence representation through a first and a second projection head, respectively, and to use these representations to generate the sub-contrastive loss. Doing so would improve the quality of the hidden representation preceding the projection head (Chen, pg. 6, section 4.2).
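For illustration only, the combination relied upon for claim 7 (initial encoder outputs passed through separate projection heads, followed by a contrastive loss on the projected vectors) can be sketched in Python as follows. The sketch is hypothetical. The function names, dimensions, random weights, and temperature are assumptions. The two heads are parameterized separately to mirror the claimed first and second projection heads, whereas Chen's Fig. 2 shows a single shared head g(.). The loss is a simplified one-directional InfoNCE with in-batch negatives, not the full NT-Xent of Chen's Equation 1, which also treats the other augmented views as negatives.

```python
import math
import random


def make_projection_head(in_dim, hidden_dim, out_dim, seed):
    """Hypothetical two-layer MLP head (dense -> ReLU -> dense, no bias terms) with random weights."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden_dim)] for _ in range(out_dim)]

    def head(h):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, h))) for row in w1]
        return [sum(w * x for w, x in zip(row, hidden)) for row in w2]

    return head


def cosine(u, v, eps=1e-12):
    """Cosine similarity; eps guards against an all-zero projected vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def info_nce(z_center, z_context, temperature=0.1):
    """Mean loss over pairs: z_center[i] and z_context[i] form the positive pair,
    and every other z_context[j] serves as an in-batch negative for z_center[i]."""
    total = 0.0
    for i, anchor in enumerate(z_center):
        logits = [cosine(anchor, z) / temperature for z in z_context]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(logit - top) for logit in logits))
        total += log_sum_exp - logits[i]
    return total / len(z_center)


# Initial (encoder-output) representations for three center/context pairs; values are made up.
h_center = [[0.2, -1.0, 0.5, 0.3], [1.1, 0.4, -0.2, 0.0], [-0.3, 0.8, 0.9, -1.2]]
h_context = [[0.1, -0.9, 0.6, 0.2], [1.0, 0.5, -0.1, 0.1], [-0.4, 0.7, 1.0, -1.1]]

first_head = make_projection_head(4, 8, 3, seed=1)   # hypothetical first projection head
second_head = make_projection_head(4, 8, 3, seed=2)  # hypothetical second projection head
z_center = [first_head(h) for h in h_center]
z_context = [second_head(h) for h in h_context]
print(f"sub-contrastive loss: {info_nce(z_center, z_context):.4f}")
```

If a single shared head is used, as in Chen, the same loss function applies unchanged. Only the number of parameter sets differs.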
7. Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Niu in view of Fu and Chen, and further in view of He et al. (NPL Momentum Contrast for Unsupervised Visual Representation Learning, hereinafter He).
Regarding claim 8, Niu in view of Fu and Chen discloses the first projection head and the second projection head (Chen, g(.); see previous mapping). However, Niu in view of Fu and Chen does not specifically disclose [wherein the first projection head] includes at least a first batch normalization layer, [the second projection head includes at least a second batch normalization layer], and the first batch normalization layer and the second batch normalization layer are in different batch normalization modes at the same time.
He teaches a first encoder that includes at least a first batch normalization layer (pg. 4, section 3.3 “Shuffling BN”: “Our encoders Fq and Fk both have Batch Normalization (BN) [37] as in the standard ResNet [33].”), a second encoder that includes at least a second batch normalization layer (pg. 4, section 3.3 “Shuffling BN”: “Our encoders Fq and Fk both have Batch Normalization (BN) [37] as in the standard ResNet [33].”), and the first batch normalization layer and the second batch normalization layer being in different batch normalization modes at the same time (pg. 4, section 3.3 “Shuffling BN”: “In experiments, we found that using BN prevents the model from learning good representations, as similarly reported in [35] (which avoids BN). The model appears to ‘cheat’ the pretext task and easily finds a low-loss solution. This is possibly because the intra-batch communication among samples (caused by BN) leaks information. We resolve this problem by shuffling BN. We train with multiple GPUs and perform BN on the samples independently for each GPU (as done in common practice). For the key encoder Fk, we shuffle the sample order in the current mini-batch before distributing it among GPUs (and shuffle back after encoding); the sample order of the mini-batch for the query encoder Fq is not altered…”).
Niu, Fu, Chen, and He are considered to be analogous to the claimed invention as Niu and Fu are in the same field of natural language processing and Chen and He are in the same field of contrastive learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu in view of Fu and Chen to incorporate the teachings of He in order to specifically have the first projection head include a first batch normalization layer, the second projection head include a second batch normalization layer, and the first and second batch normalization layers be in different batch normalization modes at the same time. Doing so would be beneficial, as this would allow the contrastive training to benefit from batch normalization while preventing the model from “cheating” and learning poor representations (He, pg. 4, section 3.3 “Shuffling BN”).
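As a single-machine illustration of the Shuffling BN passage quoted from He (section 3.3), the sketch below simulates per-GPU batch normalization. The query branch normalizes contiguous shards of the mini-batch in their original order. The key branch permutes the mini-batch before sharding and restores the order afterwards, so the two BN layers see different batch compositions at the same training step. The device count, the function names, and the omission of BN's learnable scale and shift are assumptions made for brevity. This is not He's implementation.

```python
import math
import random


def batch_norm(batch, eps=1e-5):
    """Normalize every feature over the samples in `batch` (no learnable scale/shift)."""
    n, dim = len(batch), len(batch[0])
    means = [sum(x[d] for x in batch) / n for d in range(dim)]
    variances = [sum((x[d] - means[d]) ** 2 for x in batch) / n for d in range(dim)]
    return [[(x[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dim)] for x in batch]


def per_device_bn(batch, num_devices):
    """Split the mini-batch into equal shards (one per simulated GPU) and normalize each shard independently."""
    if len(batch) % num_devices:
        raise ValueError("batch size must be divisible by num_devices")
    shard = len(batch) // num_devices
    out = []
    for g in range(num_devices):
        out.extend(batch_norm(batch[g * shard:(g + 1) * shard]))
    return out


def shuffled_per_device_bn(batch, num_devices, seed):
    """Key-encoder variant: shuffle the sample order, apply per-device BN, then shuffle back."""
    perm = list(range(len(batch)))
    random.Random(seed).shuffle(perm)
    normed = per_device_bn([batch[p] for p in perm], num_devices)
    restored = [None] * len(batch)
    for position, original_index in enumerate(perm):
        restored[original_index] = normed[position]
    return restored


rng = random.Random(42)
features = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]
query_out = per_device_bn(features, num_devices=2)                  # query encoder Fq: order unchanged
key_out = shuffled_per_device_bn(features, num_devices=2, seed=7)  # key encoder Fk: shuffled BN
```

With a single device, the two branches give the same result because full-batch BN does not depend on sample order. The branches diverge only once the mini-batch is sharded across devices, which is the situation He describes.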
8. Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Niu in view of Fu and Chen, and further in view of Goswami and Sun et al. (NPL Contrastive Distillation on Intermediate Representations for Language Model Compression, hereinafter Sun).
Regarding claim 9, Niu in view of Fu and Chen discloses wherein the generating the sub-contrastive prediction losses comprises: …generating the sub-contrastive prediction loss based at least on the center sentence representation, the context sentence representation… (Niu, para. 0029 “To address this challenge, a contrastive learning approach can be adopted and the aligner model 110 is trained on a classification task with in-batch negatives. For example, for the batch of sentences S… in a source language, and a batch of sentences T… in a target language, where S_i is aligned with T_i for each i, a pairwise semantic similarity between S and T is computed to obtain N similarities for the positive alignments, and N^2-N similarities for the negative ones (in total N^2 similarities computed). During training, these similarity scores are used as logits and pair each positive logit with all negative ones. These logits are then used to compute the contrastive loss 120…”; Fu teaches generating the center sentence and context sentence representations: Fig. 2 caption: “It learns two generators Gx and Gy to approximate the joint distribution of vectors from both language. Gx projects sentence embeddings x from language X to Y, while conversely, Gy projects sentence embeddings y from language Y to X”; see Gx(X) and Gy(Y) in Fig. 2).
Niu, Fu, and Chen are considered to be analogous to the claimed invention as Niu and Fu are in the same field of natural language processing and Chen is in the same field of contrastive learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu to incorporate the teachings of Fu in order to generate a center sentence representation and a context sentence representation. It would have been obvious to combine for the same rationale as given for claim 1.
Niu in view of Fu and Chen does not specifically disclose wherein the center sentence and the context sentence are sentences in a third language.
Goswami teaches wherein the center sentence and the context sentence are sentences in a third language (pg. 3 section 3. “In this section, we describe the components and working of the proposed Dual Encoder with Anchor Model (DuEAM) architecture for multilingual sentence embeddings, trained using an unsupervised multi-task joint loss function…”; pg. 5 section 5 “Weakly Supervised Data”: “This training data contains both monolingual and cross-lingual sentence pairs, where the monolingual sentence pairs are same as those of XNLI (without annotated labels)…”).
Niu, Fu, Chen, and Goswami are considered to be analogous to the claimed invention as Niu, Fu, and Goswami are in the same field of natural language processing and Chen is in the same field of contrastive learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu in view of Fu and Chen to incorporate the teachings of Goswami in order to have the two sentences be sentences in the same third language. Doing so would be beneficial, as this would allow the model to learn semantic textual similarity on monolingual data, which is a major task for sentence embeddings (Goswami, pg. 5, section 6.1, 1st para.).
Niu in view of Fu, Chen, and Goswami does not specifically disclose a previous representation set corresponding to a previous training dataset is stored in a memory bank…extracting a language-specific representation set for the third language from the previous representation set; and generating the sub-contrastive loss based at least on…the language-specific representation set.
Sun teaches a previous representation set corresponding to a previous training dataset being stored in a memory bank (pg. 2, 3rd para. “For efficient training, all data samples are stored in a memory bank…”)…extracting a language-specific representation set for the third language from the previous representation set (pg. 5 section 3.3 “Memory bank”: “For a positive pair…one needs to compute the intermediate representations for all negative samples…which requires K+1 times computation compared to normal training. A large number of negative samples is required to ensure performance...which renders large-scale contrastive distillation infeasible for practical use. To address this, we follow Wu et al. (2018) and use a memory bank M…to store the intermediate representation of all N training examples…”); and generating the sub-contrastive loss based at least on…the language-specific representation set (sub-contrastive loss based on stored representation: pg. 5, section 3.3 “…use a memory bank M…to store the intermediate representation of all N training examples, and the representation is only updated for positive samples in each forward propagation. Therefore, the training cost is roughly the same as in normal training. Specifically, assume the mini-batch size is 1, then at each training step, M is updated as (5) where m0 is the retrieved representation from memory bank M that corresponds to hs.sub.0…”; see Eq. 4 for contrastive loss).
Niu, Fu, Chen, Goswami, and Sun are considered to be analogous to the claimed invention as Niu, Fu, Goswami, and Sun are in the same field of natural language processing and Chen is in the same field of contrastive learning. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu in view of Fu, Chen, and Goswami to incorporate the teachings of Sun in order to have a previous representation set corresponding to a previous training dataset stored in a memory bank, to extract a language-specific representation set for the third language from the previous representation set, and to generate the sub-contrastive loss based at least on the language-specific representation set. Doing so would be beneficial, as this would enable large-scale contrastive learning, which requires large numbers of negative training samples (Sun, pg. 5, section 3.3).
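For illustration only, the following sketch combines the memory-bank mechanism relied upon from Sun (section 3.3) with the language-specific extraction recited in claim 9. Several elements are hypothetical: the class and method names, the language tags, the momentum value, and the re-normalized momentum update. That update is assumed to approximate Sun's Eq. 5, following the Wu et al. (2018) memory bank that Sun cites. The loss is a generic InfoNCE over one positive and negatives drawn from the bank. It does not reproduce Sun's Eq. 4.

```python
import math


class MemoryBank:
    """Hypothetical bank holding one unit-norm representation per previous training example."""

    def __init__(self, momentum=0.5):
        self.momentum = momentum
        self.entries = {}  # example id -> (language tag, vector)

    def update(self, example_id, language, vector):
        # Blend with the stored vector (if any), then re-normalize to unit length.
        if example_id in self.entries:
            _, old = self.entries[example_id]
            vector = [self.momentum * o + (1.0 - self.momentum) * v for o, v in zip(old, vector)]
        norm = math.sqrt(sum(v * v for v in vector)) or 1.0
        self.entries[example_id] = (language, [v / norm for v in vector])

    def language_specific(self, language, exclude=()):
        """Extract the stored representations whose language tag matches `language`."""
        return [vec for eid, (lang, vec) in self.entries.items()
                if lang == language and eid not in exclude]


def bank_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE with one positive and negatives taken from the memory bank (inputs assumed unit-norm)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    logits = [dot(anchor, positive) / temperature] + [dot(anchor, n) / temperature for n in negatives]
    top = max(logits)  # numerically stable log-sum-exp
    return top + math.log(sum(math.exp(logit - top) for logit in logits)) - logits[0]


bank = MemoryBank()
bank.update("de-1", "de", [1.0, 0.0, 0.0])  # "de" is a hypothetical stand-in for the third language
bank.update("de-2", "de", [0.0, 1.0, 0.0])
bank.update("fr-1", "fr", [0.0, 0.0, 1.0])
negatives = bank.language_specific("de", exclude={"de-1"})  # only the "de-2" entry survives
loss = bank_contrastive_loss([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], negatives)
```

Filtering by language tag restricts the negatives to stored representations of the same language as the current pair. The entry for the current example is excluded so that it is not counted as its own negative.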
9. Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over Niu in view of Fu, and further in view of Zheng et al. (US 2023/0214177 A1, hereinafter Zheng).
Regarding claim 10, Niu in view of Fu teaches generating the target sentence representation (Fu, Fig. 2; pg. 4 section 3.4 “With regard to obtaining the sentence representations, we adopt pre-trained word vectors…we adopt simple (weighted) averages of word vectors, which are surprisingly powerful, although our method could also be applied to other sentence embedding methods. Given source sentence embeddings {x} and target sentence embeddings {y} acquired as above, we can train the generators Gx and Gy through the joint loss function in Equation 5. Subsequently, we evaluate the obtained transformation via standard sentence retrieval task. For each source sentence embedding, we compute its k neural neighbors in terms of the distance function fd among all target embeddings. The corresponding k target sentences are regarded as the candidate set of mapping results.”). However, Niu in view of Fu does not specifically disclose generating the target sentence representation through performing, on the initial target sentence representation, at least one of shifting, scaling, and rotating.
Zheng teaches generating a representation through performing, on the initial target sentence representation, at least one of shifting, scaling, and rotating (para. 0124 “As shown in Fig. 3, at step 302, process 300 may include receiving at least one embedding set…”; para. 0128-0129 “As shown in Fig. 3, at step 304, process 300 may include applying mean centering…In some non-limiting embodiments or aspects, applying mean centering may include determining a mean based on all embedding vectors of the set of embedding vectors. Additionally or alternatively, the mean may be subtracted from each embedding vector of the set of embedding vectors.”).
Niu, Fu, and Zheng are considered to be analogous to the claimed invention as they are in the same field of natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu in view of Fu to incorporate the teachings of Zheng in order to specifically obtain the target sentence representation through performing at least one of shifting, scaling, or rotating. Applying such preprocessing to embedding vectors would improve cross-lingual word embedding performance (Zheng, para. 0003).
Regarding claim 11, Niu in view of Fu and Zheng discloses wherein the target sentence is a sentence in a first language (Niu, para. 0042 “In some examples, the cross-lingual transfer module 430, may receive an input 440, e.g., such as an input text in a source language and/or target language, via a data interface 415.”) and the shifting comprises: subtracting a predetermined mean from a current sentence representation, the predetermined mean computed based on a set of representations corresponding to a set of sentences in the first language (Niu teaches a set of sentences in a first language: Niu para. 0032 “In one embodiment, the training data set may be (1) an English-centered dataset such as OPUS-100; (2) a non-English-centered language dataset, e.g., the v2021-08-07 Tatoeba Challenge. OPUS-100 is English-centered, meaning that all training pairs include English on either the source or target side. The corpus covers 100 languages (including English)…”; Zheng teaches shifting using a predetermined mean for a particular set of embeddings: para. 0124 “As shown in Fig. 3, at step 302, process 300 may include receiving at least one embedding set…”; paras. 0128-0129 “As shown in Fig. 3, at step 304, process 300 may include applying mean centering…In some non-limiting embodiments or aspects, applying mean centering may include determining a mean based on all embedding vectors of the set of embedding vectors. Additionally or alternatively, the mean may be subtracted from each embedding vector of the set of embedding vectors.”).
Niu, Fu, and Zheng are considered to be analogous to the claimed invention as they are in the same field of natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu in view of Fu to incorporate the teachings of Zheng in order to specifically perform shifting by subtracting a predetermined mean from a current sentence representation, the predetermined mean computed based on a set of representations corresponding to a set of sentences in the first language. Doing so would have been obvious for the same rationale as given for claim 10.
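The short Python sketch below illustrates the shifting operation mapped to claim 11. It uses mean centering as in Zheng, paras. 0128-0129, with the mean computed once over first-language representations and then reused for a current representation. The function names and the toy two-dimensional vectors are hypothetical.

```python
def mean_vector(vectors):
    """Per-dimension mean of a set of equal-length representations."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def shift(representation, predetermined_mean):
    """Subtract a precomputed (predetermined) mean from a current representation."""
    return [x - m for x, m in zip(representation, predetermined_mean)]


# Representations of a set of sentences in the first language (toy values).
first_language_set = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
predetermined_mean = mean_vector(first_language_set)  # [3.0, 2.0]
print(shift([4.0, 1.0], predetermined_mean))           # [1.0, -1.0]
```

Because the mean is fixed in advance, each current representation can be shifted on its own without recomputing statistics over the whole set.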
10. Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Niu in view of Fu and Zheng, and further in view of Zhao et al. (NPL Inducing Language-Agnostic Multilingual Representations, hereinafter Zhao).
Regarding claim 12, Niu in view of Fu and Zheng discloses wherein the target sentence is a sentence in a first language (Niu, para. 0042 “In some examples, the cross-lingual transfer module 430, may receive an input 440, e.g., such as an input text in a source language and/or target language, via a data interface 415.”), a current sentence representation (Niu, Fig. 4 “Cross-Lingual Transfer Module 230”; para. 0042 “The cross-lingual transfer module 430 may generate an output 450 such as an alignment with a sentence in the target language corresponding to the input 440”), and a set of sentences in the first language (Niu para. 0032 “In one embodiment, the training data set may be (1) an English-centered dataset such as OPUS-100; (2) a non-English-centered language dataset, e.g., the v2021-08-07 Tatoeba Challenge. OPUS-100 is English-centered, meaning that all training pairs include English on either the source or target side. The corpus covers 100 languages (including English)…”). However, Niu in view of Fu and Zheng does not specifically disclose wherein the scaling comprises: dividing a current sentence representation by a predetermined variance, the predetermined variance computed based on a set of representations corresponding to a set of sentences in the first language.
Zhao teaches wherein the scaling comprises: dividing an embedding by a predetermined variance, the predetermined variance computed based on a set of representations corresponding to a set of embeddings (pg. 4, section 3.2 “Vector space normalization”: “We add a batch normalization layer that constrains all embeddings of different language into a distribution with zero mean and unit variance…where ε is a constant value for numerical stability, µβ and σβ are mean and variance, serving as per batch statistics for each time step in a sequence…”; see Eq. 4, embedding f(I,s) divided by variance σ).
Niu, Fu, Zheng, and Zhao are considered to be analogous to the claimed invention as they are in the same field of natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu in view of Fu and Zheng to incorporate the teachings of Zhao in order to specifically perform scaling by dividing a current sentence representation by a predetermined variance, the predetermined variance computed based on a set of representations corresponding to a set of sentences in the first language. Doing so would be beneficial, as this would remove language identity signals, such as variance, from multilingual embeddings and increase the discriminativeness of the embeddings (Zhao, pg. 4, section 3.2).
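For illustration only, the scaling operation mapped to claim 12 can be sketched as follows. The variance is computed once per dimension over first-language representations and then applied to a current representation. The sketch follows the batch-normalization form in Zhao's Eq. 4 and divides by the square root of the variance plus ε (the standard deviation). Whether the claimed "variance" is read as σ² or σ is not resolved here. Mean subtraction, which Zhao's layer also performs, is omitted so that only the scaling step is shown. The function names and toy values are hypothetical.

```python
import math


def per_dimension_variance(vectors):
    """Population variance of each dimension over a set of representations."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    return [sum((v[d] - means[d]) ** 2 for v in vectors) / n for d in range(dim)]


def scale(representation, predetermined_variance, eps=1e-5):
    """Divide each dimension by sqrt(variance + eps), the batch-normalization form."""
    return [x / math.sqrt(var + eps) for x, var in zip(representation, predetermined_variance)]


first_language_set = [[0.0, 0.0], [2.0, 4.0]]  # toy first-language representations
predetermined_variance = per_dimension_variance(first_language_set)  # [1.0, 4.0]
print(scale([3.0, 6.0], predetermined_variance, eps=0.0))            # [3.0, 3.0]
```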
11. Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Niu in view of Fu and Zheng, and further in view of Liu et al. (US 2020/0372106 A1, hereinafter Liu).
Regarding claim 13, Niu in view of Fu and Zheng discloses wherein the target sentence is a sentence in a first language (Niu, para. 0042 “In some examples, the cross-lingual transfer module 430, may receive an input 440, e.g., such as an input text in a source language and/or target language, via a data interface 415.”), the target sentence representation is to be used for performing a cross-lingual retrieval task across the first language and a second language…(Fu, pg. 4 section 3.4 “Given source sentence embeddings {x} and target sentence embeddings {y} acquired as above, we can train the generators Gx and Gy through the joint loss function in Equation 5. Subsequently, we evaluate the obtained transformation via standard sentence retrieval task. For each source sentence embedding, we compute its k neural neighbors in terms of the distance function fd among all target embeddings. The corresponding k target sentences are regarded as the candidate set of mapping results.”; pg. 4, section 4.1 “We focus on German and English as well as Spanish and English translation retrieval.”).
Niu, Fu, and Zheng are considered to be analogous to the claimed invention as they are in the same field of natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu to incorporate the teachings of Fu in order to specifically have the target sentence representation be suitable for performing a cross-lingual retrieval task across the first and second languages, for the same rationale as given for claim 1.
Niu in view of Fu and Zheng discloses a current sentence representation (Niu, Fig. 4 “Cross-Lingual Transfer Module 230”; para. 0042 “The cross-lingual transfer module 430 may generate an output 450 such as an alignment with a sentence in the target language corresponding to the input 440”), but does not specifically disclose wherein the rotating comprises: rotating a current sentence representation based on a predetermined rotation matrix between the first language and the second language.
Liu teaches wherein the rotating comprises: rotating a representation based on a predetermined rotation matrix between the first language and the second language (para. 0043 “To provide additional details for an improved understanding of selected embodiments of the present disclosure, reference is now made to FIG. 4 which depicts a simplified illustration 400 of a sequence for aligning embeddings E1, E2 from different languages or domains into a shared embedding space. In FIG. 4A, there is shown two non-aligned sets of embeddings E1, E2 that are trained independently on monolingual data, including a first embedding E1 that includes English words and a second embedding E2 that includes Spanish words to be aligned/translated. Each dot represents a word in that space, and the size of the dot is proportional to the frequency of the words in the training corpus of that language. In FIG. 4B, the first embedding E1 is rotated into rough alignment with the second embedding E2, such as by using adversarial learning to learn a rotation matrix W for roughly aligning the two distributions. In FIG. 4C, the mapping rotation matrix W may be further refined using a geometric transformation, such as a Procrustes transformation, that involves only translation, rotation, uniform scaling, or a combination of these transformations whereby frequent words aligned by the previous step are used as anchor points to minimize an energy function that corresponds to a spring system between anchor points. The refined mapping rotation matrix W′ is then applied to the first embedding E1 to map all words in the dictionary. In FIG. 4D, the first embedding E1 is translated by using the mapping rotation matrix W′ and a distance metric that expands the space where there is high density of points (like the area around the word “cat”), so that “hubs” (like the word “cat”) become less close to other word vectors than they would otherwise (compared to the same region in FIG. 4A)…”).
Niu, Fu, Zheng, and Liu are considered to be analogous to the claimed invention as they are in the same field of natural language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Niu in view of Fu and Zheng to incorporate the teachings of Liu in order to specifically perform rotating by rotating a current sentence representation based on a predetermined rotation matrix between the first language and the second language. Doing so would be beneficial, as this would align monolingual embeddings from different languages in a shared space where words of high semantic similarity across languages are close to each other (Liu, paras. 0024 and 0043).
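For illustration only, the rotating operation mapped to claim 13 amounts to multiplying a current representation by a fixed matrix. In Liu, W is learned, first adversarially and then refined with a Procrustes transformation. In the sketch below, a fixed two-dimensional 90-degree rotation is a hypothetical stand-in for such a predetermined first-to-second-language matrix, and learning W is not shown. The function names are also hypothetical.

```python
import math


def rotate(representation, rotation_matrix):
    """Multiply a representation by a predetermined rotation matrix (row-major)."""
    return [sum(w * x for w, x in zip(row, representation)) for row in rotation_matrix]


def is_orthogonal(matrix, tol=1e-9):
    """True if the rows of `matrix` are orthonormal, i.e. W W^T = I."""
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            dot = sum(matrix[i][k] * matrix[j][k] for k in range(n))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


# A fixed 2-D rotation by 90 degrees stands in for a learned first-to-second-language matrix W.
theta = math.pi / 2
W = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
rotated = rotate([1.0, 0.0], W)  # approximately [0.0, 1.0]
```

An orthogonal W preserves vector norms and pairwise angles. As a result, rotating one language's representations changes where they sit relative to the other language without distorting their internal geometry.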
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Wu et al. (NPL DisCo: Effective Knowledge Distillation for Contrastive Learning of Sentence Embeddings): memory bank mechanism for contrastive learning of sentence embeddings (pg. 4, section 3.1, 2nd Col.)
Bhagavath et al. (US 12,019,984): multi-lingual out-of-domain detection (Fig. 10)
Han et al. (US 2023/0080904 A1): generating cross-lingual textual semantic model via contrastive learning (Fig. 3)
Fei et al. (US 2022/0318255 A1): cross-lingual language model for cross-lingual retrieval (Fig. 12)
Marcu & Munteanu (US 8,943,080 B2): identifying sentence pairs for generating training set (Fig. 5)
Niu & Zhou (US 2009/0024613 A1): cross-lingual query retrieval (Fig. 2)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CODY DOUGLAS HUTCHESON whose telephone number is (703)756-1601. The examiner can normally be reached M-F 8:00AM-5:00PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached at (571)-272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CODY DOUGLAS HUTCHESON/Examiner, Art Unit 2659
/PIERRE LOUIS DESIR/Supervisory Patent Examiner, Art Unit 2659