DETAILED ACTION
This communication is responsive to Application No. 18/331,530, filed June 8, 2023. Claims 1-20 are pending and are directed to CANONICAL TRANSFORMATIONS USING MACHINE LEARNING LANGUAGE MODEL.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-6 and 10-18 are rejected under 35 U.S.C. 103 as being unpatentable over DeFelice (US 2020/0110809, Pub. Date: Apr. 9, 2020), in view of Mena et al. (Learning Latent Permutations with Gumbel-Sinkhorn Networks, ICLR 2018, 22 pages), and further in view of Imani et al. (Representation Alignment in Neural Networks, arXiv, 17 Sep 2022, 26 pages), hereinafter referred to as DeFelice, Mena, and Imani.
As per claim 1, DeFelice teaches a computer-implemented method comprising:
generating, by one or more processors and using a machine learning prediction model, a canonical representation for an input dataset (Correlation component 310 integrates multiple data representing the same real-world object or concept into a canonical representation that has a known value inside the system. This can refer both to the correlation of raw data as well as the correlation of higher-level information constructs. For each piece of information that is to be correlated, there are three steps: recognition of a correlatable data representation, conversion of the data representation into a canonical form, and linking of all representations of the same underlying data to the canonical form. DeFelice, [0057]), wherein the machine learning prediction model is previously trained (Although the bootstrapping and training of the system is described here, it is anticipated that ongoing information and results will be reintroduced to the system as new data, providing an ongoing learning loop. DeFelice, [0049]) by:
DeFelice does not explicitly teach permutations. Mena, however, teaches permutations (We construct a layer that encodes the representation of a permutation, and show how to train networks containing such layers as intermediate representations. Mena, page 3).
DeFelice and Mena are analogous art to the claimed invention because they are from a similar field of endeavor: systems, components, and methodologies for the performance of automated operations using empirical data in electronic form for classifying, analyzing, monitoring, or carrying out calculations on the data to produce a result or event. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify DeFelice in view of Mena. This would have been desirable because, as Mena states, "Permutations and matchings are core building blocks in a variety of latent variable models, as they allow us to align, canonicalize, and sort data" (Mena, Abstract, page 1).
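For context on the permutation layers Mena describes, the following is a minimal illustrative sketch of the Sinkhorn operator from Mena: repeated row and column normalization of exp(X/tau), which relaxes a real-valued score matrix toward a permutation matrix as the temperature tau decreases. The function name, temperature, and iteration count are illustrative assumptions, not code from the cited reference.

```python
# Minimal sketch of the Sinkhorn operator (Mena, ICLR 2018): alternately
# normalize the rows and columns of exp(X / tau); as tau -> 0 the result
# approaches a hard permutation matrix. Done in log space for stability.
import numpy as np

def sinkhorn(X, tau=0.1, n_iters=20):
    log_P = X / tau
    for _ in range(n_iters):
        log_P = log_P - np.logaddexp.reduce(log_P, axis=1, keepdims=True)  # rows
        log_P = log_P - np.logaddexp.reduce(log_P, axis=0, keepdims=True)  # columns
    return np.exp(log_P)

rng = np.random.default_rng(0)
P = sinkhorn(rng.normal(size=(4, 4)))
print(P.round(2))  # rows and columns each sum to ~1, entries near 0 or 1
```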
DeFelice in view of Mena further teaches:
generating a plurality of permutative input embeddings for a training dataset based on a plurality of canonical data entity features (a canonical representation, including both the data type and an associated interpretation is chosen for each objective data element provided to the correlation component 310. Sometimes received information is labeled in such a way as to make it clear that it refers to a particular correlatable piece of objective data, such as labeling a data field with the string "street address." Other times, the format of the data can be recognized (such as a set of words that could be a city name, followed by a known state code, followed by a five or nine-digit group of integers). This can be recognized by a state machine, NFA, or DFA, as represented by a regular expression or similar, or by a neural network trained on similar data to recognize particular inputs. DeFelice, [0057]), wherein each permutative input embedding of the plurality of permutative input embeddings corresponds to a different sequence of the plurality of canonical data entity features (The number of input nodes is regularized to the dimensionality of the factual model 302, and include inputs for sequences of word embeddings (from information retrieved from the Internet) as well as inputs corresponding to locations (and location history), gender, profession (and professional history), and previous interactions. DeFelice, [0082]);
generating a latent representation based on the plurality of permutative input embeddings (A VAE consists of paired encoder and generator networks which encode a text into a latent representation and generate samples from the latent space, respectively. DeFelice, [0007]);
DeFelice in view of Mena does not teach alignment. Imani, however, teaches alignment (we study feature transfer first in a controlled synthetic benchmark and then in pre-trained CNNs and find that positive and negative transfer can be traced to an increase or decrease in alignment between the learned representations and the target task. Imani, page 2).
DeFelice, Mena, and Imani are analogous art to the claimed invention because they are from a similar field of endeavor: systems, components, and methodologies for the performance of automated operations using empirical data in electronic form for classifying, analyzing, monitoring, or carrying out calculations on the data to produce a result or event. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify DeFelice in view of Mena, and further in view of Imani. This would have been desirable because, as Imani states, "We investigate this representation alignment phenomenon in a variety of neural network architectures and find that (a) alignment emerges across a variety of different architectures and optimizers, with more alignment arising from depth (b) alignment increases for layers closer to the output and (c) existing high-performance deep CNNs exhibit high levels of alignment. We then highlight why alignment between the top singular vectors and the targets can speed up learning and show in a classic synthetic transfer problem that representation alignment correlates with positive and negative transfer to similar and dissimilar tasks" (Imani, Abstract, page 1).
DeFelice in view of Mena and Imani further teaches generating an alignment vector representation for the training dataset based on a comparison between the latent representation and a canonical data map (The encoder 222 network encodes the words within the source text 201 as a list of vectors, where each vector represents the contextual meaning of the words within the text, including in the context of their position within the statement and paragraph, encoding the latent distribution in one or more hidden layers at 223. The encoder also takes as input information associated with the prospect model, shown as arrow 217. DeFelice, [0046]);
generating an output vector for the training dataset based on the alignment vector representation (Once each sentence in the source text 201 is read, the decoder 224 begins, generating a series of equivalent sentences by sampling from the latent distribution implied by the source text. To generate the translated word at each step, the decoder pays attention to a weighted distribution over the encoded word and sentence vectors judged most relevant to generate the English word most appropriate for the particular place in the sentence. DeFelice, [0046]);
generating, using a loss function, a model loss for the machine learning prediction model based on the output vector and a labeled vector for the training dataset (Each RNN cell 1105 computes the overall loss of the network on a single pair. It runs the network over the input, for each input, computes the distribution of possible outputs, and computes the cross-entropy loss for each character. DeFelice, [0088]); and
updating one or more parameters of the machine learning prediction model based on the model loss (The response of the targeted reader is reported back to the CRM system 216 (shown as arrow 234) and prospect modeling component 210 (shown as arrow 236) where the response is used to update the model of the prospect for use in the next evaluation, creating a feedback loop allowing for the updating of the prospect model 214 as well as a higher-quality future "translation" of the source text 201 into an effective generated text 203. DeFelice, [0045]).
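For illustration of how the limitations mapped above fit together (latent representation, dot-product comparison against a canonical data map, output vector, model loss, parameter update), the following is a minimal hypothetical sketch in PyTorch. The encoder, the fixed canonical_map tensor, the dimensions, and the use of cross-entropy are illustrative assumptions, not the applicant's claimed method or code from the cited references.

```python
# Hypothetical end-to-end training step: encode -> align via dot product
# with a canonical map -> output vector -> loss -> parameter update.
import torch

n_features, latent_dim, n_canonical = 8, 16, 5
encoder = torch.nn.Linear(n_features, latent_dim)        # stand-in prediction model
canonical_map = torch.randn(n_canonical, latent_dim)     # assumed canonical data map
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

x = torch.randn(32, n_features)                # batch drawn from a training dataset
labels = torch.randint(0, n_canonical, (32,))  # labeled vector (canonical targets)

optimizer.zero_grad()
latent = encoder(x)                            # latent representation
alignment = latent @ canonical_map.T           # alignment vector (dot products)
loss = torch.nn.functional.cross_entropy(alignment, labels)  # model loss
loss.backward()
optimizer.step()                               # update the model parameters
```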
As per claim 2, DeFelice in view of Mena and Imani teaches the computer-implemented method of claim 1 further comprising: receiving the input dataset from a third-party data source, wherein the input dataset comprises one or more data fields associated with inconsistent metadata that is indicative of one or more field descriptions or one or more column values that are specific to the third-party data source (DeFelice, [0049]).
As per claim 3, DeFelice in view of Mena and Imani teaches the computer-implemented method of claim 1, wherein the latent representation is generated using one or more neural network layers of the machine learning prediction model, wherein the latent representation is indicative of a plurality of feature weights for each of the plurality of canonical data entity features (DeFelice, [0046]).
As per claim 4, DeFelice in view of Mena and Imani teaches the computer-implemented method of claim 3, wherein the training dataset comprises a plurality of data fields and the plurality of feature weights comprise one or more feature weights between each of the plurality of data fields and each of the plurality of canonical data entity features (DeFelice, [0066]).
As per claim 5, DeFelice in view of Mena and Imani teaches the computer-implemented method of claim 3, wherein the one or more neural network layers of the machine learning prediction model comprise a bidirectional recurrent neural network (Forward/Backward RNN, DeFelice, Fig. 11c); an illustrative sketch of such a layer is provided below.
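A minimal sketch of a bidirectional recurrent layer of the kind shown in DeFelice, Fig. 11c; the GRU cell type, sizes, and batch layout are illustrative assumptions.

```python
# Bidirectional RNN sketch: forward and backward passes over the sequence,
# with the two hidden states concatenated at each step.
import torch

rnn = torch.nn.GRU(input_size=16, hidden_size=32,
                   bidirectional=True, batch_first=True)
x = torch.randn(4, 10, 16)   # 4 sequences, 10 steps, 16 features each
out, h = rnn(x)
print(out.shape)             # torch.Size([4, 10, 64]): forward + backward states
```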
As per claim 6, DeFelice in view of Mena and Imani teaches the computer-implemented method of claim 1, wherein the alignment vector representation is based on a dot product between the latent representation and the canonical data map (Before giving a formal definition, let us visualize these values. In Figure 2 (a) we sampled 10000 points from the first two classes of the MNIST dataset (5000 points from each class) and plotted the singular values of X, the original features in the dataset, and the squared dot product between the label vector y and the corresponding left singular values u1:n. The dot product is noticeably large for the top few singular vectors, and drops once the singular values become small. This need not always be the case. We created the same plot for shuffled labels in Figure 2b. As the association between the features and the labels in MNIST dataset is lost, the label vector is more or less uniformly aligned with all the singular vectors. Imani, page 4).
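The alignment diagnostic Imani describes can be sketched as follows: compute the left singular vectors of the feature matrix and take the squared dot product of each with the normalized label vector. The data shapes here are illustrative; with real structure in X, most of the alignment mass concentrates on the top singular vectors.

```python
# Sketch of Imani-style representation alignment: squared dot products
# between the label vector and the left singular vectors of the features.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))            # n samples x d features (illustrative)
y = rng.choice([-1.0, 1.0], size=100)     # binary label vector

U, S, Vt = np.linalg.svd(X, full_matrices=False)
y_unit = y / np.linalg.norm(y)
alignment = (U.T @ y_unit) ** 2           # one value per singular vector
print(alignment.round(3))                 # large values on top vectors => aligned
```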
The motivation to combine DeFelice, Mena, and Imani is the same as that set forth in the rejection of claim 1 above.
Claims 10-18 have limitations similar to those treated in the above rejections, are met by the references as discussed above, and are rejected for the same reasons of obviousness.
Claims 7-9, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over DeFelice (US 2020/0110809, Pub. Date: Apr. 9, 2020), in view of Mena et al. (Learning Latent Permutations with Gumbel-Sinkhorn Networks, ICLR 2018, 22 pages), in view of Imani et al. (Representation Alignment in Neural Networks, arXiv, 17 Sep 2022, 26 pages), and further in view of SHAIB et al. (US 2021/0183484, Pub. Date: Jun. 17, 2021), hereinafter referred to as DeFelice, Mena, Imani, and SHAIB.
As per claim 7, DeFelice in view of Mena and Imani teaches the computer-implemented method of claim 1, wherein generating the output vector for the training dataset comprises: generating, using a sigmoid function, a hidden state output for the alignment vector representation; generating, using an activation function, a refined hidden state output; and generating the output vector based on the refined hidden state output, but does not teach the sigmoid function. SHAIB, however, teaches a sigmoid function (Applicants built a model BERT (pretrained base-cased model) and latent attention. Applicants fed static token embeddings from BERT 108 to the latent attention layer 120, which output sequence representations to be used for regression through a linear layer with sigmoid activation. Applicants train the model for 20 epochs and select the best performing one for testing. SHAIB, [0228]).
DeFelice, Mena, Imani, and SHAIB are analogous art to the claimed invention because they are from a similar field of endeavor: systems, components, and methodologies for the performance of automated operations using empirical data in electronic form for classifying, analyzing, monitoring, or carrying out calculations on the data to produce a result or event. It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify DeFelice in view of Mena and Imani, and further in view of SHAIB. This would have been desirable because, as DeFelice explains, "The output of the evaluator can be used both to disqualify a particular candidate text (for failing one or more binary classifiers or for falling too far outside an acceptable range on a Gaussian classifier) but it can also be used as part of a feedback loop for the encoder 222, shown as arrow 229(b), but also as an input to the discriminator 226, shown as arrow 227" (DeFelice, [0047]).
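A minimal sketch of the regression head SHAIB describes at [0228], i.e., a linear layer with sigmoid activation applied to sequence representations; the 768-dimensional input and batch size are illustrative assumptions.

```python
# Linear layer + sigmoid over pooled sequence representations, producing
# a bounded regression score in (0, 1) for each input.
import torch

seq_repr = torch.randn(32, 768)   # e.g., pooled BERT-style embeddings (assumed size)
head = torch.nn.Sequential(
    torch.nn.Linear(768, 1),
    torch.nn.Sigmoid(),
)
score = head(seq_repr)
print(score.shape)                # torch.Size([32, 1])
```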
As per claim 8, DeFelice in view of Mena, Imani, and SHAIB teaches the computer-implemented method of claim 7, wherein the activation function comprises a softmax function (softmax 1220, DeFelice, Fig. 12a).
As per claim 9, DeFelice in view of Mena, Imani, and SHAIB teaches the computer-implemented method of claim 7, wherein the output vector for the training dataset comprises a dot product between the refined hidden state output and the canonical data map (In one embodiment, this is measured as the cosine of the angle between the two vectors. DeFelice, [0064]).
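For reference, the cosine measure cited from DeFelice [0064] is the dot product of the two vectors normalized by their magnitudes; a minimal sketch:

```python
# Cosine of the angle between two vectors, as in DeFelice [0064].
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.7071
```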
Claims 19 and 20 have limitations similar to those treated in the above rejection, are met by the references as discussed above, and are rejected for the same reasons of obviousness.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to OLEG KORSAK whose telephone number is (571)270-1938. The examiner can normally be reached on Monday-Friday 7:30am - 5:00pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rupal Dharia can be reached on (571) 272-3880. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/OLEG KORSAK/
Primary Examiner, Art Unit 2492