DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment/Status of Claims
Claims 1-6, 8, 10, 12, 13, 15, 17, and 19 were amended.
Claims 7, 14, and 20 were canceled.
Claims 1-6, 8-13, and 15-19 are pending and examined herein.
Claims 1-6, 8-13, and 15-19 are rejected under 35 U.S.C. 103.
Response to Remarks/Arguments
Applicant notes that "During the interview, the Examiner agreed that the proposed claim amendment appears to include features not disclosed by the cited references, and indicated that a further search is needed." However, according to the Examiner Interview Summary dated 12/18/2025, no agreement was reached. The Examiner Interview Summary states "Examiner pointed to the primary reference used in the previous office action, Krishnan as also having a second model that could potentially map to the second model in the amended claims. Examiner noted that there is an intended difference between the claims and the prior art, as the first and second model in Krishnan both perform both actions that the first and second model each perform in the amended claims." Note that an intended difference does not change the scope of a claim.
Applicant’s arguments, see page 9, filed 12/22/2025, with respect to the objection to claim 1 have been fully considered and are persuasive. The objection to claim 1 has been withdrawn.
Applicant's arguments filed 12/22/2025 regarding the rejection of claims 1-20 under 35 U.S.C. 103 have been fully considered but they are not persuasive. Applicant argues, see pages 10-11, that "Krishnan does not teach or suggest a first machine learning model trained to generate user embeddings, and a second machine learning model trained to generate a ranked list of recommended items." Though Krishnan does not teach one model that generates only user embeddings and another that generates only a ranked list, Krishnan does teach a first machine learning model trained to generate user embeddings, and a second machine learning model trained to generate a ranked list of recommended items, as there are at least two models that each perform both tasks. Krishnan therefore teaches all amended limitations. See the updated 35 U.S.C. 103 rejection below.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claim(s) 1, 2, 8, 9, 15, and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan (US 2021/0110306 A1), AWS Neuron (“Data Parallel Inference on Torch Neuron”, 2022), Kim (“Optional: Data Parallelism”, 2017) and Wikipedia (“Sparse matrix”, 2022).
Regarding claim 1, Krishnan teaches
A system, comprising: ([0004] - [0005] states "FIG. 1A illustrates an example of a system architecture in accordance with various embodiments of the present disclosure. FIG. 1B illustrates an example of a computer system that may be used in conjunction with embodiments of the present disclosure.")
at least one processor; and ([0037] states "Both the server 20 and the user devices 42 may include hardware components of typical computing devices, including a processor, input devices (e.g., keyboard, pointing device, microphone for voice commands, buttons, touch screen, etc.), and output devices ( e.g., a display device, speakers, and the like ). The server 20 and user devices 42 may include computer-readable media, e.g., memory and storage devices (e.g., flash memory, hard drive, optical disk drive, magnetic disk drive, and the like) containing computer instructions that implement the functionality disclosed herein when executed by the processor.")
a non-transitory memory storing instructions that, when executed, cause the at least one processor to: ([0037] states "Both the server 20 and the user devices 42 may include hardware components of typical computing devices, including a processor, input devices (e.g., keyboard, pointing device, microphone for voice commands, buttons, touch screen, etc.), and output devices (e.g., a display device, speakers, and the like). The server 20 and user devices 42 may include computer-readable media, e.g., memory and storage devices (e.g., flash memory, hard drive, optical disk drive, magnetic disk drive, and the like) containing computer instructions that implement the functionality disclosed herein when executed by the processor.")
obtain user-item interaction data with respect to a plurality of users; ([0047] states "Context variables that may be used in conjunction with the present disclosure are described in more detail below. In some embodiments, a set of context variables in the first context module are associated with a first set of transaction records associated with a dense-data source domain, and the second context module is based on a second set of transaction records associated with the sparse-data target domain." [0057] – [0060] states "Context variables may be used in embodiments of the disclosure in learning-to-organize the user and item latent spaces. The following describes three different types of context features. Interactional Context: These predicates describe the conditions under which a specific user-item interaction occur, e.g., time or day of the interaction. They can vary across interactions of the same user-item pair. Historical Context: Historical predicates describe the past interactions associated with the interacting users and items, e.g., user's interaction pattern for an item (or, item category). Attributional Context: Attributional context encapsulates the time-invariant attributes, e.g., user demographic features or item descriptions. They do not vary across interactions of the same user-item pair." Therefore, the context variables are interpreted as the user-item interaction data.)
generate inferred user embeddings by applying a first machine learning model …, wherein the inferred user embeddings are user representations in a same latent space, ([0097] states "The user embedding space, e_u, u ∈ U_m, is organized to reflect the contextual preferences of users. To achieve this organization of the embeddings, the meta-model may back propagate the extracted multi-linear context embeddings c_n into the user embedding space and create context conditioned clusters of users for item ranking. The precise motivation holds good for the item embedding space as well." As the embeddings are in a "user embedding space", they are in a same latent space. As the embeddings represent the contextual preferences of users, they are a user representation. [0046] states "In the example depicted in FIG. 2C, process 250 includes, at 252, generating a first recommendation model associated with a dense-data source domain, wherein the first recommendation model includes: (i) a first context module that is based on a set of context variables; (ii) a first user embedding module; and (iii) a first merchant embedding module. Process 250 further includes extracting a meta-model from the first recommendation model (254)." As the context variables are determined from the payment transaction records, and the context variables are used to generate a model, and the model also includes user embeddings, and the model generates recommendations, the recommended items are generated based on the user session data and the inferred user embeddings. The first recommendation model, which is the model trained using dense source domain data, is interpreted as the first machine learning model.)
obtain user session data from a user device of a query user, generate recommended items based on the user session data and the inferred user embeddings, wherein: ([0032] states "Payment card transactions may be performed using a variety of platforms such as brick and mortar stores, ecommerce stores, wireless terminals, and user mobile devices. The payment card transaction details sent over the network 14 are received by one or more servers 20 of the payment card processor 12 and processed by, for example, by a payment authorization process 22 and/or forwarded to an issuing bank (not shown). The payment card transaction details are stored as payment transaction records 24 in a transaction database 26." The payment transaction records are interpreted as the user session data. The user mobile device is interpreted as the user device. [0036] states "The recommendation engine 36 can respond to a user query 38 (also referred to herein as a “recommendation request”) from a user 18 and provide a list of merchant rankings 40 in response." The merchant rankings are interpreted as the recommended items. [0035] states "As described in more detail below, the merchant recommendation system 25 retrieves the payment transaction records 24 to determine context variables 28a, 28b associated with merchants 16 and users 18. The system generates a source recommendation meta-model that includes a source context module 27a based on a source set of context variables 28a. Similarly, the system generates a target recommendation meta-model with a target context module 27b that is based on a target set of context variables 28b. The system 25 transfers the source context module 27a to the target context module." [0046] states "In the example depicted in FIG. 2C, process 250 includes, at 252, generating a first recommendation model associated with a dense-data source domain, wherein the first recommendation model includes: (i) a first context module that is based on a set of context variables; (ii) a first user embedding module; and (iii) a first merchant embedding module. Process 250 further includes extracting a meta-model from the first recommendation model (254)." As the context variables are determined from the payment transaction records, and the context variables are used to generate a model, and the model also includes user embeddings, and the model generates recommendations, the recommended items are generated based on the user session data and the inferred user embeddings.)
the plurality of inference [data] are generated based on a sparse part of the user-item interaction data, … and ([0276] states "The system may train a deep learning-based recommender model using dense data and adapt the trained model to work with sparse data." One of ordinary skill in the art would realize that working with sparse data means that the inferences are generated based on sparse data.)
the first machine learning model is trained using [data] generated based on a dense part of the user-item interaction data; … and ([0276] states "The system may train a deep learning-based recommender model using dense data and adapt the trained model to work with sparse data. In an example, the system may start by choosing the bay area restaurants and then limit the users and restaurants to those having dense interactions between them. The system may use this dense data to train a deep learning based base recommender model." Therefore, the first machine learning model (source domain model) is trained using data based on a dense part of the user-item interaction data.)
generate an updated set of user representations in the same latent space based on the inferred user embeddings generated from the sparse part and user representations in the dense part; (Fig. 3B shows that embeddings are generated from the source (dense) region and the target (sparse region). As the source model, which has been trained to generate the inferred user embeddings, is transferred to the target model, which learns user embeddings in the target region, the target region embeddings are interpreted as the updated set of user representations, which includes the dense user representations. [0062] states "While the user and item embedding are specific to each domain, embodiments of the disclosure may provide a meta-learning approach grounded on contextual predicates to organize the embedding spaces of the target recommendation domains (e.g., learn-to-learn embedding representations) which is shared across the source and target domains." Therefore, the embedding spaces, interpreted as the latent space, in both the first and second models/source and target domains, are the same.)
train a second machine learning model based on the user-item interaction data and the updated set of user representations, wherein the second machine learning model is different from the first machine learning model. (Fig. 3B shows the final recommendation model, which uses the user and item embeddings from the target recommendation model; the user embeddings are the updated user representations. [0124] states "Some embodiments may adopt a simulated-annealing approach as to adapt the layers transferred from the source domain. This may help decay the learning rate for the transferred layers at a rapid rate (e.g., employing an exponential schedule), while user and item embeddings are allowed to evolve when trained on the target data points." Therefore, the final/second model is trained based on the user-item interaction data and the updated set of user representations. Additionally, Fig. 2C states "GENERATING A TRANSFER LEARNING MODEL BASED ON THE FIRST RECOMMENDATION MODEL AND THE SECOND RECOMMENDATION MODEL". The final/transfer learning model is interpreted as the second machine learning model, which is different from the first and second recommendation models.)
generate the ranked list of the recommended items based on the user session data and the second machine learning model; and ([0173]-[0175] state "Finally, the system may perform an inference process. For example, suppose the system wants to rank the restaurants in the sparse region for the user U4. Here, the system provides the embeddings for the user U4 and restaurants R5 and R6 along with the context (e.g., lunch), to the trained model. The trained model outputs the scores for the restaurants as shown below: R5—0.67, R6—0.24. The top-k (k being a predetermined number of listings) restaurants sorted by the score may be provided to the user's computing device as recommendations." Fig. 3B shows that the recommendations are generated by the final recommendation model, interpreted as the second machine learning model.)
transmit the ranked list to the user device for display to the query user. ([0036] states "The recommendation engine 36 can respond to a user query 38 (also referred to herein as a “recommendation request”) from a user 18 and provide a list of merchant rankings 40 in response." The merchant rankings are interpreted as the recommended items. One of ordinary skill in the art would realize that the list would be displayed to the user after transmission.)
Krishnan does not appear to explicitly teach
[generating inferences by applying a deep learning model] to a plurality of inference data batches in parallel
[inference] data batches
a majority of the sparse part being zero elements,
a majority of the dense part being non-zero elements,
[training using] a plurality of training data batches
However, AWS Neuron—directed to analogous art—teaches
[generating inferences by applying a deep learning model] to a plurality of inference data batches in parallel (Page 1, ‘Data parallel inference’ states “Data Parallelism is a form of parallelization across multiple devices or cores, referred to as nodes. Each node contains the same model and parameters, but data is distributed across the different nodes. By distributing the data across multiple nodes, data parallelism reduces the total execution time of large batch size inputs compared to sequential execution.” Page 2, ‘Batch dim’ states, “DataParallel accepts a dim argument that denotes the batch dimension used to split the input data for distributed inference.”)
[inference] data batches (Page 2, ‘Batch dim’ states, “DataParallel accepts a dim argument that denotes the batch dimension used to split the input data for distributed inference.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Krishnan and AWS Neuron because, as AWS Neuron states on page 1, “By distributing the data across multiple nodes, data parallelism reduces the total execution time of large batch size inputs compared to sequential execution.”
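For illustration only, the data-parallel inference described by AWS Neuron (one large batch split along the batch dimension, each chunk processed by an identical copy of the model) can be sketched in plain Python. This sketch is not code from any cited reference; the function names are hypothetical, and threads stand in for accelerator nodes:

```python
from concurrent.futures import ThreadPoolExecutor

def split_batch(batch, num_nodes):
    """Split a batch along the batch dimension, one chunk per node
    (the role of DataParallel's dim argument in the cited reference)."""
    chunk = -(-len(batch) // num_nodes)  # ceiling division
    return [batch[i:i + chunk] for i in range(0, len(batch), chunk)]

def data_parallel_infer(model, batch, num_nodes=4):
    """Run the same model on every chunk concurrently and re-join the
    per-chunk outputs in order, mirroring data-parallel inference."""
    chunks = split_batch(batch, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        results = list(pool.map(model, chunks))
    return [y for part in results for y in part]
```

With any stub model, the joined result matches sequential execution; only the per-chunk work is distributed.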
The combination of Krishnan and AWS Neuron does not appear to explicitly teach
a majority of the sparse part being zero elements,
a majority of the dense part being non-zero elements,
[training using] a plurality of training data batches
However, Wikipedia—directed to analogous art—teaches
a majority of the sparse part being zero elements, (Page 1 states “In numerical analysis and scientific computing, a sparse matrix or sparse array is a matrix in which most of the elements are zero.”)
a majority of the dense part being non-zero elements, (Page 2 states “By contrast, if most of the elements are non-zero, the matrix is considered dense.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Krishnan and AWS Neuron with the teachings of Wikipedia because, as stated by Wikipedia on page 1, “When storing and manipulating sparse matrices on a computer, it is beneficial and often necessary to use specialized algorithms and data structures that take advantage of the sparse structure of the matrix.” Additionally, page 1 states “The concept of sparsity is useful in combinatorics and application areas such as network theory and numerical analysis, which typically have a low density of significant data or connections.” As the area of recommender systems also has a low density of significant data, see Krishnan [0023], the combination is obvious.
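To illustrate the sparse/dense distinction relied on above, the following plain-Python sketch (hypothetical helper names; not code from any cited reference) shows a matrix in which the majority of elements are zero, together with a dictionary-of-keys structure of the kind Wikipedia describes for exploiting that sparsity:

```python
def sparsity(matrix):
    """Fraction of zero elements in a dense list-of-lists matrix."""
    total = sum(len(row) for row in matrix)
    zeros = sum(row.count(0) for row in matrix)
    return zeros / total

def to_dok(matrix):
    """Dictionary-of-keys storage: record only the non-zero elements,
    a specialized structure that exploits a majority-zero matrix."""
    return {(i, j): v
            for i, row in enumerate(matrix)
            for j, v in enumerate(row) if v != 0}
```

A 3x3 matrix with two non-zero entries stores only two key-value pairs in this form, rather than nine elements.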
The combination of Krishnan, AWS Neuron, and Wikipedia does not appear to explicitly teach
[training using] a plurality of training data batches
However, Kim—directed to analogous art—teaches
[training using] a plurality of training data batches (Page 1 states “It’s natural to execute your forward, backward propagation on multiple GPUs. However, [PyTorch] will only use one GPU by default. You can easily run your operations on multiple GPUs by making your model run parallelly using DataParallel:”. One of ordinary skill in the art would realize that forward, backward propagation is training. Pages 3-4 show the per-GPU batch input sizes: 15 on each of 2 GPUs, 10 on each of 3 GPUs, and, for 8 GPUs, 4 on each of the first seven GPUs and 2 on the last.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Krishnan, AWS Neuron, and Wikipedia with the teachings of Kim because, as stated by Kim on Page 1, “It’s natural to execute your forward, backward propagation on multiple GPUs.”
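The per-GPU batch sizes reported in Kim (15 on each of 2 GPUs, 10 on each of 3 GPUs, and 4-and-2 across 8 GPUs for a batch of 30) follow from ceiling-division scattering of the batch dimension. A minimal sketch, assuming only that splitting; the helper name is hypothetical and this is not code from the cited tutorial:

```python
def chunk_sizes(batch_size, num_devices):
    """Per-device chunk sizes when a training batch is scattered:
    ceil(batch_size / num_devices) per device until the data runs out."""
    per = -(-batch_size // num_devices)  # ceiling division
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(per, remaining)
        sizes.append(take)
        remaining -= take
    return sizes
```

For a batch of 30 this reproduces the sizes printed in the reference for 2, 3, and 8 GPUs.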
Regarding claim 2, the rejection of claim 1 is incorporated herein. Krishnan teaches
obtain historical user session data and historical user transaction data of the plurality of users; ([0183] states "FT-Data: This large-scale financial transaction dataset obtained from a major global payments technology company contains credit/debit card payments made to restaurants by cardholders in the U.S. Each transaction entry is accompanied by contextual information such as date, time, amount, etc. Unlike the public datasets, the transactions do not provide explicit ratings." The transaction data is interpreted as the historical user transaction data and the contextual information is interpreted as the historical user session data.)
generate the user-item interaction data by parsing the historical user session data and the historical user transaction data; and ([0183] states "The system may also leverage cardholders' and merchants’ transaction history and infer additional contextual features such as the spending habits of users, restaurant popularity, restaurant peak hours, cardholders' tipping patterns at restaurants, etc." In order to infer the additional contextual features, the historical data must have been parsed. The context features and transactions are interpreted as the user-item interaction data.)
store the user-item interaction data in a database. ([0283] states "Data Tier: the system may use two databases to hold all the data required by the inference models. The system may use RocksDB to store the user embeddings and the users ' history for all the users.")
Regarding claim 8, Krishnan teaches
A computer-implemented method, comprising: ([0039] states "FIG. 1B shows a computer system 170 for implementing or executing software instructions that may carry out the functions of the embodiments described herein according to various embodiments. For example, computer system 170 may comprise server 20, a merchant system 16, user system 18, or user mobile device 42 illustrated in FIG. 1A.")
The remainder of claim 8 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis.
Claim 9 recites substantially similar subject matter to claim 2 and is rejected with the same rationale, mutatis mutandis.
Regarding claim 15, Krishnan teaches
A non-transitory computer readable medium having instructions stored thereon, wherein the instructions, when executed by at least one processor, cause at least one device to perform operations comprising: ([0043] states "The methods described and depicted herein can be implemented in any suitable manner, such as through software operating on one or more computer systems. The software may comprise computer-readable instructions stored in a tangible computer-readable medium ( such as the memory of a computer system, and can be executed by one or more processors to perform the methods of various embodiments.")
The remainder of claim 15 recites substantially similar subject matter to claim 1 and is rejected with the same rationale, mutatis mutandis.
Claim 16 recites substantially similar subject matter to claim 2 and is rejected with the same rationale, mutatis mutandis.
Claim(s) 3, 10, and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan (US 2021/0110306 A1), AWS Neuron (“Data Parallel Inference on Torch Neuron”, 2022), Kim (“Optional: Data Parallelism”, 2017) and Wikipedia (“Sparse matrix”, 2022) as applied to claims 1 and 2 above, further in view of Rosenthal (“Intro to Matrix Factorization: Classic ALS with Sketchfab Models”, 2016).
Regarding claim 3, the rejection of claim 2 is incorporated herein. The combination of Krishnan, AWS Neuron, Kim, and Wikipedia does not appear to explicitly teach
the user-item interaction data includes a user-item interaction matrix;
each element in the user-item interaction matrix is either one representing a corresponding user had an interaction with a corresponding item, or zero representing a corresponding user had no interaction with a corresponding item; and
a majority of the elements in the user-item interaction matrix are zero.
However, Rosenthal—directed to analogous art—teaches
the user-item interaction data includes a user-item interaction matrix; (Page 1 states “Recall that with implicit feedback, we do not have ratings anymore: rather, we have users’ preference for items. In the WRMF loss function, the ratings matrix r_ui has been replaced with a preference matrix p_ui. We make the assumption that if a user has interacted at all with an item, then p_ui = 1. Otherwise, p_ui = 0.” p_ui is interpreted as the user-item interaction matrix.)
each element in the user-item interaction matrix is either one representing a corresponding user had an interaction with a corresponding item, or zero representing a corresponding user had no interaction with a corresponding item; and (Page 1 states “Recall that with implicit feedback, we do not have ratings anymore: rather, we have users’ preference for items. In the WRMF loss function, the ratings matrix r_ui has been replaced with a preference matrix p_ui. We make the assumption that if a user has interacted at all with an item, then p_ui = 1. Otherwise, p_ui = 0.” p_ui is interpreted as the user-item interaction matrix.)
a majority of the elements in the user-item interaction matrix are zero. (Page 4 states that the user-item interaction matrix has “Sparsity: 0.035%”. As sparsity is the percentage of non-zero elements, the majority of the elements are zero.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Krishnan, AWS Neuron, Wikipedia, and Kim with the teachings of Rosenthal because, as Rosenthal states on page 1, "Specifically, this model makes reasonable intuitive sense, it’s scalable, and, most importantly, I’ve found it easy to tune. There are much fewer hyperparameters than, say, stochastic gradient descent models." Page 1 further states that the model’s loss function includes the user-item interaction matrix. Therefore, it would have been obvious to use the matrix in order to be able to use the model.
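The binary preference matrix p_ui discussed above can be illustrated with a short plain-Python sketch (hypothetical helper names; not code from the Rosenthal reference). The percentage helper mirrors how the reference reports "Sparsity" as the share of non-zero entries:

```python
def preference_matrix(interactions, num_users, num_items):
    """Binary user-item matrix: p[u][i] = 1 if user u interacted with
    item i, else 0 (the p_ui convention discussed above)."""
    p = [[0] * num_items for _ in range(num_users)]
    for u, i in interactions:
        p[u][i] = 1
    return p

def density_percent(p):
    """Percentage of non-zero entries in the matrix."""
    total = sum(len(row) for row in p)
    nonzero = sum(sum(row) for row in p)
    return 100.0 * nonzero / total
```

With realistic interaction data the density is far below 50%, i.e., a majority of the elements are zero.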
Claims 10 and 17 recite substantially similar subject matter to claim 3 and are rejected with the same rationale, mutatis mutandis.
Claim(s) 4, 11, and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan (US 2021/0110306 A1), AWS Neuron (“Data Parallel Inference on Torch Neuron”, 2022), Kim (“Optional: Data Parallelism”, 2017) and Wikipedia (“Sparse matrix”, 2022) as applied to claim 1 above, further in view of Spring Batch (“Batch Processing Strategies”, 2014).
Regarding claim 4, the rejection of claim 1 is incorporated herein. Krishnan teaches
training data ([0205] states "The system randomly splits each dataset into Training (80%), Validation (10%) and Test (10%) subsets.")
inference data ([0046] states "generating a set of recommendations based on the transfer learning mode", which means that inference data must be available to provide the inference (recommendation).)
The combination of Krishnan, AWS Neuron, Kim, and Wikipedia does not appear to explicitly teach
generate … indexes with respect to the plurality of … data batches;
store the plurality of … data batches, the … indexes, the plurality of … data batches, and the … indexes into the database.
However, Spring Batch—directed to analogous art—teaches
generate … indexes with respect to the plurality of … data batches; (Pages 3-4 state "An architecture that supports multi-partitioned applications which run against partitioned database tables using the key-column approach should include a central partition repository for storing partition parameters. This provides flexibility and ensures maintainability. The repository will generally consist of a single table known as the partition table. Information stored in the partition table will be static and in general should be maintained by the DBA. The table should consist of one row of information for each partition of a multi-partitioned application. The table should have columns for: Program ID Code, Partition Number (Logical ID of the partition), Low Value of the db key column for this partition, High Value of the db key column for this partition." The partition number is interpreted as the index.)
store the plurality of … data batches, the … indexes, the plurality of … data batches, and the … indexes into the database. (Pages 3-4 state "An architecture that supports multi-partitioned applications which run against partitioned database tables using the key-column approach should include a central partition repository for storing partition parameters. This provides flexibility and ensures maintainability. The repository will generally consist of a single table known as the partition table. Information stored in the partition table will be static and in general should be maintained by the DBA. The table should consist of one row of information for each partition of a multi-partitioned application. The table should have columns for: Program ID Code, Partition Number (Logical ID of the partition), Low Value of the db key column for this partition, High Value of the db key column for this partition." The partition number is interpreted as the index.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Krishnan, AWS Neuron, Kim, and Wikipedia with the teachings of Spring Batch because, as stated by Spring Batch on page 2, "Using partitioning allows multiple versions of large batch applications to run concurrently. The purpose of this is to reduce the elapsed time required to process long batch jobs." Note that here, the application would be training a model or inferencing using a model, both of which could use a partitioned database such as the one taught by Spring Batch.
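The partition table quoted from Spring Batch (one row per partition, keyed by partition number with low/high key bounds) can be sketched in plain Python for illustration. This is a hypothetical stand-in, not Spring Batch code, and all names are invented for the example:

```python
def build_partition_table(program_id, key_ranges):
    """One row per partition: program id, partition number (the 'index'),
    and the low/high key values bounding that partition's data batch."""
    return [
        {"program_id": program_id, "partition": n, "low": lo, "high": hi}
        for n, (lo, hi) in enumerate(key_ranges)
    ]

def rows_for_partition(table, partition, records, key):
    """Select the records falling in one partition's key range, so several
    workers can process disjoint batches of the same table concurrently."""
    row = next(r for r in table if r["partition"] == partition)
    return [rec for rec in records if row["low"] <= key(rec) <= row["high"]]
```

Each worker would look up its own partition number and process only the rows in its key range, which is the concurrency benefit cited in the combination rationale.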
Claims 11 and 18 recite substantially similar subject matter to claim 4 and are rejected with the same rationale, mutatis mutandis.
Claim(s) 5, 6, 12, 13, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Krishnan (US 2021/0110306 A1), AWS Neuron (“Data Parallel Inference on Torch Neuron”, 2022), Kim (“Optional: Data Parallelism”, 2017) and Wikipedia (“Sparse matrix”, 2022) as applied to claim 1 above, further in view of Spring Batch (“Batch Processing Strategies”, 2014) as applied to claim 4 above, and further in view of Schikuta (“Neural Networks and Database Systems”, 2008).
Regarding claim 5, the rejection of claim 4 is incorporated herein. Krishnan does not appear to explicitly teach
the first machine learning model is trained based on the plurality of training data batches and the training indexes to generate a plurality of model weights; and
the plurality of model weights are stored in the database.
However, Kim teaches
the first machine learning model is trained based on the plurality of training data batches … to generate a plurality of model weights; and (Page 1 states “It’s natural to execute your forward, backward propagation on multiple GPUs. However, [PyTorch] will only use one GPU by default. You can easily run your operations on multiple GPUs by making your model run parallelly using DataParallel:”. One of ordinary skill in the art would realize that forward, backward propagation is training. Pages 3-4 show the per-GPU batch input sizes: 15 on each of 2 GPUs, 10 on each of 3 GPUs, and, for 8 GPUs, 4 on each of the first seven GPUs and 2 on the last. Page 2 states “However, you can use DataParallel on any model (CNN, RNN, Capsule Net, etc.).” One of ordinary skill in the art would recognize a CNN as a deep learning model, which would generate a plurality of model weights when trained.)
The combination of Krishnan, AWS Neuron, Kim, and Wikipedia does not appear to explicitly teach
[using] indexes [in an application]
However, Spring Batch—directed to analogous art—teaches
[using] indexes [in an application] (Pages 3-4 state "An architecture that supports multi-partitioned applications which run against partitioned database tables using the key-column approach should include a central partition repository for storing partition parameters. This provides flexibility and ensures maintainability. The repository will generally consist of a single table known as the partition table. Information stored in the partition table will be static and in general should be maintained by the DBA. The table should consist of one row of information for each partition of a multi-partitioned application. The table should have columns for: Program ID Code, Partition Number (Logical ID of the partition), Low Value of the db key column for this partition, High Value of the db key column for this partition." The partition number is interpreted as the index.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Krishnan, AWS Neuron, Kim, and Wikipedia with the teachings of Spring Batch for the reasons given above with regard to claim 4.
The combination of Krishnan, AWS Neuron, Kim, Wikipedia, and Spring Batch does not appear to explicitly teach
the plurality of model weights are stored in the database.
However, Schikuta—directed to analogous art—teaches
the plurality of model weights are stored in the database. (Page 7 states "For justification of the described framework we present a practical integration of neural networks into the Iris database system [6]. The data model of the Iris database system is based on three elements: objects, types, and functions." Page 7 further states "A neural network in our model can be represented by the 3 basic object types we mentioned in the preceding section, NUnit, NEUNET, PElement." Page 8 states "The PElement object contains the local memory function Activation, which stores the activation value of the unit. It is possible to define further storage functions (for more complex networks) or specialized transfer functions. Connections of a processing unit X is defined via the multi-valued Predecessor(X) function. It evaluates to a set of Link objects, which contain 3 functions giving the object identifier Y of the preceding neural object, the weight value and the delta value (update value during training phase) of the link.")
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Krishnan, AWS Neuron, Kim, Wikipedia, and Spring Batch with the teachings of Schikuta because, as Schikuta states on page 17, "Not only the data sets alone are administrated by the computational level, but also the neural networks as basic objects of the database system. This allows the usage of efficient access mechanisms of the database system to handle the (in many situation) very large number of different neural network objects. This results into two advantages, a fast access to a neural network and a high level logical specification of similar network sets."
Regarding claim 6, the rejection of claim 5 is incorporated herein. Krishnan teaches
generate inferred user embeddings ([0097] states "The user embedding space, e_u, u ∈ U_m, is organized to reflect the contextual preferences of users. To achieve this organization of the embeddings, the meta-model may back propagate the extracted multi-linear context embeddings c_n into the user embedding space and create context conditioned clusters of users for item ranking. The precise motivation holds good for the item embedding space as well.")
Krishnan does not appear to explicitly teach
the at least one processor comprises a plurality of processors each of which corresponds to a respective inference data batch of the plurality of inference data batches according to the … indexes; and
each of the plurality of processors is configured to [generate inference] by applying a full replica of the first machine learning model with the plurality of model weights to the respective inference data batch.
However, AWS Neuron teaches
the at least one processor comprises a plurality of processors each of which corresponds to a respective inference data batch of the plurality of inference [data]; and (Page 1 states “:func:`torch.neuron.DataParallel` implements data parallelism at the module level by replicating the Neuron model on all available NeuronCores and distributing data across the different cores for parallelized inference.” The NeuronCores are interpreted as the processors.)
each of the plurality of processors is configured to [generate inference] by applying a full replica of the first machine learning model with the plurality of model weights to the respective inference data batch. (Page 1 states “Data Parallelism is a form of parallelization across multiple devices or cores, referred to as nodes. Each node contains the same model and parameters, but data is distributed across the different nodes.” One of ordinary skill in the art would realize that the deep learning model would have model weights.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Krishnan and AWS Neuron for the reasons given above with regard to claim 1.
Claims 12 and 13 recite substantially similar subject matter to claims 5 and 6 respectively and are rejected with the same rationale, mutatis mutandis.
Claim 19 recites substantially similar subject matter to claim 6 and is rejected with the same rationale, mutatis mutandis.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JESSICA THUY PHAM whose telephone number is (571)272-2605. The examiner can normally be reached Monday - Friday, 9:00 A.M. - 5:00 P.M.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen can be reached at (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/J.T.P./Examiner, Art Unit 2121
/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121