Detailed Action
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on 01/31/2023. Claims 1-20 are pending and have been examined.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 01/31/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) are:
Claim 1:
a step function configured to determine a relevance score.
Claim 10:
a step function configured to determine a relevance score.
Upon review of the specification, the corresponding structure for the above-noted 35 U.S.C. 112(f) limitations is not described:
Regarding the above-noted step function recited in claims 1 and 10, aside from merely repeating the claim language in paragraphs 4-6 and mentioning examples in paragraphs 76, 78 and 80, the corresponding structure of the claimed step function capable of performing the claimed determining of “a relevance score” is not described in applicant’s specification.
Because these claim limitation(s) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-2, 4-10, 12-17 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Tang (“FILM: A Fast, Interpretable, and Low-rank Metric Learning Approach for Sentence Matching”) in view of Du (“A faster path-based algorithm with Barzilai-Borwein step size for solving stochastic traffic equilibrium models”).
Claim 1.
Tang teaches a system, comprising: a non-transitory memory configured to store a training dataset comprising a plurality of anchor items, a plurality of recommended item sets, and ground truth data (5 Analysis & Page 8 “FILM is executable on a CPU, but we had to run the black box model on a GPU” teaches a system comprising Memory Usage “on CPU”; 3.4 Deep metric learning with BERT & Page 6-7 “The setup consists of three separate feedforward networks, which take in anchor samples, positive samples and negative samples” teaches dataset comprising anchor, positive (ground truth data) and negative samples (recommended sets));
a processor communicatively coupled to the non-transitory memory, wherein the processor is configured to read a set of instructions to (5 Analysis & Page 8 “FILM is executable on a CPU, but we had to run the black box model on a GPU” teaches running on a GPU, corresponding to a processor):
obtain, from the non-transitory memory, the training dataset (5 Analysis & Page 8 “FILM is executable on a CPU, but we had to run the black box model on a GPU” teaches a system comprising memory (“on CPU”); 3.4 Deep metric learning with BERT & Page 6-7 “The setup consists of three separate feedforward networks, which take in anchor samples, positive samples and negative samples” and 3 Our Approach & Page 2 “Given a training set with triplet labels, FILM learns a low-dimensional representation of the high-dimensional samples, such that in the low-dimensional space, samples originally more similar to one another are closer to one another in terms of some user-specified metric or similarity measure” teaches obtaining a dataset comprising anchor, positive (ground truth data), and negative samples (recommended sets));
obtain a base machine learning model including a step function configured to determine a relevance score (Step 2 (Rewrite everything in terms of matrices) and Page 4 “
[image: media_image1.png]
” teaches obtaining a step function to determine a score; and 3.4 Deep metric learning with BERT & Page 7 “Apart from FILM, we design a deep neural network with loss function similar to the average hinge loss of FILM” teaches a FILM design with a deep neural network, corresponding to the base machine learning model);
iteratively train the base machine learning model to generate a trained ranking model, wherein the plurality of anchor items and the plurality of recommended item sets are provided as an input to the base machine learning model and the ground truth data is provided as a target output (3.4 Deep metric learning with BERT & Page 6-7 “The setup consists of three separate feedforward networks, which take in anchor samples, positive samples and negative samples. Here, anchor sample refers to a sentence s, positive sample refers to a sentence s+ similar to s, and negative sample refers to a sentence s− dissimilar to s. The generation of such triplets {s, s+, s−} is the same as in Step 3 of the Execution of FILM in the previous Subsection” teaches a dataset comprising anchor, positive (recommended sets), and negative samples (ground truth data), wherein the anchor and positive samples are provided as input and the negative samples are provided as the target output; and Algorithm 1
[image: media_image2.png]
teaches generating a trained rank-constrained model),
and wherein the step function is trained using an adaptive step size according to a first order Barzilai-Borwein (BB) process (3.2 FILM Algorithm & Page 5 “we apply the Cayley transformation method with Barzilai-Borwein step size” and
[image: media_image2.png]
teaches that the step function is iteratively trained using the Barzilai-Borwein (BB) step-size process);
and store the trained ranking model in the non-transitory memory (5 Analysis & Page 8 “FILM is executable on a CPU, but we had to run the black box model on a GPU” and Algorithm 1
[image: media_image2.png]
teaches that the model is updated, which corresponds to storing values in CPU and GPU memory).
Tang does not explicitly teach and wherein the step function is trained using an adaptive step size according to…line search process.
However, in the same field, analogous art, Du teaches and wherein the step function is trained using an adaptive step size according to…line search process (2. Review of the BB step size determination scheme & Page 985 “The steepest descent method takes more time per iteration than the BB method due to the frequent function evaluations required by the exact/inexact line search schemes” teaches that the step function is trained using a line search process and a Barzilai-Borwein step size).
Tang and Du are analogous art because they are both directed to a Barzilai-Borwein step size in gradient method.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Du, into the disclosed invention of Tang.
One of ordinary skill in the art would have been motivated to make this modification because of the following: “Numerical experiments are conducted on two real transportation networks to demonstrate the computational efficiency and robustness of the BB step size. The results show that the BB step size outperforms the current step size strategies, i.e., the Armijo rule and the self-regulated averaging scheme”, as suggested by Du (Du, Abstract, Page 982).
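By way of technical background on the claim language at issue only, and not as part of the record or the cited references, a first-order Barzilai-Borwein (BB) step size in gradient descent can be sketched as below. All function names and the quadratic test problem are hypothetical illustrations, not drawn from Tang or Du.

```python
import numpy as np

def bb_gradient_descent(grad, x0, n_iter=50, alpha0=1e-3):
    """Gradient descent with a first-order Barzilai-Borwein (BB1) step size.

    The adaptive step size is computed from the difference between parameter
    points (s = x_k - x_{k-1}) and the difference between gradient vectors
    (y = g_k - g_{k-1}) at the prior and current iterations.
    """
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    x = x_prev - alpha0 * g_prev          # first step uses a fixed size
    for _ in range(n_iter):
        g = grad(x)
        s = x - x_prev                    # parameter-point difference
        y = g - g_prev                    # gradient-information difference
        denom = s @ y
        alpha = (s @ s) / denom if abs(denom) > 1e-12 else alpha0
        x_prev, g_prev = x, g
        x = x - alpha * g                 # adaptive BB1 step
    return x

# Hypothetical example: minimize f(x) = 0.5 * x^T A x, whose gradient is A x.
A = np.diag([1.0, 10.0])
x_min = bb_gradient_descent(lambda x: A @ x, x0=[5.0, 5.0])
```

On this sketch, the minimizer of the quadratic is the origin, which the BB1 iteration approaches without any line search evaluations.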
Claim 2.
As discussed above, Tang in view of Du teaches the system of claim 1,
Tang further teaches wherein the base machine learning model includes a gradient descent model (3.2 FILM Algorithm & Page 5 “The BB method is a “looking back” approach that accelerates gradient methods at nearly no extra cost, unlike traditional “looking back” line search approaches (e.g., the Armijo and Wolfe condition” teaches machine learning model includes a gradient learning).
Claim 4.
As discussed above, Tang in view of Du teaches the system of claim 1,
Du further teaches wherein the base machine learning model comprises a multinomial logit model, and wherein the trained ranking model comprises a multinomial logit choice model (4. A faster path-based traffic assignment algorithm with the BB step size scheme & Page 989 “the SUE problem can be equivalently represented as a fixed-point problem in Eq. (29). Specifically, based on the MNL and CNL path choice models in Eq. (12) and Eqs. (17)–(19), respectively, the network flow pattern defined by the MNL and CNL SUE models can be written as
[image: media_image3.png]
” teaches a machine learning model comprising a multinomial logit model).
Tang and Du are analogous art because both are directed to a Barzilai-Borwein process incorporated into the gradient method.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Du, into the disclosed invention of Tang.
One of ordinary skill in the art would have been motivated to make this modification because of the following: “Numerical experiments are conducted on two real transportation networks to demonstrate the computational efficiency and robustness of the BB step size. The results show that the BB step size outperforms the current step size strategies, i.e., the Armijo rule and the self-regulated averaging scheme” (Du, Abstract, Page 982).
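By way of technical background only, and not as part of the record or the cited references, a multinomial logit (MNL) choice model assigns each alternative a probability proportional to the exponential of its systematic utility. The sketch below is a hypothetical illustration; the utility values are invented.

```python
import math

def mnl_choice_probabilities(utilities):
    """Multinomial logit (MNL): P(i) = exp(V_i) / sum_j exp(V_j).

    Subtracting max(V) before exponentiating keeps the computation
    numerically stable without changing the probabilities.
    """
    m = max(utilities)
    exps = [math.exp(v - m) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: three alternatives with systematic utilities V.
probs = mnl_choice_probabilities([1.0, 2.0, 3.0])
```

In the sketch, probabilities sum to one and the highest-utility alternative receives the largest share.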
Claim 5.
As discussed above, Tang in view of Du teaches the system of claim 1,
Tang further teaches wherein the adaptive step size is determined based on a difference between optimized parameter points and a gradient information vector at a prior iteration and a current iteration (3.2 FILM Algorithm & Page 5 “Since AP is the gradient of the Lagrangian, a natural idea is to consider the following update: Pk+1(τ) = Pk − τAPk, (22) where τ is a step size to be chosen later. However, Eq. (22) doesn’t guarantee Pk+1(τ) ∈ St(d, r), and so an additional projection back to St(d, r) is required if we wish to preserve the constraint at each iterate” teaches determining a difference between a prior iteration and a current iteration).
Claim 6.
As discussed above, Tang in view of Du teaches the system of claim 1,
Tang further teaches wherein the relevance score is determined based on a set of features related to a recommended item in one of the plurality of recommended item sets, a set of features related to a corresponding anchor item in the plurality of anchor items, and features related to a seller of the recommended item (3.4 Deep metric learning with BERT & Page 6-7 “The setup consists of three separate feedforward networks, which take in anchor samples, positive samples and negative samples. Here, anchor sample refers to a sentence s, positive sample refers to a sentence s+ similar to s, and negative sample refers to a sentence s− dissimilar to s. The generation of such triplets {s, s+, s−} is the same as in Step 3 of the Execution of FILM in the previous Subsection” and Step 2 (Rewrite everything in terms of matrices) and Page 4 “
[image: media_image1.png]
” teaches obtaining a step function to determine a score, and teaches a dataset comprising anchor, positive (ground truth data), and negative samples (recommended sets), wherein the anchor and positive samples are provided as input and the negative samples as the target output).
Claim 7.
As discussed above, Tang in view of Du teaches the system of claim 1,
Tang further teaches wherein the base machine learning model is trained to optimize a log-likelihood of an interaction with a first recommended item in one of the plurality of recommended item sets given a related anchor item in the plurality of anchor items (4 Results & Page 7 “Submissions are evaluated on the log loss between the predicted values and the ground truth. By applying our FILM, we obtain a log loss score of 0.32141 from a single classifier (trained less than 20 minutes)” and 3.4 Deep metric learning with BERT & Page 6-7 “The setup consists of three separate feedforward networks, which take in anchor samples, positive samples and negative samples. Here, anchor sample refers to a sentence s, positive sample refers to a sentence s+ similar to s, and negative sample refers to a sentence s− dissimilar to s. The generation of such triplets {s, s+, s−} is the same as in Step 3 of the Execution of FILM in the previous Subsection” teaches a machine learning model trained on a log loss score based on the positive and anchor items).
Claim 8.
As discussed above, Tang in view of Du teaches the system of claim 1,
Tang further teaches wherein iteratively training the base machine learning model includes converting the training dataset to a n-dimensional matrix (3.1 Formalization & Page 3 “a smooth optimization problem with the variable being a lowdimensional matrix lying on the Stiefel manifold” and Algorithm 1
[image: media_image2.png]
teaches that iteratively training the machine learning model includes converting the dataset to an n-dimensional matrix).
Claim 9.
As discussed above, Tang in view of Du teaches the system of claim 1,
Du further teaches wherein the BB process and line search process are performed iteratively with different frequencies (C. ACCELERATING GD ALGORITHMS & Page 14-15 “A non-exhaustive list of tricks include step size-based approaches including fixed step size, adaptive step size, optimal step sizes including line search, BB methods, etc. Here we show how our abstraction can be applied to implement BGD using backtracking line search. Backtracking line search chooses the step size in each iteration of GD as follows: α_i^k = β ∗ α_{i−1}^k, where k is the iteration step of BGD and i is the iteration step of the line search” teaches iteratively running the BB and line search processes).
Tang and Du are analogous art because both are directed to a Barzilai-Borwein process incorporated into the gradient method.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Du, into the disclosed invention of Tang.
One of ordinary skill in the art would have been motivated to make this modification because of the following: “Numerical experiments are conducted on two real transportation networks to demonstrate the computational efficiency and robustness of the BB step size. The results show that the BB step size outperforms the current step size strategies, i.e., the Armijo rule and the self-regulated averaging scheme”, as suggested by Du (Du, Abstract, Page 982).
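By way of technical background only, and not as part of the record or the cited references, a backtracking line search shrinks a trial step size by a constant factor β at each inner iteration until a sufficient-decrease (Armijo) condition holds. The sketch below is a hypothetical illustration with invented names and values.

```python
def backtracking_line_search(f, grad_f, x, alpha0=1.0, beta=0.5, c=1e-4):
    """Backtracking line search: shrink the trial step by a factor beta
    (alpha_i = beta * alpha_{i-1}) until the Armijo sufficient-decrease
    condition f(x - alpha*g) <= f(x) - c*alpha*||g||^2 is satisfied.
    """
    g = grad_f(x)
    g_sq = sum(gi * gi for gi in g)
    alpha = alpha0
    while f([xi - alpha * gi for xi, gi in zip(x, g)]) > f(x) - c * alpha * g_sq:
        alpha *= beta          # inner line-search iteration
    return alpha

# Hypothetical example on the quadratic f(x) = x1^2 + 10*x2^2.
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad = lambda x: [2.0 * x[0], 20.0 * x[1]]
step = backtracking_line_search(f, grad, [1.0, 1.0])
```

A BB process would set the outer-iteration step size from gradient differences, with the line search invoked only at its own (typically lower) frequency as a safeguard.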
Claim 10.
Tang teaches a computer-implemented method, comprising: obtaining, from a first database, a training dataset comprising a plurality of anchor items, a plurality of recommended item sets, and ground truth data (5 Analysis & Page 8 “FILM is executable on a CPU, but we had to run the black box model on a GPU” teaches a system comprising Memory Usage “on CPU”; 3.4 Deep metric learning with BERT & Page 6-7 “The setup consists of three separate feedforward networks, which take in anchor samples, positive samples and negative samples” and 4 Results & Page 7 “we focus on a recent dataset published by the QA website Quora.com” teaches dataset comprising anchor, positive (ground truth data) and negative samples (recommended sets) coming from quora.com corresponding to first database);
obtaining a base machine learning model including a step function configured to determine a relevance score (Step 2 (Rewrite everything in terms of matrices) and Page 4 “
[image: media_image1.png]
” teaches obtaining a step function to determine a score; and 3.4 Deep metric learning with BERT & Page 7 “Apart from FILM, we design a deep neural network with loss function similar to the average hinge loss of FILM” teaches a FILM design with a deep neural network, corresponding to the base machine learning model);
training the base machine learning model to generate a trained ranking model, wherein the plurality of anchor items and the plurality of recommended item sets are provided as an input to the base machine learning model and the ground truth data is provided as a target output (3.4 Deep metric learning with BERT & Page 6-7 “The setup consists of three separate feedforward networks, which take in anchor samples, positive samples and negative samples. Here, anchor sample refers to a sentence s, positive sample refers to a sentence s+ similar to s, and negative sample refers to a sentence s− dissimilar to s. The generation of such triplets {s, s+, s−} is the same as in Step 3 of the Execution of FILM in the previous Subsection” teaches a dataset comprising anchor, positive (recommended sets), and negative samples (ground truth data), wherein the anchor and positive samples are provided as input and the negative samples are provided as the target output; and Algorithm 1
[image: media_image2.png]
teaches generating a trained rank-constrained model),
and wherein the step function is trained using an adaptive step size according to a first order Barzilai-Borwein (BB) process and a line search function (3.2 FILM Algorithm & Page 5 “we apply the Cayley transformation method with Barzilai-Borwein step size” and
[image: media_image2.png]
teaches that the step function is iteratively trained using the Barzilai-Borwein (BB) step-size process);
and storing the trained ranking model in a second database (5 Analysis & Page 8 “FILM is executable on a CPU, but we had to run the black box model on a GPU” and Algorithm 1
[image: media_image2.png]
teaches that the model is updated, which corresponds to storing values in CPU and GPU memory, corresponding to a second database).
Tang does not explicitly teach wherein the step function is trained using an adaptive step size according to…a line search function.
However, in the same field, analogous art, Du teaches wherein the step function is trained using an adaptive step size according to…a line search function (2. Review of the BB step size determination scheme & Page 985 “The steepest descent method takes more time per iteration than the BB method due to the frequent function evaluations required by the exact/inexact line search schemes” teaches that the step function is trained using a line search process and a Barzilai-Borwein step size).
Tang and Du are analogous art because they are both directed to a Barzilai-Borwein step size in gradient method.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Du, into the disclosed invention of Tang.
One of ordinary skill in the art would have been motivated to make this modification because of the following: “Numerical experiments are conducted on two real transportation networks to demonstrate the computational efficiency and robustness of the BB step size. The results show that the BB step size outperforms the current step size strategies, i.e., the Armijo rule and the self-regulated averaging scheme”, as suggested by Du (Du, Abstract, Page 982).
Claim 12.
This claim recites limitations similar to those of claim 4 and is therefore rejected with the same rationale applied against claim 4.
Claim 13.
This claim recites limitations similar to those of claim 5 and is therefore rejected with the same rationale applied against claim 5.
Claim 14.
This claim recites limitations similar to those of claim 6 and is therefore rejected with the same rationale applied against claim 6.
Claim 15.
This claim recites limitations similar to those of claim 7 and is therefore rejected with the same rationale applied against claim 7.
Claim 16.
This claim recites limitations similar to those of claim 8 and is therefore rejected with the same rationale applied against claim 8.
Claim 17.
Tang teaches a non-transitory computer-readable medium having instructions stored thereon, wherein the instructions, when executed by at least one processor, cause a device to perform operations comprising (5 Analysis & Page 8 “FILM is executable on a CPU, but we had to run the black box model on a GPU” teaches a system comprising Memory Usage “on CPU”):
obtaining, from a first database, a training dataset comprising a plurality of anchor items, a plurality of recommended item sets, and ground truth data (3.4 Deep metric learning with BERT & Page 6-7 “The setup consists of three separate feedforward networks, which take in anchor samples, positive samples and negative samples” and 4 Results & Page 7 “we focus on a recent dataset published by the QA website Quora.com” teaches dataset comprising anchor, positive (ground truth data) and negative samples (recommended sets) coming from quora.com corresponding to first database);
obtaining a base machine learning model including a step function configured to determine a relevance score (Step 2 (Rewrite everything in terms of matrices) and Page 4 “
[image: media_image1.png]
” teaches obtaining a step function to determine a score; and 3.4 Deep metric learning with BERT & Page 7 “Apart from FILM, we design a deep neural network with loss function similar to the average hinge loss of FILM” teaches a FILM design with a deep neural network, corresponding to the base machine learning model);
training the base machine learning model to generate a trained ranking model, wherein the plurality of anchor items and the plurality of recommended item sets are provided as an input to the base machine learning model and the ground truth data is provided as a target output (3.4 Deep metric learning with BERT & Page 6-7 “The setup consists of three separate feedforward networks, which take in anchor samples, positive samples and negative samples. Here, anchor sample refers to a sentence s, positive sample refers to a sentence s+ similar to s, and negative sample refers to a sentence s− dissimilar to s. The generation of such triplets {s, s+, s−} is the same as in Step 3 of the Execution of FILM in the previous Subsection” teaches a dataset comprising anchor, positive (recommended sets), and negative samples (ground truth data), wherein the anchor and positive samples are provided as input and the negative samples are provided as the target output; and Algorithm 1
[image: media_image2.png]
teaches generating a trained rank-constrained model),
and wherein the step function is trained using an adaptive step size according to a first order Barzilai-Borwein (BB) process (3.2 FILM Algorithm & Page 5 “we apply the Cayley transformation method with Barzilai-Borwein step size” and
[image: media_image2.png]
teaches that the step function is iteratively trained using the Barzilai-Borwein (BB) step-size process);
and storing the trained ranking model in a second database (5 Analysis & Page 8 “FILM is executable on a CPU, but we had to run the black box model on a GPU” and Algorithm 1
[image: media_image2.png]
teaches that the model is updated, which corresponds to storing values in CPU and GPU memory, corresponding to a second database).
Tang does not explicitly teach wherein the step function is trained using an adaptive step size according to…a line search function.
However, in the same field, analogous art, Du teaches wherein the step function is trained using an adaptive step size according to…a line search function (2. Review of the BB step size determination scheme & Page 985 “The steepest descent method takes more time per iteration than the BB method due to the frequent function evaluations required by the exact/inexact line search schemes” teaches that the step function is trained using a line search process and a Barzilai-Borwein step size).
Tang and Du are analogous art because they are both directed to a Barzilai-Borwein step size in gradient method.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above, as taught by Du, into the disclosed invention of Tang.
One of ordinary skill in the art would have been motivated to make this modification because of the following: “Numerical experiments are conducted on two real transportation networks to demonstrate the computational efficiency and robustness of the BB step size. The results show that the BB step size outperforms the current step size strategies, i.e., the Armijo rule and the self-regulated averaging scheme”, as suggested by Du (Du, Abstract, Page 982).
Claim 19.
This claim recites limitations similar to those of claims 5 and 6 and is therefore rejected with the same rationale applied against claims 5 and 6.
Claim 20.
This claim recites limitations similar to those of claim 8 and is therefore rejected with the same rationale applied against claim 8.
Claims 3, 11 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Tang (“FILM: A Fast, Interpretable, and Low-rank Metric Learning Approach for Sentence Matching”) in view of Du (“A faster path-based algorithm with Barzilai-Borwein step size for solving stochastic traffic equilibrium models”) and further in view of Kaoudi (“A Cost-based Optimizer for Gradient Descent Optimization”).
Claim 3.
As discussed above, Tang in view of Du teaches the system of claim 2,
Tang in view of Du does not explicitly teach wherein iteratively training the base machine learning model includes implementing a plurality of parallel gradient descent steps, wherein each of the plurality of parallel gradient descent steps operates on a chunk of the training dataset.
However, Kaoudi teaches wherein iteratively training the base machine learning model includes implementing a plurality of parallel gradient descent steps, wherein each of the plurality of parallel gradient descent steps operates on a chunk of the training dataset (7.1 Operator Cost Model & Page 7 “For example, consider a compute cluster of 10 nodes, each being able to process 2 partitions in parallel. Given this setup, we could parallelize the processing of a given dataset composed of 85 partitions in 5 waves: each wave processing 20 partitions in parallel, except the last wave that processes the remaining 5 partitions” and 7. GD COST MODEL and Page 6 “To estimate the overall cost of a GD plan, it uses a cost model that is composed of the cost per iteration and the number of iterations of the GD plan” teaches that iteratively training a machine learning model includes parallel processing of the input dataset).
Tang, Kaoudi and Du are analogous art because they are all directed to the Barzilai-Borwein process incorporated into the gradient method.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by Kaoudi into the disclosed invention of Tang in view of Du.
One of ordinary skill in the art would have been motivated to make this modification because of the following, “Extensive experiments on real and synthetic datasets show that our optimizer not only chooses the best GD plan but also allows for optimizations that achieve orders of magnitude performance speed-up”, as suggested by Kaoudi (Kaoudi, Abstract, Page 2).
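The data-parallel gradient descent pattern described in the passage quoted from Kaoudi (partitioning the dataset and processing the partitions in parallel waves) can be illustrated by the following sketch; the least-squares objective and all names are hypothetical, and a real system would dispatch each chunk to a separate worker node rather than loop over them:

```python
import numpy as np

def chunked_gradient_step(X, y, w, lr, n_chunks=4):
    """One gradient descent step for least squares, with the gradient
    computed independently on chunks of the training data and then
    averaged (the data-parallel pattern; here the "parallel" chunks
    are processed sequentially for simplicity)."""
    grads = []
    for Xc, yc in zip(np.array_split(X, n_chunks), np.array_split(y, n_chunks)):
        residual = Xc @ w - yc
        grads.append(Xc.T @ residual / len(yc))   # chunk-local gradient
    g = np.mean(grads, axis=0)                    # combine partial gradients
    return w - lr * g
```

When the chunks are equal-sized, the averaged chunk gradients equal the full-batch gradient, so the chunked step reproduces a single-machine step exactly.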
Claim 11.
This claim recites limitations that are similar to the limitations of claim 3, and thus is rejected with the same rationale applied against claim 3.
Claim 18.
As discussed above, Tang in view of Du teaches the non-transitory computer-readable medium of claim 17,
Du further teaches wherein the base machine learning model comprises a multinomial logit model, and wherein the trained ranking model comprises a multinomial logit choice model (4. A faster path-based traffic assignment algorithm with the BB step size scheme & Page 989 “the SUE problem can be equivalently represented as a fixed-point problem in Eq. (29). Specifically, based on the MNL and CNL path choice models in Eq. (12) and Eqs. (17)–(19), respectively, the network flow pattern defined by the MNL and CNL SUE models can be written as [equation reproduced as image media_image3.png]” teaches a machine learning model comprising a multinomial logit model).
Tang and Du are analogous art because they are both directed to the Barzilai-Borwein process incorporated into the gradient method.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by Du into the disclosed invention of Tang.
One of ordinary skill in the art would have been motivated to make this modification because of the following, “Numerical experiments are conducted on two real transportation networks to demonstrate the computational efficiency and robustness of the BB step size. The results show that the BB step size outperforms the current step size strategies, i.e., the Armijo rule and the self-regulated averaging scheme” (Du, Abstract, Page 982).
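The multinomial logit (MNL) choice model cited from Du assigns each alternative a probability proportional to the exponential of its utility. The following minimal sketch is illustrative only; the utilities and function name are hypothetical, not taken from Du:

```python
import numpy as np

def mnl_choice_probs(utilities):
    """Multinomial logit choice probabilities:
    P(i) = exp(V_i) / sum_j exp(V_j),
    shifted by the max utility for numerical stability."""
    v = np.asarray(utilities, dtype=float)
    e = np.exp(v - v.max())       # stable exponentials
    return e / e.sum()
```

The probabilities sum to one, favor higher-utility alternatives, and are invariant to adding a constant to all utilities.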
Tang in view of Du does not explicitly teach and wherein iteratively training the base machine learning model includes implementing a plurality of parallel gradient descent steps, wherein each of the plurality of parallel gradient descent steps operates on a chunk of the training dataset.
However, Kaoudi teaches and wherein iteratively training the base machine learning model includes implementing a plurality of parallel gradient descent steps, wherein each of the plurality of parallel gradient descent steps operates on a chunk of the training dataset (7.1 Operator Cost Model & Page 7 “For example, consider a compute cluster of 10 nodes, each being able to process 2 partitions in parallel. Given this setup, we could parallelize the processing of a given dataset composed of 85 partitions in 5 waves: each wave processing 20 partitions in parallel, except the last wave that processes the remaining 5 partitions” and 7. GD COST MODEL and Page 6 “To estimate the overall cost of a GD plan, it uses a cost model that is composed of the cost per iteration and the number of iterations of the GD plan” teaches that iteratively training a machine learning model includes parallel processing of the input dataset).
Tang, Kaoudi and Du are analogous art because they are all directed to the Barzilai-Borwein process incorporated into the gradient method.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the limitation(s) above as taught by Kaoudi into the disclosed invention of Tang in view of Du.
One of ordinary skill in the art would have been motivated to make this modification because of the following, “Extensive experiments on real and synthetic datasets show that our optimizer not only chooses the best GD plan but also allows for optimizations that achieve orders of magnitude performance speed-up” (Kaoudi, Abstract, Page 2).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Lokesha Patel whose telephone number is (571)272-6267. The examiner can normally be reached 8 AM - 4 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LOKESHA PATEL/Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125