Last updated: May 29, 2026
Application No. 17/579,566
EMBEDDING OPTIMIZATION FOR A MACHINE LEARNING MODEL

Non-Final OA: §101, §103
Filed: Jan 19, 2022
Examiner: WU, NICHOLAS S
Art Unit: 2148
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Lemon Inc.
OA Round: 2 (Non-Final)
This examiner grants 51% of cases after interview (+39.5% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 43 resolved cases, 2023–2026
Examiner Intelligence

WU, NICHOLAS S
Grants 51% of resolved cases
Career Allowance Rate
22 granted / 43 resolved
-3.8% vs TC avg
Interview lift: +39.5% (resolved cases with interview vs. without)
Typical timeline: 3y 11m average prosecution
32 currently pending
Statute-Specific Performance

§101: 2.8% (-37.2% vs TC avg)
§103: 94.4% (+54.4% vs TC avg)
§112: 2.8% (-37.2% vs TC avg)
Based on career data from 43 resolved cases
Office Action

§101 §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 09/03/2025 have been fully considered but they are not persuasive.
Regarding the 101 rejections, on pages 10-11 of “Remarks” applicant contends that the additional elements in amended claim 1 provide a practical application through a technical improvement under Step 2A Prong 2. The examiner respectfully disagrees. Applicant argues that the additional elements of a “model training system” and a “machine learning model” are not abstract ideas. The examiner agrees that the model training system and machine learning model elements are not abstract ideas. However, the two additional elements do not incorporate the previously identified judicial exceptions in amended claim 1 into a practical application. Instead, the additional elements are used as tools to perform the previously identified judicial exceptions. Performing the previously identified judicial exceptions using a machine learning model and a model training system, under the broadest reasonable interpretation, merely applies generic computer components to perform an abstract idea, which amounts to merely adding the words “apply it”, or an equivalent, and is not indicative of an inventive concept (MPEP 2106.05(f)). Under Step 2B, the courts have found that adding the words “apply it”, or an equivalent, with the judicial exception does not qualify as significantly more (MPEP 2106.05). Additionally, applicant argues that amended claim 1 provides a technical improvement to machine learning models by using an orthogonality metric to compress the model and reduce memory usage. However, amended claim 1 does not recite limitations that tie the orthogonality metric to compressing the model or reducing memory usage. Therefore, applicant’s arguments regarding the 101 rejections are not persuasive.
Regarding the 103 rejections, applicant’s arguments about reference(s) Bansal have been fully considered but are not persuasive.
Alleged no teaching of orthogonality metric
	In Remarks/Arguments pg. 13-14, applicant contends:
“Regarding the feature (2), the Office acknowledges that Zhao fails to disclose the feature "the first training objective function being based on an orthogonality metric between embedding vectors in the set of embedding vectors and based on a difference between the model output and a ground-truth model output" as recited in feature (2). However, the Office Action alleges that Bansal discloses that the first training objective function is based on an orthogonality metric between embedding vectors in the set of embedding vectors and based on a difference between the model output and a ground-truth model output. Applicant respectfully disagrees…In the above disclosure, Bansal only teaches the orthogonality regularizers are applicable to both fully-connected and convolutional layers, and the setting for regularizing convolutional layers enforces orthogonality across filter, encouraging filter diversity. However, Bansal does not disclose or teach the training objective function can be designed based on an orthogonality metric between embedding vectors in an embedding table for a certain input field. In contrast, the feature (2) requires that the first training objective function is based on an orthogonality metric between embedding vectors in the set of embedding vectors and based on a difference between the model output and a ground-truth model output…Thus, Bansal fails to disclose or suggest at least "the first training objective function is based on an orthogonality metric between embedding vectors in the set of embedding vectors and based on a difference between the model output and a ground-truth model output" as recited in the feature (2).

Feature (3) as amended requires that the orthogonality metric is determined based on the following: constructing a matrix comprising the set of embedding vectors; determining a difference between a transpose of the matrix times the matrix itself and an identity matrix; and determining the orthogonality metric based on the difference. 
The Office Action alleges that Bansal at page 3 discloses the additional features of previous claim 2. Applicant respectfully disagrees…In the above disclosure of Bansal, Bansal only teaches that previous works [14, 32, 33] proposed to require the Gram matrix of the weight matrix to be close to identity, which we term as Soft Orthogonality (SO) regularization: (SO) $\lambda \|W^{T}W - I\|_{F}^{2}$, where $\lambda$ is the regularization coefficient. Bansal does not mention the set of embedding vectors. Thus, Bansal does not disclose or teach the orthogonality metric is determined based on the following: constructing a matrix comprising the set of embedding vectors; determining a difference between a transpose of the matrix times the matrix itself and an identity matrix; and determining the orthogonality metric based on the difference, as recited in the feature (3).”

The relevant limitations of claim 1 appear to be: “the first training objective function being based on an orthogonality metric between embedding vectors in the set of embedding vectors and based on a difference between the model output and a ground-truth model output, wherein the orthogonality metric is determined based on the following: constructing a matrix comprising the set of embedding vectors; determining a difference between a transpose of the matrix times the matrix itself and an identity matrix; and determining the orthogonality metric based on the difference.” As noted in the previous Office Action, Zhao and Bansal teach:
(Zhao, ⁋88, “Given the selected embedding spaces, unique embedding vectors (x1, . . . , xM) are obtained for features (x1, . . . , xM). The method concatenates these embeddings and feeds them into hidden layers. The prediction ŷ is generated by the output layer. Further, the parameters of the DLRS are updated by minimizing the supervised loss function L(ŷ, y) through back-propagation.”).

(Bansal, pg. 3, “In this section, we will derive and discuss several orthogonality regularizers. Note that those regularizers are applicable to both fully-connected and convolutional layers. The default mathematical expressions of regularizers will be assumed on a fully-connected layer $W \in \mathbb{R}^{m \times n}$ (m could be either larger or smaller than n)…Previous works [14, 32, 33] proposed to require the Gram matrix of the weight matrix to be close to identity, which we term as Soft Orthogonality (SO) regularization: (SO) $\lambda \|W^{T}W - I\|_{F}^{2}$, (1) where λ is the regularization coefficient (the same hereinafter).”).

In other words, Zhao teaches the embedding vectors in an embedding layer of a neural network. Zhao also teaches the use of a supervised loss function, which is interpreted as a first training objective function because a supervised loss function compares model outputs to ground-truth outputs. While Zhao teaches the use of an embedding layer with embedding vectors and a first training objective function through a loss function, Zhao does not explicitly teach the use of an orthogonality metric. Bansal teaches the use of an orthogonality regularizer for training fully connected layers. The embedding layer in Zhao is interpreted as a fully connected layer (see Zhao, ⁋56, “There are N fully-connected layers, which transform embedding vectors {xm 1, . . . , xm N} into embeddings”). Therefore, the combination of Bansal’s teaching of an orthogonality regularizer with Zhao’s teaching of an embedding layer with embedding vectors teaches a first training objective function being based on an orthogonality metric between embedding vectors in the set of embedding vectors and based on a difference between the model output and a ground-truth model output.
As mentioned above, Zhao teaches the embedding vectors in the embedding layer, and Bansal’s teaching of applying an orthogonality regularizer to a fully connected layer also applies to the embedding layer in Zhao. Bansal shows that soft orthogonality regularization takes a fully connected layer matrix W and multiplies it by its transpose. An identity matrix is then subtracted from the product of W and its transpose. Thus, Bansal also teaches wherein the orthogonality metric is determined based on the following: constructing a matrix comprising the set of embedding vectors; determining a difference between a transpose of the matrix times the matrix itself and an identity matrix; and determining the orthogonality metric based on the difference. Therefore, the applicant’s arguments are not persuasive.
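For illustration only, the computation discussed above (a matrix built from the embedding vectors, its transpose times itself, minus an identity matrix, reduced to a single number) can be sketched as follows. This is a minimal sketch and not the applicant's or Bansal's implementation. The function name `soft_orthogonality`, the choice to store each embedding vector as one column of the matrix, and the use of the squared Frobenius norm without the coefficient λ are assumptions made for the example.

```python
from typing import List

Matrix = List[List[float]]


def soft_orthogonality(embeddings: Matrix) -> float:
    """Squared Frobenius norm of (E^T E - I).

    Assumption: each embedding vector is one COLUMN of E, so E^T E is the
    n x n Gram matrix of dot products between the n embedding vectors.
    `embeddings` is passed as a plain list of those vectors.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # (E^T E)[i][j] is the dot product of vector i and vector j.
            gram_ij = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            identity_ij = 1.0 if i == j else 0.0
            # Accumulate the squared entries of the difference matrix.
            total += (gram_ij - identity_ij) ** 2
    return total
```

Under these assumptions, two orthonormal vectors such as `[1.0, 0.0]` and `[0.0, 1.0]` give a metric of 0.0, while two identical unit vectors give 2.0, because the two off-diagonal entries of the Gram matrix are each 1 instead of 0.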

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 3-13, and 15-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1, in Step 1 of the 101 analysis set forth in MPEP 2106, the claim recites “A method of learning embedding vectors for a machine learning model.” The claim recites a method. A method is one of the four statutory categories of invention.
In Step 2A, Prong 1 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following limitations recite a process that, under broadest reasonable interpretation, covers a mental process or mathematical concept but for the recitation of generic computer components:
(I) determining a set of model parameter values for the machine learning model and a set of embedding vectors for an input field of the machine learning model, the machine learning model being constructed to map an input sample in the input field to an embedding vector in the set of embedding vectors and process the embedding vector with the set of model parameter values to generate a model output; (i.e., the broadest reasonable interpretation includes a step of evaluation and judgement that could be performed mentally or with pen and paper, such as selecting initial parameters for a machine learning model, which is a mental process of evaluation/judgement (MPEP 2106)).
(II) the first training objective function being based on an orthogonality metric between embedding vectors in the set of embedding vectors and based on a difference between the model output and a ground-truth model output. (i.e., the broadest reasonable interpretation includes mathematical calculations of a relation metric and a difference metric; a mathematical calculation is considered a mathematical concept (MPEP 2106)).
(III) wherein the orthogonality metric is determined based on the following: constructing a matrix comprising the set of embedding vectors; determining a difference between a transpose of the matrix times the matrix itself and an identity matrix; and determining the orthogonality metric based on the difference. (i.e., the broadest reasonable interpretation includes matrix calculations; a mathematical calculation is considered a mathematical concept (MPEP 2106)).
The claim limitations, under their broadest reasonable interpretation, cover activities classified under Mental processes: concepts performed in the human mind (including observation, evaluation, judgement, or opinion) (see MPEP 2106.04(a)(2), subsection (III)) or Mathematical concepts: mathematical relationships, mathematical formulas or equations, or mathematical calculations (see MPEP 2106.04(a)(2), subsection (I)). Accordingly, the claim recites an abstract idea.
In Step 2A, Prong 2 of the 101 analysis, set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application:
(IV) the method is implemented at a model training system, (i.e., the generic computer components recited in this limitation merely add the words “apply it”, or an equivalent, or are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea (MPEP 2106.05(f))).
(V) and training the machine learning model by updating the set of model parameter values and the set of embedding vectors according to at least a first training objective function, (i.e., the generic computer components recited in this limitation merely add the words “apply it”, or an equivalent, or are mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea (MPEP 2106.05(f))).
Since the claim does not contain any other additional elements that amount to integration into a practical application, the claim is directed to an abstract idea.
In Step 2B of the 101 analysis set forth in the 2019 PEG, the examiner has determined that the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception:
Limitations (IV) and (V), under the broadest reasonable interpretation, merely recite generic training of a machine learning model in a model training system, which amounts to merely adding the words “apply it”, or an equivalent, and is not indicative of an inventive concept (MPEP 2106.05(f)). Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
Regarding claim 3, it is dependent upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 3 recites wherein the machine learning model is further constructed to mask the embedding vector with a dimension mask for the input field and process the masked embedding vector with the set of model parameter values to generate the model output, wherein the dimension mask indicates respective importance levels of a plurality of embedding elements comprised in each of the set of embedding vectors. Under the broadest reasonable interpretation, the limitations recite determining a dimension mask, which is a step of evaluation and judgement that can be performed mentally or with pen and paper. Steps of evaluation and judgement are mental processes; thus, claim 3 does not resolve the deficiencies of claim 1.
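For illustration only, the masking recited in claim 3 can be read as an element-wise product between an embedding vector and a single mask shared by every embedding vector of the input field. The sketch below is a minimal example under that reading. The function names `apply_dimension_mask` and `mask_field` and the sample values are hypothetical and do not come from the application.

```python
from typing import List


def apply_dimension_mask(embedding: List[float], mask: List[float]) -> List[float]:
    """Scale each embedding element by the matching mask element."""
    if len(embedding) != len(mask):
        raise ValueError("mask must have one element per embedding element")
    return [e * m for e, m in zip(embedding, mask)]


def mask_field(embeddings: List[List[float]], mask: List[float]) -> List[List[float]]:
    """Apply the same dimension mask to every embedding vector of one field."""
    return [apply_dimension_mask(vector, mask) for vector in embeddings]
```

For example, `apply_dimension_mask([0.5, 1.5, 2.0], [1.0, 0.0, 1.0])` returns `[0.5, 0.0, 2.0]`, so the second embedding element no longer contributes to the model output.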
Regarding claim 4, it is dependent upon claim 3 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 4 recites wherein training the machine learning model comprises: performing a first training procedure on the machine learning model to update the set of model parameter values and the set of embedding vectors according to the first training objective function, and performing a second training procedure on the machine learning model to update the dimension mask and to further update the set of model parameter values and the set of embedding vectors according to a second training objective function. Under the broadest reasonable interpretation, these limitations merely recite generic training of a machine learning model in two stages, which amounts to merely adding the words “apply it”, or an equivalent, and is not indicative of an inventive concept (MPEP 2106.05(f)). Claim 4 also recites wherein the second training objective function is at least based on the orthogonality metric and the difference between the model output generated with the set of model parameter values and a ground-truth model output. This limitation is similar to the first training objective function in claim 1 and is therefore also interpreted as a mathematical calculation. Therefore, claim 4 does not resolve the deficiencies of claim 3.
Regarding claim 5, it is dependent upon claim 4 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 5 recites wherein during the first training procedure, the dimension mask is set to indicate that embedding elements comprised in the set of embedding vectors are important and are retained. Under the broadest reasonable interpretation, the limitations recite determining which values are important in a dimension mask, which is a step of evaluation and judgement that can be performed mentally or with pen and paper. Steps of evaluation and judgement are mental processes; thus, claim 5 does not resolve the deficiencies of claim 4.
Regarding claim 6, it is dependent upon claim 4 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 6 recites wherein performing the second training procedure comprises: iteratively performing the following until the second training objective function reaches a threshold value, updating the dimension mask using a first training data batch for the machine learning model; and updating the set of model parameter values and the set of embedding vectors using a second training data batch for the machine learning model, wherein the updated dimension mask remain unchanged during the updating of the set of model parameter values and the set of embedding vectors. Under the broadest reasonable interpretation, these limitations merely recite generic batch training of a machine learning model, which amounts to merely adding the words “apply it”, or an equivalent, and is not indicative of an inventive concept (MPEP 2106.05(f)). Therefore, claim 6 does not resolve the deficiencies of claim 4.
Regarding claim 7, it is dependent upon claim 4 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 7 recites wherein the dimension mask comprises a plurality of mask elements corresponding to a plurality of embedding elements comprised in each of the set of embedding vectors, each mask element having either a first value to indicate that the corresponding embedding element is important and is retained or a second value to indicate that the corresponding embedding element is pruned from each of the set of embedding vectors. Under the broadest reasonable interpretation, the limitations recite assigning a first or second value to elements within a dimension mask to specify which elements are important, which is a step of evaluation and judgement that can be performed mentally or with pen and paper. Steps of evaluation and judgement are mental processes; thus, claim 7 does not resolve the deficiencies of claim 4.
Regarding claim 8, it is dependent upon claim 7 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 8 recites wherein the second training objective function is further based on a difference between the number of mask elements in the dimension mask having the first value and a target number of mask elements having the first value. Under the broadest reasonable interpretation, the limitations recite performing a difference calculation, which is interpreted as a mathematical calculation. A mathematical calculation is a mathematical concept; thus, claim 8 does not resolve the deficiencies of claim 7.
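For illustration only, the difference calculation recited in claim 8 can be sketched as a count of retained mask elements compared against a target count. The claim states only that the objective is "based on a difference", so the use of an absolute difference here is an assumption, as are the function name `mask_count_penalty` and the convention that the first value is 1 and the second value is 0.

```python
from typing import List


def mask_count_penalty(mask: List[int], target_kept: int, first_value: int = 1) -> int:
    """Absolute gap between the retained-element count and the target count.

    Assumption: `first_value` marks a retained (important) element; any
    other value marks a pruned element.
    """
    kept = sum(1 for m in mask if m == first_value)
    return abs(kept - target_kept)
```

With a mask of `[1, 0, 1, 1]` and a target of 2 retained elements, the penalty is 1, since three elements are currently retained.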
Regarding claim 9, it is dependent upon claim 8 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 9 recites wherein the dimension mask is updated based on the following: determining a first adjusted dimension mask by deleting at least one element having the second value from the dimension mask. Under the broadest reasonable interpretation, the limitations recite removing elements within a dimension mask, which is a step of evaluation and judgement that can be performed mentally or with pen and paper. Steps of evaluation and judgement are mental processes. Claim 9 also recites determining a gradient of the second training objective function with respect to the adjusted dimension mask; and updating the dimension mask based on the determined gradient. Under the broadest reasonable interpretation, the limitations recite performing a gradient calculation, which is interpreted as a mathematical calculation. A mathematical calculation is a mathematical concept; thus, claim 9 does not resolve the deficiencies of claim 8.
Regarding claim 10, it is dependent upon claim 4 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 10 recites wherein training the machine learning model further comprises: performing a third training procedure on the machine learning model to further update the set of model parameter values and the set of embedding vectors obtained after the second training procedure, wherein the third training objective function is based on the orthogonality metric and the difference between the model output and a ground-truth model output, and wherein the dimension mask obtained after the second training procedure remains unchanged during the third training procedure. Under the broadest reasonable interpretation, these limitations merely recite generic retraining of a machine learning model, which amounts to merely adding the words “apply it”, or an equivalent, and is not indicative of an inventive concept (MPEP 2106.05(f)). Therefore, claim 10 does not resolve the deficiencies of claim 4.
Regarding claim 11, it is dependent upon claim 4 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 11 recites further comprising: determining a set of masked embedding vectors by masking each of the set of embedding vectors with the dimension mask; determining the trained machine learning model by masking, with the dimension mask, a subset of the set of model parameter values that are directly applied to an embedding vector of the set of the embedding vector; and providing the set of masked embedding vectors and the trained machine learning model. Under the broadest reasonable interpretation, these limitations merely recite generic training of a machine learning model using masks, which amounts to merely adding the words “apply it”, or an equivalent, and is not indicative of an inventive concept (MPEP 2106.05(f)). Claim 11 also recites and providing the set of masked embedding vectors and the trained machine learning model. Under the broadest reasonable interpretation, the limitations merely recite steps of mere data gathering/outputting, which has been recognized by the courts as a well-understood, routine, and conventional function. Specifically, the courts have recognized computer functions directed to mere data gathering as well-understood, routine, and conventional functions when they are claimed in a merely generic manner or as insignificant extra-solution activity (MPEP 2106.05(g)). Therefore, claim 11 does not resolve the deficiencies of claim 4.
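For illustration only, the two masking steps recited in claim 11 can be sketched on a single linear layer. The sketch assumes that the "subset of the set of model parameter values that are directly applied to an embedding vector" is the first-layer weight matrix, with one row per embedding element, and that the mask is binary. The helper names `mask_vector`, `mask_first_layer`, and `linear` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mask_vector(vector: Vector, mask: Vector) -> Vector:
    """Zero out the embedding elements whose mask element is 0."""
    return [x * m for x, m in zip(vector, mask)]


def mask_first_layer(weights: Matrix, mask: Vector) -> Matrix:
    """Zero out the weight rows that read pruned embedding elements.

    Assumption: weights[i][k] connects embedding element i to hidden unit k.
    """
    return [[w * m for w in row] for row, m in zip(weights, mask)]


def linear(vector: Vector, weights: Matrix) -> Vector:
    """Compute vector @ weights for a bias-free linear layer."""
    hidden_units = len(weights[0])
    return [
        sum(vector[i] * weights[i][k] for i in range(len(vector)))
        for k in range(hidden_units)
    ]
```

With a binary mask, the masked embedding produces the same layer output whether or not the weights are also masked, because each pruned element is already zero. That is why, under these assumptions, the pruned rows could be dropped from both the embedding table and the weight matrix without changing the output.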
Regarding claim 12, it is dependent upon claim 1 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application, or introducing significantly more than the judicial exception. For example, claim 12 recites wherein training the machine learning model further comprises: determining a further set of embedding vectors for a further input field of the machine learning model, the machine learning model being constructed to map a further input sample in the further input field to a further embedding vector in the further set of embedding vectors and process the further embedding vector with the set of model parameter values to generate a model output; and training the machine learning model by updating the set of model parameter values, the set of embedding vectors, and the further set of embedding vectors according to at least the first training objective function, wherein the first training objective function is further based on an orthogonality metric between embedding vectors in the further set of embedding vectors. Under the broadest reasonable interpretation, these limitations merely recite generic incremental learning using additional input data, which amounts to merely adding the words “apply it”, or an equivalent, and is not indicative of an inventive concept (MPEP 2106.05(f)). Therefore, claim 12 does not resolve the deficiencies of claim 1.
Regarding claims 13 and 15-20, they are similar to claims 1 and 3-12 and are rejected under the same rationales.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao, et al., US Pre-Grant Publication 2023/0124258A1 (“Zhao”) in view of Bansal, et al., Non-Patent Literature “Can We Gain More from Orthogonality Regularizations in Training Deep CNNs?” (“Bansal”).
Regarding claim 1 and analogous claims 13 and 20, Zhao discloses:
A method of learning embedding vectors for a machine learning model, the method is implemented at a model training system, (Zhao, ⁋99, “a neural network optimizer 928 and a predictor 930. The neural network optimizer 928 finds an optimal neural network by determining optimal embeddings for the feature fields, as described above [A method of learning embedding vectors for a machine learning model, the method is implemented at a model training system,].”).
the method comprising: determining a set of model parameter values for the machine learning model and a set of embedding vectors for an input field of the machine learning model, (Zhao, ⁋20, “One aspect includes a framework to automatically select dimensions for the embeddings of feature fields in a data-driven fashion [and a set of embedding vectors for an input field of the machine learning model,]. An end-to-end differentiable framework calculates weights for multiple embedding sizes of feature fields over various dimensions while optimizing parameters of the neural network [the method comprising: determining a set of model parameter values for the machine learning model].”).
the machine learning model being constructed to map an input sample in the input field to an embedding vector in the set of embedding vectors (Zhao, ⁋29, “DLRSs map these categorical features into dense vectors of real numbers via an embedding-component 206, e.g., an embedding-lookup process, which leads to a large number of embedding parameters [the machine learning model being constructed to map an input sample in the input field to an embedding vector in the set of embedding vectors].”).
and process the embedding vector with the set of model parameter values to generate a model output; (Zhao, ⁋30, “The DLRS is the Multi-Layer Perception (MLP) component 204 that transforms the embedding component 206 input embeddings from the feature fields 208 to generate the outputs, also referred to as predictions, referred to as the output layer 202 [and process the embedding vector with the set of model parameter values to generate a model output;].”).
and training the machine learning model by updating the set of model parameter values and the set of embedding vectors according to at least a first training objective function, (Zhao, ⁋88, “Given the selected embedding spaces, unique embedding vectors (x1, . . . , xM) are obtained for features (x1, . . . , xM) [and the set of embedding vectors]. The method concatenates these embeddings and feeds them into hidden layers. The prediction ŷ is generated by the output layer. Further, the parameters of the DLRS are updated [and training the machine learning model by updating the set of model parameter values] by minimizing the supervised loss function L(ŷ, y) [according to at least a first training objective function,] through back-propagation.”).
the first training objective function…based on a difference between the model output and a ground-truth model output. (Zhao, ⁋88, “Further, the parameters of the DLRS are updated by minimizing the supervised loss function L(ŷ, y) [the first training objective function] through back-propagation; supervised loss function is interpreted as comparing predicted outputs to labeled outputs (i.e. based on a difference between the model output and a ground-truth model output.).”).
While Zhao teaches optimizing the embedding vectors in a machine learning environment, Zhao does not explicitly teach:
…being based on an orthogonality metric between embedding vectors in the set of embedding vectors and…
wherein the orthogonality metric is determined based on the following: constructing a matrix comprising the set of embedding vectors; determining a difference between a transpose of the matrix times the matrix itself and an identity matrix; and determining the orthogonality metric based on the difference.
Bansal teaches:
…being based on an orthogonality metric between embedding vectors in the set of embedding vectors and… (Bansal, pg. 3, “In this section, we will derive and discuss several orthogonality regularizers […being based on an orthogonality metric]. Note that those regularizers are applicable to both fully-connected and convolutional layers. The default mathematical expressions of regularizers will be assumed on a fully-connected layer $W \in \mathbb{R}^{m \times n}$ (m could be either larger or smaller than n)…Previous works [14, 32, 33] proposed to require the Gram matrix of the weight matrix to be close to identity, which we term as Soft Orthogonality (SO) regularization: (SO) $\lambda \lVert W^{T}W - I \rVert_F^2$, (1) where λ is the regularization coefficient (the same hereinafter); W is interpreted as an embedding vector as the orthogonality regularization can be applied to any fully-connected layer like an embedding layer in Zhao (i.e. between embedding vectors in the set of embedding vectors and…).”).
wherein the orthogonality metric is determined based on the following: constructing a matrix comprising the set of embedding vectors; determining a difference between a transpose of the matrix times the matrix itself and an identity matrix; and determining the orthogonality metric based on the difference. (Bansal, pg. 3, “In this section, we will derive and discuss several orthogonality regularizers. Note that those regularizers are applicable to both fully-connected and convolutional layers. The default mathematical expressions of regularizers will be assumed on a fully-connected layer $W \in \mathbb{R}^{m \times n}$ (m could be either larger or smaller than n)…Previous works [14, 32, 33] proposed to require the Gram matrix of the weight matrix to be close to identity, which we term as Soft Orthogonality (SO) regularization: (SO) $\lambda \lVert W^{T}W - I \rVert_F^2$, (1) [determining a difference between a transpose of the matrix times the matrix itself and an identity matrix; and determining the orthogonality metric based on the difference.] where λ is the regularization coefficient (the same hereinafter); W is interpreted as an embedding vector as the orthogonality regularization can be applied to any fully-connected layer like an embedding layer in Zhao (i.e. constructing a matrix comprising the set of embedding vectors;).”).
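For illustration only, the Soft Orthogonality expression quoted from Bansal (the squared Frobenius norm of the difference between the transpose of W times W and the identity matrix, scaled by λ) can be sketched as below. This is a minimal sketch under stated assumptions: the function name `soft_orthogonality`, the parameter `lam`, and the choice to store the embedding vectors as the columns of `W` are hypothetical and are not taken from Bansal, Zhao, or the application.

```python
def soft_orthogonality(W, lam=1.0):
    """Return lam * ||W^T W - I||_F^2 for a matrix W given as a list of rows.

    Assumption: the n columns of W are the embedding vectors, so W^T W is
    their n x n Gram matrix.
    """
    m, n = len(W), len(W[0])
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of the Gram matrix W^T W.
            gram_ij = sum(W[k][i] * W[k][j] for k in range(m))
            # Matching entry of the identity matrix.
            identity_ij = 1.0 if i == j else 0.0
            # Accumulate the squared entry of the difference.
            total += (gram_ij - identity_ij) ** 2
    return lam * total


orthonormal = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 1.0], [0.0, 1.0]]
print(soft_orthogonality(orthonormal))  # columns already orthonormal
print(soft_orthogonality(correlated))   # columns overlap, so the value is positive
```

The value is zero exactly when the columns are orthonormal and grows as the columns become correlated or deviate from unit length.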
Zhao and Bansal are both in the same field of endeavor (i.e. machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Zhao and Bansal to teach the above limitation(s). The motivation for doing so is that using orthogonality regularization during training improves training efficiency (cf. Bansal, pg. 9, “We presented an efficient mechanism for regularizing different flavors of orthogonality, on several state-of-art convolutional deep CNNs [21, 6, 20]. We showed that in all cases, we can achieve better accuracy, more stable training curve and smoother convergence.”).
Regarding claim 13, the claim is analogous to claim 1. 
Zhao further teaches the additional limitations A system, comprising: at least one processor; and at least one memory communicatively coupled to the at least one processor and comprising computer-readable instructions that upon execution by the at least one processor cause the at least one processor (Zhao, claim 8, “A system comprising: a memory comprising instructions; and one or more computer processors, wherein the instructions, when executed by the one or more computer processors, cause the system to perform operations comprising [A system, comprising: at least one processor; and at least one memory communicatively coupled to the at least one processor and comprising computer-readable instructions that upon execution by the at least one processor cause the at least one processor]”).
Regarding claim 20, the claim is analogous to claim 1. 
Zhao further teaches the additional limitation A non-transitory computer-readable storage medium, storing computer-readable instructions that upon execution by a computing device cause the computing device to perform acts (Zhao, ⁋135, “In yet another general aspect, a machine-readable storage medium (e.g., a non-transitory storage medium) includes instructions that, when executed by a machine, cause the machine to perform operations [A non-transitory computer-readable storage medium, storing computer-readable instructions that upon execution by a computing device cause the computing device to perform acts]”).

Claims 3-7, 11, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao, et al., US Pre-Grant Publication 2023/0124258A1 (“Zhao”) in view of Bansal, et al., Non-Patent Literature “Can We Gain More from Orthogonality Regularizations in Training Deep CNNs?” (“Bansal”) and further in view of Cheng, et al., Non-Patent Literature “Differentiable neural input search for recommender systems” (“Cheng”).
Regarding claim 3 and analogous claim 15, Zhao in view of Bansal teaches the method of claim 1. While the combination teaches training a machine learning model using optimized vector embeddings and an orthogonality metric, the combination does not explicitly teach wherein the machine learning model is further constructed to mask the embedding vector with a dimension mask for the input field and process the masked embedding vector with the set of model parameter values to generate the model output, wherein the dimension mask indicates respective importance levels of a plurality of embedding elements comprised in each of the set of embedding vectors.
Cheng teaches:
wherein the machine learning model is further constructed to mask the embedding vector with a dimension mask for the input field and process the masked embedding vector with the set of model parameter values to generate the model output, (Cheng, pg. 2 col. 1 and see Figure 1, “a soft selection layer [to mask the embedding vector with a dimension mask] between the embedding layer and the feature interaction layers [for the input field] of latent factor models. Each input feature embedding is fed into the soft selection layer to perform an element-wise multiplication with a scaling vector. The soft selection layer directly controls the significance of each dimension of the feature embedding, and it is essentially a part of model architecture which can be optimized according to model’s validation performance; in Figure 1(b), the soft selection layer’s outputs are used by the model to make a prediction (i.e. and process the masked embedding vector with the set of model parameter values to generate the model output).”).
wherein the dimension mask indicates respective importance levels of a plurality of embedding elements comprised in each of the set of embedding vectors. (Cheng, pg. 4 col. 2, “A straightforward way to derive the discrete mixed dimension scheme D is to prune non-informative embedding dimensions in the soft selection layer α. Here we employ a fine-grained pruning procedure through layer merging. Specifically, for feature $f_i$ in the $l$-th block, we can compute its output embedding $\tilde{e}_i$ with $e_i$ and $\alpha_{l*}$ following Equation (4). We collect the output embeddings $\tilde{e}_i$ for all the features in F and form an output embedding matrix $\tilde{E} \in \mathbb{R}^{N \times K}$. We then prune non-informative embedding dimensions [wherein the dimension mask indicates respective importance levels of a plurality of embedding elements comprised in each of the set of embedding vectors.] in $\tilde{E}$ as follows: $\tilde{E}_{i,j} = \begin{cases} 0, & \text{if } |\tilde{E}_{i,j}| < e \\ \tilde{E}_{i,j}, & \text{otherwise} \end{cases}$ (9) where e is a threshold that can be manually tuned according to the requirements on model performance and parameter size.”).
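The two operations quoted from Cheng, element-wise scaling of an embedding by the soft selection layer and threshold pruning per Equation (9), can be sketched as follows. This is a hedged illustration, not Cheng's implementation: the function names `apply_soft_selection` and `prune_small_entries` and the sample values are hypothetical.

```python
def apply_soft_selection(embedding, scaling):
    """Element-wise product of one embedding vector with a scaling vector."""
    return [e * s for e, s in zip(embedding, scaling)]


def prune_small_entries(matrix, threshold):
    """Zero every entry whose magnitude is below the threshold (cf. Equation (9))."""
    return [[0.0 if abs(v) < threshold else v for v in row] for row in matrix]


embedding = [0.5, -2.0, 1.0]
scaling = [1.0, 0.1, 0.0]          # assumed soft selection values in [0, 1]
masked = apply_soft_selection(embedding, scaling)
pruned = prune_small_entries([masked], threshold=0.3)
print(masked)
print(pruned)
```

Entries that survive pruning keep their scaled value, while pruned entries become exactly zero, which matches the two-valued reading the rejection later applies to claim 7.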
Zhao, in view of Bansal, and Cheng are both in the same field of endeavor (i.e. recommender systems). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Zhao, in view of Bansal, and Cheng to teach the above limitation(s). The motivation for doing so is that using a mask identifies the importance of each portion of an embedding (cf. Cheng, pg. 7 col. 2, “soft selection layer that controls the significance of each embedding dimension, and optimize this layer according to model’s validation performance.”).
Regarding claim 4 and analogous claim 16, Zhao in view of Bansal and Cheng teaches the method of claim 3. 
Bansal further teaches the use of the orthogonality metric for the second objective function as the second objective function is similar to the first objective function in claim 1. 
Zhao further teaches wherein training the machine learning model comprises: performing a first training procedure on the machine learning model to update the set of model parameter values and the set of embedding vectors according to the first training objective function, (Zhao, ⁋85, “In some example embodiments, a pre-train technique [performing a first training procedure on the machine learning model] is used to enable a fair competition between the candidate embeddings. For each feature field, the equivalent weights are allocated initially on all its candidate embeddings, e.g., [0.5, 0.5]. If there are two candidate embedding dimensions, then, these initialized weights α are fixed and pre-train the DLRS parameters W including all candidate embeddings. This process ensures a fair competition between candidate embeddings when the process begins to update α.”).
Cheng further teaches and performing a second training procedure on the machine learning model to update the dimension mask and to further update the set of model parameter values and the set of embedding vectors according to a second training objective function, wherein the second training objective function is at least based on the…and the difference between the model output generated with the set of model parameter values and a ground-truth model output. (Cheng, pg. 4 col. 1, “selection layer α, our problem stated in Equation (3) can be transformed into: $\min_{\alpha} L_{val}(\Theta^{*}(\alpha), \alpha)$ [according to a second training objective function,] s.t. $\Theta^{*}(\alpha) = \arg\min_{\Theta} L_{train}(\Theta, \alpha) \wedge \alpha_{k,j} \in [0, 1]$ (5) where Θ = {θ, E} represents model parameters in both the embedding layer and interaction layers [and to further update the set of model parameter values and the set of embedding vectors]. Equation 5 essentially defines a bi-level optimization problem [Colson, Marcotte, and Savard 2007], which has been studied in differentiable NAS [Liu, Simonyan, and Yang 2019] and gradient-based hyperparameter optimization [Chen et al. 2019; Franceschi et al. 2018; Pedregosa 2016]. Basically, α and Θ are respectively treated as the upper-level and lower-level variables to be optimized [and performing a second training procedure on the machine learning model to update the dimension mask] in an interleaving way; minimizing a validation loss function is interpreted as comparing predicted outputs to ground truth outputs (i.e. wherein the second training objective function is at least based on the…and the difference between the model output generated with the set of model parameter values and a ground-truth model output.).”).
Regarding claim 5 and analogous claim 17, Zhao in view of Bansal and Cheng teaches the method of claim 4.
Cheng further teaches wherein during the first training procedure, the dimension mask is set to indicate that embedding elements comprised in the set of embedding vectors are important and are retained. (Cheng, pg. 4 col. 2, “Initialize the soft selection layer α to be an all-one matrix, and randomly initialize Θ; initializing the soft selection layer is interpreted as setting the dimension mask during the first training procedure as the soft selection layer has not been optimized by the second training procedure yet (i.e. wherein during the first training procedure, the dimension mask is set)” and Cheng, pg. 2 col. 1, “The soft selection layer directly controls the significance of each dimension of the feature embedding [to indicate that embedding elements comprised in the set of embedding vectors are important and are retained.]”).
Regarding claim 6, Zhao in view of Bansal and Cheng teaches the method of claim 4. 
Cheng further teaches wherein performing the second training procedure comprises: iteratively performing the following until the second training objective function reaches a threshold value, updating the dimension mask using a first training data batch for the machine learning model; and updating the set of model parameter values and the set of embedding vectors using a second training data batch for the machine learning model, wherein the updated dimension mask remain unchanged during the updating of the set of model parameter values and the set of embedding vectors. (Cheng, pg. 4 col. 2, “Sort features into $\tilde{F}$ and divide them into L blocks; 4: Initialize the soft selection layer α to be an all-one matrix, and randomly initialize Θ; // Θ = {θ, E} 5: while not converged [iteratively performing the following until the second training objective function reaches a threshold value,] do 6: Update trainable parameters Θ by descending $\nabla_{\Theta} L_{train}(\Theta, \alpha)$; [and updating the set of model parameter values and the set of embedding vectors using a second training data batch for the machine learning model,] 7: Calculate the gradients of α as: $-\xi \nabla^{2}_{\alpha,\Theta} L_{train}(\Theta, \alpha) \cdot \nabla_{\alpha} L_{val}(\Theta', \alpha) + \nabla_{\alpha} L_{val}(\Theta', \alpha)$; // (set ξ = 0 if using first-order approximation) 8: Perform Equation (8) to normalize the gradients in α; 9: Update α by descending the gradients; the soft selection layer gets updated after the model updates are made thus, the dimension mask is not updated while the model parameters are updated (i.e. updating the dimension mask using a first training data batch for the machine learning model;…wherein the updated dimension mask remain unchanged during the updating of the set of model parameter values and the set of embedding vectors.), and then clip its values into the range of [0, 1]; 10: end”).
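The interleaved update quoted from Cheng's algorithm (steps 5 through 10) can be sketched on a toy model as below. This is a simplified sketch under several assumptions that are not in Cheng: the model is a masked linear predictor with squared-error loss, the first-order approximation (ξ = 0) is used, the gradient normalization of Equation (8) is omitted, and a fixed step count stands in for the convergence test. All function names and data values are hypothetical.

```python
def predict(theta, alpha, x):
    # Each parameter is scaled by its mask entry before the dot product.
    return sum(t * a * xi for t, a, xi in zip(theta, alpha, x))


def mse(theta, alpha, batch):
    return sum((predict(theta, alpha, x) - y) ** 2 for x, y in batch) / len(batch)


def gradients(theta, alpha, batch):
    """Gradients of the mean squared error with respect to theta and alpha."""
    g_theta = [0.0] * len(theta)
    g_alpha = [0.0] * len(alpha)
    for x, y in batch:
        err = predict(theta, alpha, x) - y
        for k in range(len(theta)):
            g_theta[k] += 2 * err * alpha[k] * x[k] / len(batch)
            g_alpha[k] += 2 * err * theta[k] * x[k] / len(batch)
    return g_theta, g_alpha


def alternate(theta, alpha, train_batch, val_batch, lr=0.1, steps=200):
    for _ in range(steps):
        # Parameter update on the training batch; alpha is held fixed.
        g_theta, _ = gradients(theta, alpha, train_batch)
        theta = [t - lr * g for t, g in zip(theta, g_theta)]
        # Mask update on the validation batch; theta is held fixed.
        _, g_alpha = gradients(theta, alpha, val_batch)
        # Clip the updated mask into the range [0, 1].
        alpha = [min(1.0, max(0.0, a - lr * g)) for a, g in zip(alpha, g_alpha)]
    return theta, alpha


train = [([1.0, 0.0], 2.0), ([0.0, 1.0], 0.0)]
val = [([1.0, 0.0], 2.0), ([0.0, 1.0], 0.0)]
theta, alpha = alternate([0.5, 0.5], [1.0, 1.0], train, val)
print(theta, alpha, mse(theta, alpha, val))
```

Because each update touches only one group of variables, the mask is unchanged while the parameters are updated, which is the property the rejection relies on.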
Regarding claim 7 and analogous claim 18, Zhao in view of Bansal and Cheng teaches the method of claim 4. Cheng further teaches:
wherein the dimension mask comprises a plurality of mask elements corresponding to a plurality of embedding elements comprised in each of the set of embedding vectors, (Cheng, pg. 2 col. 1, “feature embedding is fed into the soft selection layer to perform an element-wise multiplication [wherein the dimension mask comprises a plurality of mask elements corresponding to a plurality of embedding elements comprised in each of the set of embedding vectors,] with a scaling vector.”).
each mask element having either a first value to indicate that the corresponding embedding element is important and is retained or a second value to indicate that the corresponding embedding element is pruned from each of the set of embedding vectors. (Cheng, pg. 4 col. 2 and see Equation 9, “We then prune non-informative embedding dimensions in $\tilde{E}$ as follows: $\tilde{E}_{i,j} = \begin{cases} 0, & \text{if } |\tilde{E}_{i,j}| < e \\ \tilde{E}_{i,j}, & \text{otherwise} \end{cases}$ (9) where e is a threshold that can be manually tuned according to the requirements on model performance and parameter size; Equation 9 shows that the element within the dimension mask can either be assigned a 0 or non-zero value depending on a threshold and the important values are interpreted as the non-zero values (i.e. each mask element having either a first value to indicate that the corresponding embedding element is important and is retained or a second value to indicate that the corresponding embedding element is pruned from each of the set of embedding vectors.).”).
Regarding claim 11, Zhao in view of Bansal and Cheng teaches the method of claim 4. 
Cheng further teaches:
determining a set of masked embedding vectors by masking each of the set of embedding vectors with the dimension mask; (Cheng, pg. 2 col. 1 and see Figure 1, “a soft selection layer between the embedding layer and the feature interaction layers of latent factor models. Each input feature embedding is fed into the soft selection layer to perform an element-wise multiplication with a scaling vector. The soft selection layer directly controls the significance of each dimension of the feature embedding [determining a set of masked embedding vectors by masking each of the set of embedding vectors with the dimension mask;], and it is essentially a part of model architecture which can be optimized according to model’s validation performance.”).
determining the trained machine learning model by masking, with the dimension mask, a subset of the set of model parameter values that are directly applied to an embedding vector of the set of the embedding vector; (Cheng, pg. 2 col. 1 and see Figure 1, “a soft selection layer between the embedding layer and the feature interaction layers of latent factor models. Each input feature embedding is fed into the soft selection layer to perform an element-wise multiplication with a scaling vector. The soft selection layer directly controls the significance of each dimension of the feature embedding [by masking, with the dimension mask, a subset of the set of model parameter values that are directly applied to an embedding vector of the set of the embedding vector;], and it is essentially a part of model architecture which can be optimized according to model’s validation performance [determining the trained machine learning model].”).
and providing the set of masked embedding vectors and the trained machine learning model. (Cheng, pg. 3 see Figure 1, Figure 1 shows that the set of masked embedding vectors is used in training the model (i.e. and providing the set of masked embedding vectors and the trained machine learning model.)).

Claims 8-9 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao, et al., US Pre-Grant Publication 2023/0124258A1 (“Zhao”) in view of Bansal, et al., Non-Patent Literature “Can We Gain More from Orthogonality Regularizations in Training Deep CNNs?” (“Bansal”) and further in view of Cheng, et al., Non-Patent Literature “Differentiable neural input search for recommender systems” (“Cheng”) and Lemaire, et al., Non-Patent Literature “Structured Pruning of Neural Networks with Budget-Aware Regularization” (“Lemaire”).
Regarding claim 8 and analogous claim 19, Zhao in view of Bansal and Cheng teaches the method of claim 7. While the combination teaches training a machine learning model using optimized vector embeddings, an orthogonality metric, and embedding masks, the combination does not explicitly teach wherein the second training objective function is further based on a difference between the number of mask elements in the dimension mask having the first value and a target number of mask elements having the first value. 
Lemaire teaches wherein the second training objective function is further based on a difference between the number of mask elements in the dimension mask having the first value and a target number of mask elements having the first value. (Lemaire, pg. 9110 col. 1, “In our implementation, a budget is the maximum number of neurons a “hard-pruned” network is allowed to have. To compute this metric, one may replace z ∼ q(z|Φ) by its mean so feature maps with E[z|Φ] = 0 have no effect and can be removed, while the others are kept.”, Lemaire, pg. 9111 col. 2, “Our proposed prior loss is as follows: $L_{BAR}(\Phi, V, a, b) = L_{S}(\Phi) f(V, a, b)$ (6) where (a, b) are the lower and upper budget margins, V is the current “hard-pruned” volume as computed by Eq. (4), and $L_{S}(\Phi)$ is a differentiable approximation of V; the budget is interpreted as the difference between the number of current important values to the target number of important values as the budget controls how many important values remain after the training iterations (i.e. is further based on a difference between the number of mask elements in the dimension mask having the first value and a target number of mask elements having the first value.).”, and Lemaire, pg. 9112 col. 1, “In our case, the unpruned network is the teacher and the pruned network is the student. As such, our final loss is: $(1-\alpha) L_{CE}(P_s, Y_{gt}) + \alpha T^{2} L_{CE}(P_s, P_t) + \lambda L_{BAR}(\Phi, V, a, b)$ [wherein the second training objective function]. where λ, α and T are fixed parameters.”).
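As a reading aid for the claim 8 limitation, the following sketch adds to a prediction loss a term based on the difference between the number of retained mask elements and a target number. It is not Lemaire's budget-aware loss: the quoted passages do not reproduce f(V, a, b), and Lemaire uses a differentiable approximation of the volume rather than a hard count. The function names, the `weight` parameter, and the use of an absolute difference are all assumptions made for illustration.

```python
def retained_count(mask, first_value=1):
    """Number of mask elements holding the value that marks 'retained'."""
    return sum(1 for m in mask if m == first_value)


def budget_gap(mask, target_count, first_value=1):
    """Absolute difference between the retained count and the target count."""
    return abs(retained_count(mask, first_value) - target_count)


def second_objective(prediction_loss, mask, target_count, weight=0.1):
    """Prediction loss plus a weighted budget term (hypothetical combination)."""
    return prediction_loss + weight * budget_gap(mask, target_count)


mask = [1, 0, 1, 1]
print(budget_gap(mask, target_count=2))
print(second_objective(0.5, mask, target_count=2, weight=0.5))
```

A hard count of this kind has no useful gradient, which is consistent with the quoted statement that Lemaire substitutes a differentiable approximation during training.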
Zhao, in view of Bansal and Cheng, and Lemaire are both in the same field of endeavor (i.e. machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Zhao, in view of Bansal and Cheng, and Lemaire to teach the above limitation(s). The motivation for doing so is that pruning reduces a model’s overhead while maintaining performance (cf. Lemaire, pg. 9108 col. 2, “By removing unimportant filters from a network and retraining it, one can shrink it while maintaining good performance [10, 19].”).
Regarding claim 9, Zhao in view of Bansal, Cheng, and Lemaire teaches the method of claim 8. 
Lemaire further teaches:
wherein the dimension mask is updated based on the following: determining a first adjusted dimension mask by deleting at least one element having the second value from the dimension mask; (Lemaire, pg. 9110 col. 1, “In our implementation, a budget is the maximum number of neurons a “hard-pruned” network is allowed to have. To compute this metric, one may replace z ∼ q(z|Φ) by its mean so feature maps with E[z|Φ] = 0 have no effect and can be removed [determining a first adjusted dimension mask by deleting at least one element having the second value from the dimension mask;], while the others are kept.”).
determining a gradient of the second training objective function with respect to the adjusted dimension mask; (Lemaire, pg. 9109 col. 2, “Another budgeted pruning approach is “LearningCompression” [4], which uses the method of auxiliary coordinates [3] instead of back-propagation. Contrary to this method, our approach adopts a usual gradient descent optimization scheme; using gradient descent during budgeted pruning is interpreted as having a gradient with respect to an adjusted mask (i.e. determining a gradient of the second training objective function with respect to the adjusted dimension mask;), and does not rely on the magnitude of the weights as a surrogate of their importance.”).
and updating the dimension mask based on the determined gradient. (Lemaire, pg. 9111 col. 2, “As mentioned earlier, initializing the network with a volume that respects the budget (as required by the barrier method) leads to severe optimization issues. Instead, we iteratively shift the pruning target b during training; during the training is interpreted as using the updated gradient as the previous quote mentioned the use of gradient descent for optimization (i.e. and updating the dimension mask based on the determined gradient.). Specifically, we shift it from $b = V_F$ at the beginning, to $b = B$ at the end (where $V_F$ is the unpruned network’s volume and B the maximum allowed budget).”).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao, et al., US Pre-Grant Publication 2023/0124258A1 (“Zhao”) in view of Bansal, et al., Non-Patent Literature “Can We Gain More from Orthogonality Regularizations in Training Deep CNNs?” (“Bansal”) and further in view of Cheng, et al., Non-Patent Literature “Differentiable neural input search for recommender systems” (“Cheng”) and Sung, et al., Non-Patent Literature “Training Neural Networks with Fixed Sparse Masks” (“Sung”).
Regarding claim 10, Zhao in view of Bansal and Cheng teaches the method of claim 4. 
The combination further teaches the objective function is based on the orthogonality metric and the difference between the model output and a ground-truth model output, because the third objective function is interpreted as being similar to the first objective function in claim 1 but during a later training period. While the combination teaches training a machine learning model using optimized vector embeddings and an orthogonality metric, the combination does not explicitly teach a third training procedure during which the dimension mask remains unchanged: performing a third training procedure on the machine learning model to further update the set of model parameter values and the set of embedding vectors obtained after the second training procedure, wherein the third training…and wherein the dimension mask obtained after the second training procedure remains unchanged during the third training procedure.
Sung teaches:
performing a third training procedure on the machine learning model to further update the set of model parameter values and the set of embedding vectors obtained after the second training procedure, (Sung, pg. 4, “Transfer learning [36] [performing a third training procedure on the machine learning model], where a model is initialized from a pre-trained checkpoint before being finetuned on a related downstream task [to further update the set of model parameter values and the set of embedding vectors obtained after the second training procedure,], can dramatically improve performance and speed up convergence on the downstream task [12, 8, 40].”).
wherein the third training…and wherein the dimension mask obtained after the second training procedure remains unchanged during the third training procedure. (Sung, pg. 3-4, “Recall that our goal is to select a subset of parameters (or, equivalently, a sparse mask over parameters) to update over many iterations of training while keeping the remainder of the parameters fixed…Further, the fact that we re-use the mask for many iterations prevents us from having to compute $\hat{F}_{\theta}$ frequently. As we will show in section 4, we find that this simple procedure is sufficient to produce a mask that can be reused for many iterations [wherein the third training…and wherein the dimension mask obtained after the second training procedure remains unchanged during the third training procedure.]”).
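The fixed-mask behavior quoted from Sung, in which a precomputed sparse mask selects which parameters are updated and is reused unchanged across iterations, can be sketched as follows. This is a minimal sketch only: the function name `masked_sgd_step`, the plain gradient-descent update, and the constant gradients are hypothetical, and how Sung selects the mask is not shown.

```python
def masked_sgd_step(params, grads, mask, lr=0.1):
    """Update only the parameters whose mask entry is 1; leave the rest fixed."""
    return [p - lr * g if m else p for p, g, m in zip(params, grads, mask)]


params = [1.0, 2.0, 3.0]
mask = [1, 0, 1]                      # computed once, then reused as-is
for _ in range(2):
    grads = [1.0, 1.0, 1.0]           # placeholder gradients for illustration
    params = masked_sgd_step(params, grads, mask, lr=0.5)
print(params)
print(mask)
```

The mask is read but never written inside the loop, so it is identical before and after training, while only the selected parameters move.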
Zhao, in view of Bansal and Cheng, and Sung are both in the same field of endeavor (i.e. machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Zhao, in view of Bansal and Cheng, and Sung to teach the above limitation(s). The motivation for doing so is that keeping a mask static during subsequent training reduces model overhead (cf. Sung, pg. 2, “Third, by pre-computing a mask, we avoid the computational and memory overhead that are apparent when updating the mask over the course of training.”).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao, et al., US Pre-Grant Publication 2023/0124258A1 (“Zhao”) in view of Bansal, et al., Non-Patent Literature “Can We Gain More from Orthogonality Regularizations in Training Deep CNNs?” (“Bansal”) and further in view of Patruno, Non-Patent Literature “The Ultimate Guide to Model Retraining” (“Patruno”).
Regarding claim 12, Zhao in view of Bansal teaches the method of claim 1. While the combination teaches training a machine learning model using optimized vector embeddings and an orthogonality metric, the combination does not explicitly teach performing the steps of claim 1 with further training data: wherein training the machine learning model further comprises: determining a further set of embedding vectors for a further input field of the machine learning model, the machine learning model being constructed to map a further input sample in the further input field to a further embedding vector in the further set of embedding vectors and process the further embedding vector with the set of model parameter values to generate a model output; and training the machine learning model by updating the set of model parameter values, the set of embedding vectors, and the further set of embedding vectors according to at least the first training objective function, wherein the first training objective function is further based on an orthogonality metric between embedding vectors in the further set of embedding vectors. 
Patruno teaches determining a further set of embedding vectors for a further input field of the machine learning model, the machine learning model being constructed to map a further input sample in the further input field to a further embedding vector in the further set of embedding vectors and process the further embedding vector with the set of model parameter values to generate a model output; and training the machine learning model by updating the set of model parameter values, the set of embedding vectors, and the further set of embedding vectors according to at least the first training objective function, wherein the first training objective function is further based on an orthogonality metric between embedding vectors in the further set of embedding vectors. (Patruno, pg. 5, “Rather retraining simply refers to re-running the process that generated the previously selected model on a new training set of data. The features, model algorithm, and hyperparameter search space should all remain the same. One way to think about this is that retraining doesn’t involve any code changes. It only involves changing the training data set; retraining is interpreted as keeping the parameters and functions of a model the same but using additional data (i.e. determining a further set of embedding vectors for a further input field of the machine learning model, the machine learning model being constructed to map a further input sample in the further input field to a further embedding vector in the further set of embedding vectors and process the further embedding vector with the set of model parameter values to generate a model output; and training the machine learning model by updating the set of model parameter values, the set of embedding vectors, and the further set of embedding vectors according to at least the first training objective function, wherein the first training objective function is further based on an orthogonality metric between embedding vectors in the further set of embedding vectors.).”).
Zhao, in view of Bansal, and Patruno are both in the same field of endeavor (i.e. machine learning). It would have been obvious for a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Zhao, in view of Bansal, and Patruno to teach the above limitation(s). The motivation for doing so is that model retraining remedies model performance degradation (cf. Patruno, pg. 7, “A machine learning model’s predictive performance is expected to decline as soon as the model is deployed to production. For that reason, it’s imperative that practitioners prepare for degraded performance by setting up ML-specific monitoring solutions and workflows to enable model retraining.”).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Ranasinghe, et al., “Orthogonal Projection Loss” discloses an orthogonal projection loss which adds an orthogonality metric to the cross-entropy loss function. The orthogonality metric augments the properties of cross-entropy loss and directly enforces inter-class separation alongside intra-class clustering in the feature space.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NICHOLAS S WU whose telephone number is (571)270-0939. The examiner can normally be reached Monday - Friday 8:00 am - 4:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michelle Bechtold, can be reached at 571-431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/N.S.W./ Examiner, Art Unit 2148
/MICHELLE T BECHTOLD/Supervisory Patent Examiner, Art Unit 2148
Prosecution Timeline

Jan 19, 2022: Application Filed
Jun 03, 2025: Non-Final Rejection mailed (§101, §103)
Sep 03, 2025: Response Filed
Nov 26, 2025: Final Rejection mailed (§101, §103)
Jan 26, 2026: Response after Non-Final Action
Precedent Cases

Applications granted by this same examiner with similar technology

17/231,514 (Patent 12619880): METHODS, DEVICES AND MEDIA FOR RE-WEIGHTING TO IMPROVE KNOWLEDGE DISTILLATION. 5y 0m to grant; granted May 05, 2026.
18/882,311 (Patent 12488244): APPARATUS AND METHOD FOR DATA GENERATION FOR USER ENGAGEMENT. 1y 2m to grant; granted Dec 02, 2025.
17/444,687 (Patent 12423576): METHOD AND APPARATUS FOR UPDATING PARAMETER OF MULTI-TASK MODEL, AND STORAGE MEDIUM. 4y 1m to grant; granted Sep 23, 2025.
17/265,476 (Patent 12361280): METHOD AND DEVICE FOR TRAINING A MACHINE LEARNING ROUTINE FOR CONTROLLING A TECHNICAL SYSTEM. 4y 5m to grant; granted Jul 15, 2025.
17/191,518 (Patent 12354017): ALIGNING KNOWLEDGE GRAPHS USING SUBGRAPH TYPING. 4y 4m to grant; granted Jul 08, 2025.
Based on the 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 2-3
Grant Probability: 51%
With Interview (+39.5%): 91%
Median Time to Grant: 3y 11m (~0m remaining)
PTA Risk: Moderate
Based on 43 resolved cases by this examiner. Grant probability derived from career allowance rate.