Prosecution Insights
Last updated: April 19, 2026
Application No. 18/087,877

METHODS, SYSTEMS, APPARATUSES, AND COMPUTER-READABLE MEDIA FOR DECOMPOSING A LAYER IN A NEURAL NETWORK

Final Rejection — §101, §103, §112

Filed: Dec 23, 2022
Examiner: BOSTWICK, SIDNEY VINCENT
Art Unit: 2124
Tech Center: 2100 — Computer Architecture & Software
Assignee: Huawei Technologies Co., Ltd.
OA Round: 2 (Final)
Grant Probability: 52% (Moderate)
Expected OA Rounds: 3-4
Time to Grant: 4y 7m
With Interview: 90%

Examiner Intelligence

Career Allow Rate: 52% (71 granted / 136 resolved; -2.8% vs TC avg)
Interview Lift: +38.2% (strong; resolved cases with vs. without interview)
Avg Prosecution: 4y 7m (68 currently pending)
Total Applications: 204 (across all art units)

Statute-Specific Performance

§101: 24.4% (-15.6% vs TC avg)
§103: 40.9% (+0.9% vs TC avg)
§102: 12.0% (-28.0% vs TC avg)
§112: 21.9% (-18.1% vs TC avg)
TC averages are estimates • Based on career data from 136 resolved cases

Office Action

§101 §103 §112
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks

This Office Action is responsive to Applicant's Amendment filed on February 6, 2026, in which claims 1, 3, 5 and 17 are currently amended. Claim 4 is cancelled. Claims 1-3 and 5-20 are currently pending.

Drawings

Applicant's amendments made to the drawings are acknowledged. Examiner's objection to the drawings is maintained. Figures 7-11 submitted 2/6/2026 are low-quality scans containing illegible elements.

Response to Arguments

The rejection of claim 3 under 35 U.S.C. § 112(b) is hereby withdrawn, as necessitated by applicant's amendments and remarks made to the rejections.

Applicant's arguments with respect to the rejection of claims 1-3 and 5-20 under 35 U.S.C. 101 based on amendment have been considered but are not persuasive. With respect to Applicant's arguments on p. 7 of the Remarks submitted 2/6/2026 that "calculating a performance function of a processor by measuring a performance metric of the processor" is a "concrete, physical operation performed on actual hardware and tied to the targeted processor's low-level implementation", Examiner respectfully disagrees. There is nothing in the claims that ties the calculation to a low-level implementation. The claim recites a processor (a generic computer component) at a high level and performs a mental process of observation, evaluation, and judgement with the processor. For this reason the recitation of the processor is seen as mere instructions to apply the judicial exception using generic computer components. Examiner notes MPEP 2106.07(a)(II): "employing well-known computer functions to execute an abstract idea, even when limiting the use of the idea to one particular environment, does not integrate the exception into a practical application". The remainder of the claim is unambiguously directed towards mathematical calculations and relationships. For this reason Examiner asserts that the claim as a whole is directed towards a judicial exception, where the processor is a generic computer component for implementing the judicial exception. The performance metrics are not narrowed in such a way that would make this interpretation unreasonable (contrary to the assertion on p. 7 of the Remarks that narrowly interprets the metrics as throughput or FLOPs).

Similarly, with respect to Applicant's arguments on p. 7 of the Remarks submitted 2/6/2026 that the matrix decomposition enables compression, and thus is a technical improvement, Examiner notes that Applicant is relying explicitly on the judicial exception of matrix decomposition (mathematical calculations) to provide the technical improvement (MPEP 2106.05(a): "It is important to note, the judicial exception alone cannot provide the improvement."). Examiner further notes that the model is not seen as necessarily being a computer component, as it is clear from the instant claims that the model itself is comprised entirely of matrices; otherwise it would not be possible to decompose the layers into separate matrices. This further supports the interpretation that the claim is directed towards a judicial exception, where a generic processor is used to improve the judicial exception rather than the judicial exception somehow improving the generically recited processor. For at least these reasons and those further detailed below, Examiner asserts that it is reasonable and appropriate to maintain the rejection.
Applicant's arguments with respect to the rejection of claims 1-3 and 5-20 under 35 U.S.C. 102 and 103 based on amendment have been considered but are not persuasive. With respect to Applicant's arguments on p. 9 of the Remarks submitted 2/6/2026 that "There is no discussion in Girshick of how to select the rank of decomposition [...] based on a performance function of processor, as required by claim 1", Examiner respectfully notes that instant claim 1 does not require "selecting" a rank of decomposition, nor would one of ordinary skill in the art interpret anything in instant claim 1 as being analogous to "selecting" a rank of decomposition. Specifically, instant claim 1 recites "calculating a rank of decomposition", which Girshick objectively does by way of singular value decomposition (SVD). Girshick explicitly performs SVD, which by definition calculates a rank of decomposition. SVD is used on model weights in order to speed up model training and inference. Girshick provides multiple processor measurements reflecting the training improvement.

With respect to Applicant's arguments on p. 9 of the Remarks submitted 2/6/2026 that Girshick does not "disclose calculating the rank of decomposition based on the performance of the [...] GPU", Examiner respectfully disagrees. Girshick explicitly computes a loss for training which is explicitly performed on the GPU: [p. 1440] "Training is a multi-stage pipeline. R-CNN first fine tunes a ConvNet on object proposals using log loss. Then, it fits SVMs to ConvNet features. These SVMs act as object detectors, replacing the softmax classifier learnt by fine-tuning. In the third training stage, bounding-box regressors are learned"; [p. 1443] "We experiment with multi-scale training for smaller networks only, due to GPU memory limits". Training is interpreted as a performance function calculated by measuring a loss (performance metric) of the processor (GPU). [p. 1445] "Training time is reduced by 9×, from 84 hours to 9.5. Compared to SPPnet, Fast R-CNN trains VGG16 2.7× faster (in 9.5 vs. 25.5 hours) and tests 7× faster without truncated SVD or 10× faster with it". See also Table 4. Examiner notes that the training time (the time it takes to train the model on the GPU) could also be interpreted as a measured performance metric. Girshick explicitly uses SVD to speed up the training, and SVD by definition calculates a rank of decomposition. For at least these reasons Examiner asserts that it is reasonable and appropriate to maintain the rejection in view of Girshick.

Claim Rejections - 35 USC § 112

Claim 5 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Regarding claim 5: claim 5 is dependent on cancelled claim 4. Since the claim is dependent on a cancelled claim, the scope of the claim cannot be determined. In the interest of further examination, claim 5 is interpreted as being dependent on claim 1.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-3 and 5-20 are rejected under 35 USC § 101 because the claimed invention is directed to non-statutory subject matter.

Regarding Claim 1: Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Step 1 Analysis: Claim 1 is directed to a method, which is a process, one of the statutory categories.

Step 2A Prong One Analysis: Claim 1 under its broadest reasonable interpretation is a series of mathematical calculations and mental processes. For example, but for the generic computer components language, the limitations in the context of this claim encompass neural network processing, including the following:
- calculating a performance function of a processor by measuring a performance metric of the processor (observation, evaluation, and judgement);
- calculating a rank of decomposition based on a performance function of a processor (mathematical calculations and relationships);
- decomposing the layer into a plurality of matrices based on the rank of decomposition (mathematical calculations and relationships);
- replacing the layer in the AI model with the plurality of matrices to produce a compressed AI model (observation, evaluation, and judgement based on the mathematical calculations and relationships).
Therefore, claim 1 recites an abstract idea, which is a judicial exception.

Step 2A Prong Two Analysis: Claim 1 does not recite additional elements that integrate the judicial exception into a practical application. Therefore, claim 1 is directed to a judicial exception.

Step 2B Analysis: Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

For the reasons above, claim 1 is rejected as being directed to non-patentable subject matter under §101. This rejection applies equally to independent claim 17, which recites a non-transitory computer-readable medium, as well as to dependent claims 2, 3, 5-16, and 18-20. Independent claim 17 recites the additional elements "A non-transitory computer-readable medium comprising computer program code stored thereon for decomposing a layer in an AI model, wherein the code, when executed by one or more processors, causes the one or more processors to perform a method comprising", which amount to mere instructions to apply the judicial exception using generic computer components.
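For orientation, the four recited steps map onto a short pipeline. The following is a minimal NumPy sketch of one reading of the claim language; the function names, the timing-based metric (one option recited in dependent claim 3), and the r/t(r) selection objective (recited in dependent claim 10) are illustrative assumptions, not code from the application:

```python
import time
import numpy as np

def performance_function(W, r, batch=256):
    """Illustrative t(r): measure a performance metric of the processor (here,
    wall-clock processing time, one option in dependent claim 3) while computing
    a function based on rank-r test matrices (the structure of dependent claim 5)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]                      # m x r test matrix
    B = Vt[:r, :]                             # r x n test matrix
    x = np.random.randn(W.shape[1], batch)
    t0 = time.perf_counter()
    _ = A @ (B @ x)                           # forward pass through the factorized layer
    return time.perf_counter() - t0

def calculate_rank(W, candidate_ranks):
    """Calculate the rank of decomposition based on the performance function,
    using dependent claim 10's objective: maximize r / t(r)."""
    return max(candidate_ranks, key=lambda r: r / performance_function(W, r))

def compress_layer(W, candidate_ranks):
    """Decompose the layer into a plurality of matrices based on the rank, and
    return the pair (A, B) that replaces W in the model: W (m x n) ~= A @ B."""
    r = calculate_rank(W, candidate_ranks)
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r], S[:r, None] * Vt[:r, :]
```

Under this reading, "replacing the layer" amounts to swapping W for the pair (A, B) wherever the model applies that layer.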
The additional limitations of the dependent claims are addressed briefly below:

Dependent claims 2 and 18 recite additional mathematical calculations and relationships: "decomposing the layer comprises decomposing the layer using Singular Value Decomposition or Tucker decomposition".

Dependent claims 3 and 19 recite additional observation, evaluation, and judgement: "wherein the performance function measures floating-point operations per second, processing time, or throughput of the processor."

Dependent claim 5 recites additional mathematical calculations and relationships: "wherein calculating the performance function comprises: decomposing the layer into a plurality of test matrices based on a test rank; computing a function based on the plurality of test matrices", as well as additional observation, evaluation, and judgement: "the performance metric of the processor is measured while computing the function".

Dependent claim 6 recites additional mathematical calculations and relationships: "decomposing the layer comprises removing one or more rows or columns from the plurality of matrices, such that a number of rows or columns of at least one of the plurality of matrices equals the rank of decomposition".

Dependent claim 7 recites additional mathematical calculations and relationships: "the plurality of matrices comprises two matrices or three matrices".

Dependent claim 8 recites additional mathematical calculations and relationships: "the layer is a matrix".

Dependent claim 9 recites additional mathematical calculations and relationships: "the layer is a tensor".

Dependent claim 10 recites additional mathematical calculations and relationships: "calculating the rank of decomposition comprises maximizing a function r/t(r) over a given range of r, wherein r is the rank of decomposition and t(r) is the performance function".

Dependent claim 11 recites additional mathematical calculations and relationships: "the given range of r is from (m × n)/((p + 1) × (m + n)) to (m × n)/(p × (m + n)), wherein m is a number of rows of the matrix, n is a number of columns of the matrix, and p is a given compression ratio".

Dependent claim 12 recites additional observation, evaluation, and judgement: "the given range of r is determined by Empirical Variational Bayesian Matrix Factorization".

Dependent claim 13 recites additional mathematical calculations and relationships: "calculating the rank of decomposition comprises maximizing a function (r1 × r2)/t(r1, r2), wherein r1 is a first rank, r2 is a second rank, and t(r1, r2) is the performance function".

Dependent claim 14 recites additional mathematical calculations and relationships: "calculating the rank of decomposition comprises maximizing a function log(r) − log(t(r)) over a given range of r, wherein r is the rank of decomposition and t(r) is the performance function".

Dependent claim 15 recites additional mathematical calculations and relationships: "calculating the rank of decomposition comprises maximizing a function r/t(r) over a given range of r, wherein r is the rank of decomposition and t(r) is the performance function".

Dependent claim 16 recites additional observation, evaluation, and judgement: "the AI model is a neural network, and wherein the layer is a fully connected layer or a convolutional layer of the neural network" (Examiner notes that the simple linear regression y = m*x + b is a fully connected single-layer neural network having linear activation).

Dependent claim 20 recites additional instructions to apply the judicial exception using
generic computer components: "to calculate an inference of the AI model or to train the AI model."

Therefore, when considering the elements separately and in combination, they do not add significantly more to the inventive concept. Accordingly, claims 1-3 and 5-20 are rejected under 35 U.S.C. § 101.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1, 2, 5-9, 16-18, and 20 are rejected under 35 U.S.C. §102(a)(1) as anticipated by Girshick ("Fast R-CNN", 2015).

Regarding claim 1, Girshick teaches A method for decomposing a layer in an artificial intelligence (AI) model, comprising: ([p. 1443 §3] "In this technique, a layer parameterized by the u × v weight matrix W is approximately factorized as W ≈ U Σt V^T (5) using SVD")

calculating a performance function of a processor by measuring a performance metric of the processor; ([p. 1440] "Training is a multi-stage pipeline. R-CNN first fine tunes a ConvNet on object proposals using log loss. Then, it fits SVMs to ConvNet features. These SVMs act as object detectors, replacing the softmax classifier learnt by fine-tuning. In the third training stage, bounding-box regressors are learned"; [p. 1443] "We experiment with multi-scale training for smaller networks only, due to GPU memory limits". Training is interpreted as a performance function calculated by measuring a loss (performance metric) of the processor (GPU). [p. 1445] "Training time is reduced by 9×, from 84 hours to 9.5. Compared to SPPnet, Fast R-CNN trains VGG16 2.7× faster (in 9.5 vs. 25.5 hours) and tests 7× faster without truncated SVD or 10× faster with it". See also Table 4. Examiner notes that alternatively the training time could reasonably be interpreted as a measured performance metric of the processor.)

calculating a rank of decomposition based on the performance function of the processor; ([p. 1440] "Training is a multi-stage pipeline. R-CNN first fine tunes a ConvNet on object proposals using log loss. Then, it fits SVMs to ConvNet features. These SVMs act as object detectors, replacing the softmax classifier learnt by fine-tuning. In the third training stage, bounding-box regressors are learned"; [p. 4] "Large fully connected layers are easily accelerated by compressing them with truncated SVD"; [p. 1445] "Training time is reduced by 9×, from 84 hours to 9.5. Compared to SPPnet, Fast R-CNN trains VGG16 2.7× faster (in 9.5 vs. 25.5 hours) and tests 7× faster without truncated SVD or 10× faster with it". Girshick explicitly applies SVD for training (performance function) of the model, after applying the first loss. SVD calculates a rank of decomposition by definition.)
decomposing the layer into a plurality of matrices based on the rank of decomposition; and ([p. 1443 §3] "In this technique, a layer parameterized by the u × v weight matrix W is approximately factorized as W ≈ U Σt V^T (5) using SVD")

replacing the layer in the AI model with the plurality of matrices to produce a compressed AI model. ([p. 1443 §3] "In this factorization, U is a u × t matrix comprising the first t left-singular vectors of W, Σt is a t × t diagonal matrix containing the top t singular values of W, and V is a v × t matrix comprising the first t right-singular vectors of W. Truncated SVD reduces the parameter count from uv to t(u + v), which can be significant if t is much smaller than min(u, v). To compress a network, the single fully connected layer corresponding to W is replaced by two fully connected layers, without a non-linearity between them. The first of these layers uses the weight matrix Σt V^T (and no biases) and the second uses U (with the original biases associated with W)").

Regarding claim 2, Girshick teaches The method of claim 1, wherein decomposing the layer comprises decomposing the layer using Singular Value Decomposition or Tucker decomposition. (Girshick [p. 1443 §3] "In this technique, a layer parameterized by the u × v weight matrix W is approximately factorized as W ≈ U Σt V^T (5) using SVD").

Regarding claim 5, Girshick teaches The method of claim 4, wherein calculating the performance function comprises: decomposing the layer into a plurality of test matrices based on a test rank; ([p. 1443 §3] "In this factorization, U is a u × t matrix comprising the first t left-singular vectors of W, Σt is a t × t diagonal matrix containing the top t singular values of W, and V is a v × t matrix comprising the first t right-singular vectors of W. Truncated SVD reduces the parameter count from uv to t(u + v), which can be significant if t is much smaller than min(u, v). To compress a network, the single fully connected layer corresponding to W is replaced by two fully connected layers, without a non-linearity between them. The first of these layers uses the weight matrix Σt V^T (and no biases) and the second uses U (with the original biases associated with W)". t is interpreted as the test rank of decomposition.)

computing a function based on the plurality of test matrices; and ([p. 1443 §3.1] "For each test RoI r, the forward pass outputs a class posterior probability distribution p and a set of predicted bounding-box offsets relative to r (each of the K classes gets its own refined bounding-box prediction). We assign a detection confidence to r for each object class k using the estimated probability"; [p. 1443 §3] "In this factorization, U is a u × t matrix comprising the first t left-singular vectors of W, Σt is a t × t diagonal matrix containing the top t singular values of W, and V is a v × t matrix comprising the first t right-singular vectors of W. Truncated SVD reduces the parameter count from uv to t(u + v), which can be significant if t is much smaller than min(u, v). To compress a network, the single fully connected layer corresponding to W is replaced by two fully connected layers, without a non-linearity between them. The first of these layers uses the weight matrix Σt V^T (and no biases) and the second uses U (with the original biases associated with W)")
wherein the performance metric of the processor is measured while computing the function ([p. 6] "Runtime comparison between the same models in Fast R-CNN, R-CNN, and SPPnet. Fast R-CNN uses single-scale mode. SPPnet uses the five scales specified in [11]. †Timing provided by the authors of [11]. Times were measured on an Nvidia K40 GPU". See also Table 4. Training time interpreted as performance metric of a processor (GPU) reflecting the performance function (training)).

Regarding claim 6, Girshick teaches The method of claim 1, wherein decomposing the layer comprises removing one or more rows or columns from the plurality of matrices, such that a number of rows or columns of at least one of the plurality of matrices equals the rank of decomposition. (Girshick [p. 1443 §3] "In this factorization, U is a u × t matrix comprising the first t left-singular vectors of W, Σt is a t × t diagonal matrix containing the top t singular values of W, and V is a v × t matrix comprising the first t right-singular vectors of W. Truncated SVD reduces the parameter count from uv to t(u + v), which can be significant if t is much smaller than min(u, v). To compress a network, the single fully connected layer corresponding to W is replaced by two fully connected layers, without a non-linearity between them. The first of these layers uses the weight matrix Σt V^T (and no biases) and the second uses U (with the original biases associated with W)". t is the rank of decomposition.).

Regarding claim 7, Girshick teaches The method of claim 1, wherein the plurality of matrices comprises two matrices or three matrices. (Girshick [p. 1443 §3] "In this factorization, U is a u × t matrix comprising the first t left-singular vectors of W, Σt is a t × t diagonal matrix containing the top t singular values of W, and V is a v × t matrix comprising the first t right-singular vectors of W. Truncated SVD reduces the parameter count from uv to t(u + v), which can be significant if t is much smaller than min(u, v). To compress a network, the single fully connected layer corresponding to W is replaced by two fully connected layers, without a non-linearity between them. The first of these layers uses the weight matrix Σt V^T (and no biases) and the second uses U (with the original biases associated with W)". U, Σ, and V are three matrices.).

Regarding claim 8, Girshick teaches The method of claim 1, wherein the layer is a matrix. (Girshick [p. 1443 §3.1] "a layer parameterized by the u × v weight matrix W").

Regarding claim 9, Girshick teaches The method of claim 1, wherein the layer is a tensor. (Girshick [p. 1443 §3.1] "a layer parameterized by the u × v weight matrix W". A matrix is a 2nd-order tensor.).

Regarding claim 16, Girshick teaches The method of claim 1, wherein the AI model is a neural network, and wherein the layer is a fully connected layer or a convolutional layer of the neural network. (Girshick [p. 1441 §2.2] "the network's last fully connected layer and softmax (which were trained for 1000-way ImageNet classification) are replaced with the two sibling layers described earlier (a fully connected layer and softmax over K + 1 categories and category-specific bounding-box regressors)"; [p. 1443 §3.1] "Large fully connected layers are easily accelerated by compressing them with truncated SVD").
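For reference, the truncated-SVD layer replacement Girshick describes at p. 1443 §3 is straightforward to reproduce. A minimal NumPy sketch, with illustrative layer dimensions and rank (not Girshick's):

```python
import numpy as np

u_dim, v_dim, t = 1024, 1024, 64             # illustrative FC layer shape and rank
W = np.random.randn(u_dim, v_dim)            # weight matrix of the layer to compress

U, S, Vt = np.linalg.svd(W, full_matrices=False)
layer1 = S[:t, None] * Vt[:t, :]             # Sigma_t V^T: t x v, carries no biases
layer2 = U[:, :t]                            # U: u x t, keeps the original biases

x = np.random.randn(v_dim)
y_exact = W @ x
y_approx = layer2 @ (layer1 @ x)             # two FC layers, no non-linearity between

# Parameter count drops from u*v to t*(u + v), per Girshick p. 1443 section 3.
print(W.size, layer1.size + layer2.size)     # 1,048,576 vs 131,072
print(np.linalg.norm(y_exact - y_approx) / np.linalg.norm(y_exact))
```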
Regarding claims 17 and 18: claims 17 and 18 are directed towards a non-transitory computer-readable medium comprising computer program code stored thereon for performing the methods of claims 1 and 2, respectively. Therefore, the rejections applied to claims 1 and 2 also apply to claims 17 and 18.

Regarding claim 20, Girshick teaches Use of the compressed AI model of claim 1 to calculate an inference of the AI model or to train the AI model. (Girshick [p. 1 Abstract] "Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9× faster than R-CNN").

Claims 3, 10, 11, 15, and 19 are rejected under 35 U.S.C. §103 as being unpatentable over the combination of Girshick and Kim ("Automatic Rank Selection for High-Speed Convolutional Neural Network", 2018).

Regarding claim 3, Girshick teaches The method of claim 1. However, Girshick doesn't explicitly teach wherein the performance function measures floating-point operations per second, processing time, or throughput of the specific processor. Kim, in the same field of endeavor, teaches The method of claim 1, wherein the performance function measures floating-point operations per second, processing time, or throughput of the specific processor. ([p. 12] "Table 2. Performance comparison. FLOPs is computed including fully-connected layers"). Girshick as well as Kim are directed towards neural network compression using rank decomposition. Therefore, Girshick as well as Kim are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Girshick with the teachings of Kim by maximizing a rank function over a given range. Kim provides as additional motivation for combination ([p. 2 §1] "the proposed model-wise greedy algorithm illustrated in Fig. 1(b) changes the rank of all layers at a time and iteratively selects a best set of rank for all kernel layers (i.e. rank set) maximizing the performance of the network. In this strategy, the performances of some candidate rank sets are compared to select a rank set including the immediate worst-affected layer. This allows to find a relatively optimal solution, since the candidate rank sets are composed of various combinations of the rank").

Regarding claim 10, Girshick teaches The method of claim 8. However, Girshick doesn't explicitly teach wherein calculating the rank of decomposition comprises maximizing a function r/t(r) over a given range of r, wherein r is the rank of decomposition and t(r) is the performance function. Kim, in the same field of endeavor, teaches The method of claim 8, wherein calculating the rank of decomposition comprises maximizing a function r/t(r) over a given range of r, wherein r is the rank of decomposition and t(r) is the performance function. ([p. 4 §2] "we restrict the number of parameters in the decomposed kernel tensors to less than or equal to the original 4-dimensional tensor. Under this restriction, the maximum rank of each layer is given by [See Eqn. 3]"; [p. 7] "To restrict Xl, we set the upper boundary r_l^max and lower boundary r_l^min, and the interval size s_l of respective elements"). Girshick as well as Kim are directed towards neural network compression using rank decomposition. Therefore, Girshick as well as Kim are analogous art in the same field of endeavor.
It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Girshick with the teachings of Kim by maximizing a rank function over a given range. Kim provides as additional motivation for combination ([p. 2 §1] "the proposed model-wise greedy algorithm illustrated in Fig. 1(b) changes the rank of all layers at a time and iteratively selects a best set of rank for all kernel layers (i.e. rank set) maximizing the performance of the network. In this strategy, the performances of some candidate rank sets are compared to select a rank set including the immediate worst-affected layer. This allows to find a relatively optimal solution, since the candidate rank sets are composed of various combinations of the rank").

Regarding claim 11, the combination of Girshick and Kim teaches The method of claim 10, wherein the given range of r is from (m × n)/((p + 1) × (m + n)) to (m × n)/(p × (m + n)), (Kim [p. 7] "We empirically set the scaling factors, δs and δm, to 0.01 and 0.1". If p = 1.1111, then p*δs (lower bound) is .01 and p*δm (upper bound) is .1, in which case the range in Kim is exactly uv/(p(u + v)) to uv/((p + 1)(u + v)).) wherein m is a number of rows of the matrix, n is a number of columns of the matrix, and p is a given compression ratio. (Girshick [p. 1443 §3] "Truncated SVD reduces the parameter count from uv to t(u + v)". Girshick explicitly teaches that the compression ratio using SVD is uv/(t(u + v)).)

Regarding claim 15, Girshick teaches The method of claim 8. However, Girshick doesn't explicitly teach wherein calculating the rank of decomposition comprises maximizing a function r/t(r) over a given range of r, wherein r is the rank of decomposition and t(r) is the performance function. Kim, in the same field of endeavor, teaches calculating the rank of decomposition comprises maximizing a function r/t(r) over a given range of r, wherein r is the rank of decomposition and t(r) is the performance function. ([p. 4 §2] "we restrict the number of parameters in the decomposed kernel tensors to less than or equal to the original 4-dimensional tensor. Under this restriction, the maximum rank of each layer is given by [See Eqn. 3]"; [p. 7] "To restrict Xl, we set the upper boundary r_l^max and lower boundary r_l^min, and the interval size s_l of respective elements"). Girshick as well as Kim are directed towards neural network compression using rank decomposition. Therefore, Girshick as well as Kim are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Girshick with the teachings of Kim by maximizing a rank function over a given range. Kim provides as additional motivation for combination ([p. 2 §1] "the proposed model-wise greedy algorithm illustrated in Fig. 1(b) changes the rank of all layers at a time and iteratively selects a best set of rank for all kernel layers (i.e. rank set) maximizing the performance of the network. In this strategy, the performances of some candidate rank sets are compared to select a rank set including the immediate worst-affected layer. This allows to find a relatively optimal solution, since the candidate rank sets are composed of various combinations of the rank").

Regarding claim 19: claim 19 is directed towards a non-transitory computer-readable medium comprising computer program code stored thereon for performing the method of claim 3. Therefore, the rejection applied to claim 3 also applies to claim 19.
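The claim 11 range discussed above follows algebraically from Girshick's parameter count: with compression ratio p = (m × n)/(r × (m + n)), constraining the ratio to lie between p and p + 1 and solving for r gives exactly the recited bounds. A quick check with illustrative numbers (not from the record):

```python
def rank_range(m, n, p):
    """Candidate range for the rank r recited in claim 11, derived from the
    truncated-SVD parameter count r*(m + n): requiring the compression ratio
    m*n / (r*(m + n)) to lie in [p, p + 1] and solving for r gives
    m*n / ((p + 1)*(m + n)) <= r <= m*n / (p*(m + n))."""
    return m * n / ((p + 1) * (m + n)), m * n / (p * (m + n))

# e.g. a 1024 x 1024 layer at a target compression ratio of 4x
print(rank_range(1024, 1024, 4))   # (102.4, 128.0) -> integer ranks 103..128
```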
Claim 12 is rejected under 35 U.S.C. §103 as being unpatentable over the combination of Girshick, Kim, and Astrid ("Rank Selection of CP-decomposed Convolutional Layers with Variational Bayesian Matrix Factorization", 2018).

Regarding claim 12, the combination of Girshick and Kim teaches The method of claim 10. However, the combination of Girshick and Kim doesn't explicitly teach wherein the given range of r is determined by Empirical Variational Bayesian Matrix Factorization. Astrid, in the same field of endeavor, teaches the given range of r is determined by Empirical Variational Bayesian Matrix Factorization. ([p. 1 Abstract] "To compress with CP-decomposition, rank selection is important. In the previous approach rank selection that is based on sensitivity of each layer, the average rank of the network was still arbitrarily selected. Additionally, the rank of all layers were decided before whole process of iterative compression, while the rank of a layer can be changed after fine-tuning. Therefore, this paper proposes selecting rank of each layer using Variational Bayesian Matrix Factorization (VBMF) which is more systematic than arbitrary approach"). The combination of Girshick and Kim as well as Astrid are directed towards neural network compression using rank decomposition. Therefore, the combination of Girshick and Kim as well as Astrid are analogous art in the same field of endeavor. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of the combination of Girshick and Kim with the teachings of Astrid by using empirical variational Bayesian matrix factorization (VBMF) for rank decomposition. Astrid provides as additional motivation for combination ([p. 349 §III] "This work achieves higher accuracy compare to previous sensitivity rank selection approach, even though it has more compression, theoretically and with Caffe CPU time. Even, the accuracy increases around 1% compare to the original network.").

Claim 13 is rejected under 35 U.S.C. §103 as being unpatentable over the combination of Girshick and Gallaugher ("A Mixture of Matrix Variate Bilinear Factor Analyzers", 2018).

Regarding claim 13, Girshick teaches The method of claim 9. However, Girshick doesn't explicitly teach wherein calculating the rank of decomposition comprises maximizing a function (r1 × r2)/t(r1, r2), wherein r1 is a first rank, r2 is a second rank, and t(r1, r2) is the performance function. Gallaugher, in the same field of endeavor, teaches calculating the rank of decomposition comprises maximizing a function (r1 × r2)/t(r1, r2), wherein r1 is a first rank, r2 is a second rank, and t(r1, r2) is the performance function. ([p. 4 §3.1] "A MMVBFA model is derived here by extending (3). Specifically, we remove the isotropic constraint and assume that Xi = Mg + Ag Uig Bg' + Ag E^B_ig + E^A_ig Bg' + Eig (4) with probability πg, for g = 1, . . . , G, where Mg is an n × p location matrix, Ag is an n × q column factor loading matrix, with q < n, Bg is a p × r row factor loading matrix, with r < p"; [p. 6 §3.2] "Suppose we observe N observations X1, X2, . . . , XN, then the log-likelihood is given by [See Eqn. 5]. To maximize (5), the observed data is viewed as incomplete and an AECM is then used to maximize (5). There are three different sources of missingness: the component memberships z1, . . . , zn as well as the latent matrix variables Y^B_ig and Y^A_ig. A three-stage AECM algorithm is now described for parameter estimation").
Girshick as well as Gallaugher are directed towards applications of singular value decomposition. Therefore, Girshick as well as Gallaugher are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Girshick with the teachings of Gallaugher by using the bilinear factorization for compression. Gallaugher enables layer compression while keeping the two-dimensional layer matrices intact. Gallaugher provides as additional motivation for combination ([p. 1 §1] "Matrix variate distributions have been shown to be useful for modelling three-way data such as images and multivariate longitudinal data; however, the methods presented in the literature suffer from dimensionality concerns. In this paper, we present a mixture of matrix variate bilinear factor analyzers (MMVBFA) model for use in clustering higher dimensional matrix variate data. The matrix variate bilinear factor analyzers model can be viewed as a generalization of bilinear principal component analysis").

Claim 14 is rejected under 35 U.S.C. §103 as being unpatentable over the combination of Girshick and Minka ("Automatic choice of dimensionality for PCA", 2000).

Regarding claim 14, Girshick teaches The method of claim 8. However, Girshick doesn't explicitly teach calculating the rank of decomposition comprises maximizing a function log(r) − log(t(r)) over a given range of r, wherein r is the rank of decomposition and t(r) is the performance function. Minka, in the same field of endeavor, teaches calculating the rank of decomposition comprises maximizing a function log(r) − log(t(r)) over a given range of r, wherein r is the rank of decomposition and t(r) is the performance function. ([p. 2 Eqn. 6] "p(D|M) = ∫ p(D|θ) p(θ|M) dθ"; [p. 2 §3.1] "For the PCA model, we want to select the subspace dimensionality k. To do this, we compute the probability of the data for each possible dimensionality and pick the maximum". PCA interpreted as relying on SVD. Dimensionality k interpreted as synonymous with rank r. Since log is monotonic, Eqn. 6 is equivalent to maximizing log(p(D|k)), which is a tautology of the log likelihood log(r) − log(r/p(D|r)), where p(D|r) is interpreted as t(r).)

Girshick as well as Minka are directed towards applications of singular value decomposition. Therefore, Girshick as well as Minka are reasonably pertinent analogous art. It would have been obvious before the effective filing date of the claimed invention to combine the teachings of Girshick with the teachings of Minka by using the log likelihood rank maximization in Minka to determine the optimal compression rank in Girshick. Minka provides as additional motivation for combination ([p. 1 Abstract] "after choosing an appropriate parameterization and applying Laplace's method, an accurate and practical estimator is obtained. In simulations, it is convincingly better than cross-validation and other proposed algorithms, plus it runs much faster.").

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SIDNEY VINCENT BOSTWICK, whose telephone number is (571) 272-4720. The examiner can normally be reached M-F 7:30am-5:00pm EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SIDNEY VINCENT BOSTWICK/
Examiner, Art Unit 2124

/MIRANDA M HUANG/
Supervisory Patent Examiner, Art Unit 2124

Prosecution Timeline

Dec 23, 2022
Application Filed
Nov 05, 2025
Non-Final Rejection — §101, §103, §112
Feb 06, 2026
Response Filed
Mar 14, 2026
Final Rejection — §101, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12561604
SYSTEM AND METHOD FOR ITERATIVE DATA CLUSTERING USING MACHINE LEARNING
2y 5m to grant · Granted Feb 24, 2026
Patent 12547878
Highly Efficient Convolutional Neural Networks
2y 5m to grant · Granted Feb 10, 2026
Patent 12536426
Smooth Continuous Piecewise Constructed Activation Functions
2y 5m to grant · Granted Jan 27, 2026
Patent 12518143
FEEDFORWARD GENERATIVE NEURAL NETWORKS
2y 5m to grant · Granted Jan 06, 2026
Patent 12505340
STASH BALANCING IN MODEL PARALLELISM
2y 5m to grant · Granted Dec 23, 2025
Study what changed to get past this examiner, based on the 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 52%
With Interview: 90% (+38.2%)
Median Time to Grant: 4y 7m
PTA Risk: Moderate
Based on 136 resolved cases by this examiner. Grant probability derived from career allow rate.
