Last updated: May 29, 2026
Application No. 18/164,875
COMPRESSING NEURAL NETWORKS THROUGH UNBIASED MINIMUM VARIANCE PRUNING

Non-Final OA: §101, §103
Filed: Feb 06, 2023
Examiner: RAMIREZ BRAVO, BEATRIZ A
Art Unit: 2146
Tech Center: 2100, Computer Architecture & Software
Assignee: Habana Labs Ltd.
OA Round: 1 (Non-Final)
This examiner grants 63% of cases after interview

— +29.4% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 98 resolved cases, 2023–2026
Examiner Intelligence

RAMIREZ BRAVO, BEATRIZ A
Career Allowance Rate: grants 63% of resolved cases (62 granted / 98 resolved), +8.3% vs TC avg
Interview Lift: +29.4% (strong; resolved cases with interview vs. without)
Typical timeline: 4y 6m avg prosecution, 9 currently pending
Career history: 116 total applications across all art units
Statute-Specific Performance

§101: 5.8% (-34.2% vs TC avg)
§103: 89.4% (+49.4% vs TC avg)
§102: 1.0% (-39.0% vs TC avg)
§112: 3.0% (-37.0% vs TC avg)
Based on career data from 98 resolved cases
Office Action

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 
Status of Claims
Claims 1-25 are currently pending examination.

Information Disclosure Statement 
The Information Disclosure Statement (IDS) submitted by Applicant on 2/6/2023 has been considered. 

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


	Claims 1-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (abstract idea) without significantly more. 
	Regarding claim 1, 
	Step 1: Claim 1 is directed towards a method.
	Step 2A, Prong 1: Claim 1 recites the following limitations:
	determining a first pruning parameter and a second pruning parameter for pruning a deep learning tensor, a ratio of the first pruning parameter to the second pruning parameter indicating a percentage of to-be-pruned values in the deep learning tensor; (i.e., a person could, mentally or with the aid of pen and paper, determine a first pruning parameter and a second pruning parameter based on a ratio of the first pruning parameter to the second pruning parameter indicating a percentage of to be pruned values – See Paragraph [0085] of Applicant’s Specification)
extracting a vector from the deep learning tensor, the vector comprising elements that are a subset of the values in the deep learning tensor, a size of the vector equal to the second pruning parameter; (i.e., the limitation consists of a mathematical process as indicated in paragraph [0091] of Applicant’s Specification) 
determining one or more pruning probabilities for the vector, each pruning probability corresponding to a respective element in the vector; (i.e., this limitation consists of a mathematical process as indicated in paragraphs [0086]-[0087] of Applicant’s Specification)
selecting one or more elements in the vector based on the one or more pruning probabilities and a reference probability, a number of the one or more elements equal to the first pruning parameter; (i.e., this limitation is a mathematical process as indicated in paragraphs [0087]-[0091] of Applicant’s Specification)
modifying the deep learning tensor by setting a value of an unselected element of the vector to zero. (i.e., this limitation is a mathematical process as indicated by paragraphs [0095]-[0098] of Applicant’s Specification)
Therefore the claim recites an abstract idea.
	Step 2A, Prong 2: The claim recites the additional elements of “a method of compressing a deep neural network (DNN)” and “a deep learning tensor”. These elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components (see MPEP 2106.05(f)). Hence, the claim does not recite additional elements that integrate the judicial exception into a practical application. Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
	Step 2B: Claim 1 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements of “a method of compressing a deep neural network (DNN)” and “a deep learning tensor” to perform the steps above amount to no more than mere instructions to apply the judicial exception using generic computer components (see MPEP 2106.05(f)). Hence, the claim lacks limitations which amount to significantly more than the judicial exception or an inventive concept, and is rejected. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
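For orientation only, the five limitations recited in claim 1 can be sketched in Python as follows. This is an illustrative reading, not the method of Applicant’s Specification: the magnitude-proportional probability rule, the use of a random draw as the reference probability, the flat-list tensor layout, and all function names are assumptions made for the example.

```python
import random


def pruning_probabilities(vector):
    """Assumed rule: each probability is the element's share of the summed magnitudes."""
    total = sum(abs(v) for v in vector)
    if total == 0:
        return [1.0 / len(vector)] * len(vector)
    return [abs(v) / total for v in vector]


def select_indices(probs, n, rng):
    """Select n distinct indices by repeated draws against a reference value."""
    remaining = list(range(len(probs)))
    chosen = []
    for _ in range(n):
        mass = sum(probs[i] for i in remaining)
        reference = rng.random() * mass  # assumed form of the "reference probability"
        cumulative = 0.0
        pick = remaining[-1]  # fallback when every remaining probability is zero
        for i in remaining:
            cumulative += probs[i]
            if reference < cumulative:
                pick = i
                break
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


def prune_tensor(tensor, n, m, rng=None):
    """Keep n selected values in every vector of m values; zero the unselected ones."""
    if not 1 <= n < m or len(tensor) % m != 0:
        raise ValueError("need 1 <= n < m and a tensor length divisible by m")
    rng = rng or random.Random()
    pruned = list(tensor)
    for start in range(0, len(tensor), m):
        vector = tensor[start:start + m]  # extract a vector of size m
        keep = set(select_indices(pruning_probabilities(vector), n, rng))
        for offset in range(m):
            if offset not in keep:
                pruned[start + offset] = 0.0  # unselected element set to zero
    return pruned
```

Calling `prune_tensor([0.5, -2.0, 0.1, 1.5, 3.0, 0.0, -0.2, 0.7], 2, 4)` leaves two of every four values in place and zeroes the rest; which two survive depends on the random draws.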
	
Regarding claim 2, 
	Step 2A, Prong 1: Claim 2 recites an abstract idea as inherited from claim 1. Claim 2 further recites the following limitations: 
wherein determining the one or more pruning probabilities for the vector comprises: 
determining a total value of the vector by accumulating values of the elements in the vector; (i.e., this limitation is a mathematical process as indicated in paragraphs [0088]-[0089] of Applicant’s Specification)
determining a pruning probability of an element in the vector based on a value of the element and the total value of the vector (i.e., this limitation is a mathematical process as indicated in paragraphs [0029] and [0096] of Applicant’s Specification)
Hence, the claim recites an abstract idea.
	Step 2A, Prong 2: This claim does not recite additional elements. Hence, the claim does not recite additional elements that integrate the judicial exception into a practical application. Since the claim as a whole does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
	Step 2B: Claim 2 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Hence, the claim lacks limitations which amount to significantly more than the judicial exception or an inventive concept, and is rejected. Therefore, the claim is not patent eligible. 
	
Regarding claim 3, 
	Step 2A, Prong 1: Claim 3 recites an abstract idea as inherited from claim 1. Claim 3 further recites the following limitations: 
wherein selecting the one or more elements in the vector based on the one or more pruning probabilities and the reference probability comprises: selecting the one or more elements based on a comparison of at least one of the one or more pruning probabilities and the reference probability. (i.e., this limitation is a mathematical process as indicated in paragraphs [0114]-[0115] of Applicant’s Specification)
Hence, the claim recites an abstract idea.
	Step 2A, Prong 2: This claim does not recite additional elements. Hence, the claim does not recite additional elements that integrate the judicial exception into a practical application. Since the claim as a whole does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
	Step 2B: Claim 3 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Hence, the claim lacks limitations which amount to significantly more than the judicial exception or an inventive concept, and is rejected. Therefore, the claim is not patent eligible. 

	Regarding claim 4, 
	Step 2A, Prong 1: Claim 4 recites an abstract idea as inherited from claims 1 and 3. Claim 4 further recites the following limitations:
wherein selecting the one or more elements based on the comparison of at least one of the one or more pruning probabilities and the reference probability comprises: determining whether a pruning probability of an element is greater than the reference probability; and selecting the element based on a determination that the pruning probability of the element is greater than the reference probability. (i.e., This limitation is a mathematical process as indicated by Paragraphs [0116]-[0117] of Applicant’s Specification.)
Hence, the claim recites an abstract idea.
	Step 2A, Prong 2: This claim does not recite additional elements. Hence, the claim does not recite additional elements that integrate the judicial exception into a practical application. Since the claim as a whole does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
	Step 2B: Claim 4 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Hence, the claim lacks limitations which amount to significantly more than the judicial exception or an inventive concept, and is rejected. Therefore, the claim is not patent eligible. 

	Regarding claim 5,
Step 2A, Prong 1: Claim 5 recites an abstract idea as inherited from claims 1, 3, and 4. Claim 5 further recites the following limitations:
modifying the deep learning tensor further by increasing an absolute value of the selected element. (i.e., this limitation is a mathematical process as indicated in Paragraph [0120] of the Specification – the absolute value of a selected element is increased by dividing the absolute value with the pruning probability of the selected column in a matrix)
Hence, the claim recites an abstract idea.
	Step 2A, Prong 2: This claim does not recite additional elements. Hence, the claim does not recite additional elements that integrate the judicial exception into a practical application. Since the claim as a whole does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
	Step 2B: Claim 5 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Hence, the claim lacks limitations which amount to significantly more than the judicial exception or an inventive concept, and is rejected. Therefore, the claim is not patent eligible. 
		
Regarding claim 6, 
	Step 2A, Prong 1: Claim 6 recites an abstract idea as inherited from claims 1, 3, 4, and 5. Claim 6 further recites the following limitations: 
wherein increasing the absolute value of the element comprises: dividing the absolute value of the element by the pruning probability of the element. (i.e., this limitation is a mathematical process as indicated in Paragraph [0120] of the Specification)
Hence, the claim recites an abstract idea.
	Step 2A, Prong 2: This claim does not recite additional elements. Hence, the claim does not recite additional elements that integrate the judicial exception into a practical application. Since the claim as a whole does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
	Step 2B: Claim 6 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Hence, the claim lacks limitations which amount to significantly more than the judicial exception or an inventive concept, and is rejected. Therefore, the claim is not patent eligible. 
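The arithmetic recited in claims 5 and 6 can be illustrated with a short Python sketch. It assumes that the “pruning probability” of an element is the probability that the element is selected (and therefore not zeroed), and that the sign of the element is preserved when its absolute value is divided by that probability; neither assumption is confirmed by the claim text, and the function names are hypothetical. Under those assumptions, the expected value of the element after pruning equals its original value.

```python
from fractions import Fraction


def rescale_selected(value, probability):
    """Divide the absolute value by the probability, keeping the original sign (assumed)."""
    magnitude = abs(value) / probability
    return magnitude if value >= 0 else -magnitude


def expected_after_pruning(value, probability):
    """Expectation over the two outcomes: selected and rescaled, or unselected and zeroed."""
    return probability * rescale_selected(value, probability) + (1 - probability) * 0
```

For example, an element of -3/2 selected with probability 1/4 is rescaled to -6 when selected; weighting that outcome by 1/4 and the zeroed outcome by 3/4 returns -3/2.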

	Regarding claim 7, 
	Step 2A, Prong 1: Claim 7 recites an abstract idea as inherited from claim 1. Claim 7 recites the following additional limitations:
	wherein selecting the one or more elements in the vector comprises: forming a first subvector and a second subvector from the vector, the first subvector comprising at least two elements in the vector, the second subvector comprising at least two other elements in the vector; selecting a first element from the first subvector based on pruning probabilities of the at least two elements and the reference probability; and selecting a second element from the second subvector based on pruning probabilities of the at least two other elements and the reference probability. (i.e., These limitations are mathematical processes as indicated in Paragraphs [0094]-[0095] of Applicant’s Specification)
	Hence, the claim recites an abstract idea.
	Step 2A, Prong 2: This claim does not recite additional elements. Hence, the claim does not recite additional elements that integrate the judicial exception into a practical application. Since the claim as a whole does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
	Step 2B: Claim 7 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Hence, the claim lacks limitations which amount to significantly more than the judicial exception or an inventive concept, and is rejected. Therefore, the claim is not patent eligible.
	
Regarding claim 8, 
	Step 2A, Prong 1: Claim 8 recites an abstract idea as inherited from claims 1 and 7. 	
	Step 2A, Prong 2: Claim 8 recites the additional elements of “wherein the at least two elements have a first value and a second value, and the at least two other elements have values between the first value and the second value”. This limitation merely generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)). Hence the claim does not recite additional elements that integrate the judicial exception into a practical application. Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
	Step 2B: Claim 8 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements of “wherein the at least two elements have a first value and a second value, and the at least two other elements have values between the first value and the second value” merely generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)). Hence, the claim lacks limitations which amount to significantly more than the judicial exception or an inventive concept, and is rejected. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
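The subvector selection recited in claims 7 and 8 can be pictured with the following Python sketch for a vector of four elements. The sketch is an assumption-laden illustration rather than Applicant’s implementation: ordering by magnitude, the pair-local probability rule, passing the reference probability as a number in [0, 1), and every function name are choices made for the example.

```python
import random


def pair_extremes(vector):
    """Return two index pairs: (smallest, largest) magnitude and the two in between."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]))
    return [order[0], order[3]], [order[1], order[2]]


def pick_from_pair(vector, pair, reference):
    """Select one index of the pair by comparing its probability with the reference."""
    a, b = pair
    total = abs(vector[a]) + abs(vector[b])
    prob_a = 0.5 if total == 0 else abs(vector[a]) / total
    return a if reference < prob_a else b


def prune_two_of_four(vector, rng):
    """Select one element from each subvector and zero the two unselected elements."""
    if len(vector) != 4:
        raise ValueError("this sketch handles vectors of exactly four elements")
    outer, inner = pair_extremes(vector)
    keep = {
        pick_from_pair(vector, outer, rng.random()),
        pick_from_pair(vector, inner, rng.random()),
    }
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]
```

For the vector `[0.5, -2.0, 0.1, 1.5]`, the first subvector holds the elements at indices 2 and 1 (magnitudes 0.1 and 2.0) and the second holds the elements at indices 0 and 3, whose magnitudes lie between those two values.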

	Regarding claim 9, 
	Step 2A, Prong 1: Claim 9 recites an abstract idea as inherited from claim 1. Claim 9 further recites the following limitations:
wherein selecting the one or more elements in the vector comprises: selecting a first element from the elements in the vector based on the one or more pruning probabilities and the reference probability; determining one or more new pruning probabilities based on other elements than the first element in the vector; and selecting a second element from the other elements based on the one or more new pruning probabilities and the reference probability. (i.e., these limitations recite a mathematical process as indicated in paragraphs [0094]-[0096] of Applicant’s Specification)
Hence, the claim recites an abstract idea.
	Step 2A, Prong 2: This claim does not recite additional elements. Hence, the claim does not recite additional elements that integrate the judicial exception into a practical application. Since the claim as a whole does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
	Step 2B: Claim 9 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Hence, the claim lacks limitations which amount to significantly more than the judicial exception or an inventive concept, and is rejected. Therefore, the claim is not patent eligible.
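The recomputation step of claim 9 (“determining one or more new pruning probabilities based on other elements than the first element”) can be shown as a small worked example in Python. The magnitude-proportional rule and the function names are assumptions for illustration; the claim itself does not fix how the new probabilities are computed. Integer inputs are used so the fractions stay exact.

```python
from fractions import Fraction


def probabilities(values):
    """Assumed rule: share of each integer element in the summed magnitudes."""
    total = sum(abs(v) for v in values)
    if total == 0:
        raise ValueError("all elements are zero")
    return [Fraction(abs(v), total) for v in values]


def probabilities_without(values, selected_index):
    """New probabilities computed only from the elements other than the selected one."""
    others = {i: v for i, v in enumerate(values) if i != selected_index}
    total = sum(abs(v) for v in others.values())
    if total == 0:
        raise ValueError("all remaining elements are zero")
    return {i: Fraction(abs(v), total) for i, v in others.items()}
```

With values `[1, -2, 3, 4]` the initial probabilities are 1/10, 1/5, 3/10 and 2/5. If the element at index 3 is selected first, the new probabilities over the remaining elements are 1/6, 1/3 and 1/2, which again sum to one.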

	Regarding claim 10, 
	Step 2A, Prong 1: Claim 10 recites an abstract idea as inherited from claim 1. 
	Step 2A, Prong 2: Claim 10 recites the following additional elements “wherein the first pruning parameter is an integer that is equal to or greater than one, and the second pruning parameter is an integer that is greater than the first pruning parameter”. This limitation merely generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)). Hence the claim does not recite additional elements that integrate the judicial exception into a practical application. Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
	Step 2B: Claim 10 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements of “wherein the first pruning parameter is an integer that is equal to or greater than one, and the second pruning parameter is an integer that is greater than the first pruning parameter” merely generally links the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)). Hence, the claim lacks limitations which amount to significantly more than the judicial exception or an inventive concept, and is rejected. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.
	
	Regarding claim 11, 
	Step 1: Claim 11 is directed towards a non-transitory computer readable medium.
	Step 2A, Prong 1: Claim 11 recites the following limitations:
determining a first pruning parameter and a second pruning parameter for pruning a deep learning tensor, a ratio of the first pruning parameter to the second pruning parameter indicating a percentage of to-be-pruned values in the deep learning tensor; (i.e., a person could, mentally or with the aid of pen and paper, determine a first pruning parameter and a second pruning parameter based on a ratio of the first pruning parameter to the second pruning parameter indicating a percentage of to be pruned values – See Paragraph [0085] of Applicant’s Specification)
extracting a vector from the deep learning tensor, the vector comprising elements that are a subset of the values in the deep learning tensor, a size of the vector equal to the second pruning parameter; (i.e., the limitation consists of a mathematical process as indicated in paragraph [0091] of Applicant’s Specification) 
determining one or more pruning probabilities for the vector, each pruning probability corresponding to a respective element in the vector; (i.e., this limitation consists of a mathematical process as indicated in paragraphs [0086]-[0087] of Applicant’s Specification)
selecting one or more elements in the vector based on the one or more pruning probabilities and a reference probability, a number of the one or more elements equal to the first pruning parameter; (i.e., this limitation is a mathematical process as indicated in paragraphs [0087]-[0091] of Applicant’s Specification)
modifying the deep learning tensor by setting a value of an unselected element of the vector to zero. (i.e., this limitation is a mathematical process as indicated by paragraphs [0095]-[0098] of Applicant’s Specification)
Therefore the claim recites an abstract idea.
	Step 2A, Prong 2: Claim 11 recites the additional elements of “one or more non-transitory computer-readable media storing instructions executable to perform operations for compressing a deep neural network (DNN)” and “a deep learning tensor”. These elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components (see MPEP 2106.05(f)). Hence, the claim does not recite additional elements that integrate the judicial exception into a practical application. Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
	Step 2B: Claim 11 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements of “one or more non-transitory computer-readable media storing instructions executable to perform operations for compressing a deep neural network (DNN)” and “a deep learning tensor” to perform the steps above amount to no more than mere instructions to apply the judicial exception using generic computer components (see MPEP 2106.05(f)). Hence, the claim lacks limitations which amount to significantly more than the judicial exception or an inventive concept, and is rejected. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

Regarding claim 12, 
Claim 12 recites the same and/or analogous limitations as claim 2 above. Therefore, claim 12 is rejected under the same rationale as claim 2.

Regarding claim 13, 
Claim 13 recites the same and/or analogous limitations as claim 3 above. Therefore, claim 13 is rejected under the same rationale as claim 3.

Regarding claim 14, 
Claim 14 recites the same and/or analogous limitations as claim 4 above. Therefore, claim 14 is rejected under the same rationale as claim 4.

Regarding claim 15, 
Claim 15 recites the same and/or analogous limitations as claim 5 above. Therefore, claim 15 is rejected under the same rationale as claim 5.

Regarding claim 16, 
Claim 16 recites the same and/or analogous limitations as claim 6 above. Therefore, claim 16 is rejected under the same rationale as claim 6.

Regarding claim 17, 
Claim 17 recites the same and/or analogous limitations as claim 7 above. Therefore, claim 17 is rejected under the same rationale as claim 7.

Regarding claim 18, 
Claim 18 recites the same and/or analogous limitations as claim 8 above. Therefore, claim 18 is rejected under the same rationale as claim 8.

Regarding claim 19, 
Claim 19 recites the same and/or analogous limitations as claim 9 above. Therefore, claim 19 is rejected under the same rationale as claim 9.

Regarding claim 20, 
Claim 20 recites the same and/or analogous limitations as claim 10 above. Therefore, claim 20 is rejected under the same rationale as claim 10.

Regarding claim 21, 
Step 1: Claim 21 is directed towards an apparatus.
Step 2A, Prong 1: Claim 21 recites the following limitations:
determining a first pruning parameter and a second pruning parameter for pruning a deep learning tensor, a ratio of the first pruning parameter to the second pruning parameter indicating a percentage of to-be-pruned values in the deep learning tensor; (i.e., a person could, mentally or with the aid of pen and paper, determine a first pruning parameter and a second pruning parameter based on a ratio of the first pruning parameter to the second pruning parameter indicating a percentage of to be pruned values – See Paragraph [0085] of Applicant’s Specification)
extracting a vector from the deep learning tensor, the vector comprising elements that are a subset of the values in the deep learning tensor, a size of the vector equal to the second pruning parameter; (i.e., the limitation consists of a mathematical process as indicated in paragraph [0091] of Applicant’s Specification) 
determining one or more pruning probabilities for the vector, each pruning probability corresponding to a respective element in the vector; (i.e., this limitation consists of a mathematical process as indicated in paragraphs [0086]-[0087] of Applicant’s Specification)
selecting one or more elements in the vector based on the one or more pruning probabilities and a reference probability, a number of the one or more elements equal to the first pruning parameter; (i.e., this limitation is a mathematical process as indicated in paragraphs [0087]-[0091] of Applicant’s Specification)
modifying the deep learning tensor by setting a value of an unselected element of the vector to zero. (i.e., this limitation is a mathematical process as indicated by paragraphs [0095]-[0098] of Applicant’s Specification)
Therefore the claim recites an abstract idea.

Step 2A, Prong 2: Claim 21 recites the additional elements of “a computer processor for executing computer program instructions; and a non-transitory computer-readable memory storing computer program instructions executable by the computer processor to perform operations comprising” and “a deep learning tensor”. These elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components (see MPEP 2106.05(f)). Hence, the claim does not recite additional elements that integrate the judicial exception into a practical application. Since the claim as a whole, looking at the additional elements individually and in combination, does not contain any other additional elements that are indicative of integration into a practical application, the claim is directed to an abstract idea.
	Step 2B: Claim 21 does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements of “a computer processor for executing computer program instructions; and a non-transitory computer-readable memory storing computer program instructions executable by the computer processor to perform operations comprising” and “a deep learning tensor” to perform the steps above amount to no more than mere instructions to apply the judicial exception using generic computer components (see MPEP 2106.05(f)). Hence, the claim lacks limitations which amount to significantly more than the judicial exception or an inventive concept, and is rejected. Considering the additional elements individually and in combination, and the claim as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, the claim is not patent eligible.

Regarding claim 22, 
Claim 22 recites the same and/or analogous limitations as claim 2 above. Therefore, claim 22 is rejected under the same rationale as claim 2.

Regarding claim 23, 
Claim 23 recites the same and/or analogous limitations as claim 3 above. Therefore, claim 23 is rejected under the same rationale as claim 3.

Regarding claim 24, 
Claim 24 recites the same and/or analogous limitations as claim 7 above. Therefore, claim 24 is rejected under the same rationale as claim 7.

Regarding claim 25, 
Claim 25 recites the same and/or analogous limitations as claim 9 above. Therefore, claim 25 is rejected under the same rationale as claim 9.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 2, 11, 12, 21, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al. (U.S. Patent No. 11663481, filed Feb. 24, 2020 and published May 30, 2023) in view of Sather et al. (U.S. Patent No. 12536413, filed Mar. 16, 2022 and published Jan. 27, 2026).

Regarding claim 1, Liu teaches a method for compressing a deep neural network (DNN), comprising: 
determining a first pruning parameter and a second pruning parameter for pruning a deep learning tensor, a ratio of the first pruning parameter to the second pruning parameter indicating a percentage of to-be-pruned values in the deep learning tensor (Liu, Col. 3, lines 44-58 teaches the neural network pruning system can utilize pruning parameters [i.e., plurality of pruning parameters reading on first and second pruning parameters] to progressively prune a convolutional neural network (or simply “neural network”). In some implementations, the pruning parameter is network size pruning parameter. For example, a network size pruning parameter can indicate the desired size of the neural network after progressively pruning. In another example, the network size pruning parameter can indicate an amount or percentage of a neural network to prune away. In some alternative implementations, the pruning parameter is a relative condition pruning parameter (or simply “relative parameter”). For example, the relative parameter indicates a pruning sensitivity ratio or a threshold relevance condition for one or more portions of the neural network, as further described below.; Liu, (35) Col. 4, lines 57-67 and Col. 5, lines 1-8 teaches the term “pruning parameter” refers to a factor indicating how to prune a neural network. For instance, a pruning parameter can correspond to a network size pruning parameter (or simply “size parameter”) that indicates a size of the pruned neural network. For example, the size parameter can indicate an amount to remove from the neural network or the final size of the neural network. In another instance, the pruning parameter can correspond to a relative condition pruning parameter (or simply “relative parameter”). For instance, the relative condition pruning parameter can indicate a pruning sensitivity ratio/rate or a threshold relevance condition for one or more portions of the neural network.; See also Fig. 2, 204, 208 and Fig. 5, 504); 
modifying the deep learning tensor by setting a value of an unselected element of the vector to zero (Liu, Col. 3 lines 30-43 teaches the neural network pruning system can initialize a convolutional neural network that includes multiple layers (i.e., batch-normalization layers) and multiple network weights. Next, the neural network pruning system can prune the convolutional neural network based on a pruning parameter across multiple iterations while jointly learning the neural network weights and scaling parameters. In particular, the neural network pruning system can iteratively update the neural network weights and scaling parameters for each portion (e.g., channel or layer) of the neural network, determine portions of the neural network that generate a scaling parameter not satisfying the pruning parameter, and modify the architecture of the neural network by removing the determined portions. [Note: Examiner is interpreting the modifying of the architecture of the neural network based on the pruning parameters to read on modifying the deep learning tensor as claimed.]; Liu, Col. 4 lines 8-12 further teaches the neural network pruning system can penalize non-zero scaling parameters and/or encourage sparseness around the scaling parameters such that non-impactful portions (e.g., layers or channels) are pruned out.; Liu, Col. 20 lines 50-65 and Col. 21 lines 1-5 further teaches the neural network pruning system can train the layer scaling parameter α to approach zero to encourage layer sparsity. As shown in FIG. 4B, as the layer scaling parameter α approaches zero, the neural network pruning system learns to utilize a greater weight of information from the previous convolutional layer (e.g., the input features 422) and less from the current convolutional layer (e.g., convolutional layer 420). 
As a result, when the layer scaling parameter for a current layer approaches zero, the neural network pruning system can prune out the current convolutional layer as redundant and decrease the size of the neural network. [Note: the current convolutional layer approaching zero being understood as the unselected element of the vector.]).
However, Liu does not distinctly disclose:
extracting a vector from the deep learning tensor, the vector comprising elements that are a subset of the values in the deep learning tensor, a size of the vector equal to the second pruning parameter;
determining one or more pruning probabilities for the vector, each pruning probability corresponding to a respective element in the vector;
selecting one or more elements in the vector based on the one or more pruning probabilities and a reference probability, a number of the one or more elements equal to the first pruning parameter;

Nevertheless, Sather teaches:
extracting a vector from the deep learning tensor, the vector comprising elements that are a subset of the values in the deep learning tensor, a size of the vector equal to the second pruning parameter (Sather, Col. 17, lines 19-62 teaches some embodiments use singular value decomposition (SVD) to define the decomposition of a layer. Using this process, each layer marked for decomposition is decomposed into the form of an SVD decomposition W=U diag(S)V.sup.T. The first replacement layer uses a weight tensor diag(√S)V.sup.T and the second replacement layer uses a weight tensor U diag(√S). [See equations (7) and (8)] where x is the input to the original layer, y.sub.A is the output of layer A, and y.sub.B is the output of layer B. Additional discussion of this SVD process can be found in “Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification”, by H. Yang, et al., in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020, pp. 2899-2908, and which is incorporated herein by reference. Some embodiments, rather than the above formulation, move the singular values entirely into the first replacement layer [See equations (9) and (10)]. With this change, V.sup.T is now the weight tensor for layer A, S is a vector of scales for layer A, and U is a weight tensor for layer B. That is, at this stage, V.sup.T represents the weight values of the first layer, but each filter in the layer is multiplied by a scale found at a corresponding entry of the vector S (or a diagonal entry of S, if treated as a matrix). This transformation makes training simpler, because the weight tensor for layer A is a learned tensor parameter rather than a function of learned parameters. In addition, the removal of the singular values from layer B simplifies training for that layer as well.; Sather, Col. 1 lines 61-67 and Col. 2, lines 1-13 teaches the decomposition, even without applying the constraints for structural sparsity, reduces the number of weight values in the decomposed layers. In some embodiments, the first set of filters for a decomposed pair of layers has fewer filters than the original layer, with these filters being the same size (i.e., having the same number of weights) and being implemented in the same manner as the original filters. The second set of filters then has the same number of filters as the original layer, but with smaller filters (e.g., 1×1 convolutions, which have fewer weights). In other embodiments, the filters of the first and second sets of filters both have fewer weights per filter than the first layer, though the first set of filters still has fewer filters than the original layer and the second set of filters has the same number of filters as the original layer. The result is that the output feature maps of the second layer (i.e., the second set of filters) have the same structure as the output feature maps of the original layer, while requiring fewer weights to be trained (and thus fewer weights to be stored by a circuit that implements the network).; Col. 20, lines 33-47 teaches the process 800 identifies (at 810) original layers (i.e., layers of the received network definition) to decompose. As described above, the original layers to decompose are manually identified (i.e., by a user of the network training system) in some embodiments, or are identified according to manually-specified characteristics (e.g., layers of at least a particular size) in other embodiments. For instance, very large layers (e.g., layers with hundreds of filters producing hundreds of output feature maps) might have a lot of redundancy that can be eliminated via layer decomposition (and subsequent filter pruning, described below). 
In this case, reducing the number of filters via decomposition could actually help standard training techniques (e.g., stochastic gradient descent) better explore the parameter space, leading to more accurate networks. [Note: here the network with very large layers is a second pruning parameter]); 
determining one or more pruning probabilities for the vector, each pruning probability corresponding to a respective element in the vector (Sather, Col. 20, lines 58-65 teaches irrespective of which type of decomposition is used for a particular layer, a weight tensor and a scale vector (i.e., diagonal scale matrix) is defined for the first replacement layer and a weight tensor is defined for the second replacement layer.; Sather, Col. 21 lines 55-66 teaches some embodiments use a probabilistic distribution for the ADMM projection and average the multiplier update and augmented Lagrangian over that distribution. Essentially, this method identifies the probability that a filter is assigned to a pruned or unpruned state (each filter is assigned to one of these two states). Based on this, the augmented Lagrangian is multiplied by these probabilities.; The probability distribution for each filter is the fraction of the time that the ADMM projection step assigns the filter to the pruned state. In general, a filter is more likely to be pruned if its corresponding scale is smaller. In addition, other factors accounted for in the projection step described below (e.g., if pruning the filter has a large effect on compute time, or if the filter has a larger number of weights). Some embodiments determine the probability distribution by assuming that it is a maximum-entropy distribution subject to constraints on a total pruned “energy”, the number of pruned weights, and the pruned rank for each layer.);  
selecting one or more elements in the vector based on the one or more pruning probabilities and a reference probability, a number of the one or more elements equal to the first pruning parameter (Sather, Col. 6 lines 34-52 teaches Some embodiments of the invention improve structural sparsity of a machine-trained network by decomposing one or more initial layers of the network (e.g., convolutional and/or fully-connected layers) into two successive layers and using various techniques to remove sets of weight values from the first of the two successive layers. Each of the initial layers includes a set of filters of weight values for training, and the decomposition replaces these filters with (i) a first set of filters of weight values, (ii) a set of scale values corresponding to the first set of filters, and (iii) a second set of filters. The training applies constraints that push at least some of these scale values to zero (or at least below a low threshold so that the scale can be treated as equal to zero). Because the scale values scale the weight values in the corresponding filter, all of the weight values of a filter that corresponds to a zero scale value can also be set to zero (and the filter effectively removed).; Sather, Col. 4 lines 18-36 teaches as such, some embodiments use the previously-mentioned probabilistic projection operation. Rather than projecting each scale value to either below threshold (removed) or infinity (not removed), the operation instead identifies a probability for each scale value being removed. Thus, all of the Lagrange multipliers are updated using these probabilities, and in the subsequent SGD training each scale will be pushed on with a force proportional to the probability assigned to that scale. The probability assignment uses a statistical mechanics (maximum entropy) formulation in some embodiments. It should also be noted that some embodiments apply this probabilistic projection to the later weight ternarization training. 
In this case, rather than projecting each weight to one of the set {0, 1, −1} or {0, α.sub.k, −α.sub.k}, the training system computes a probability projection for each weight across its possible values, and SGD training then targets this expectation value (i.e., a weighted average of the three values based on the probabilities).); 
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the neural network pruning system, as taught by Liu, to further include the probabilistic projection of network parameters and the method that identifies the probability that a filter is assigned to a pruned or unpruned state, as taught by Sather, in order to simplify training by reducing the number of weights, especially in larger layers. (Sather, Col. 1 lines 27-38 and Col. 17, lines 19-62)
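
For orientation only, the steps recited in claim 1 can be sketched as follows. This is an illustrative reading of the claim language, not the applicant's disclosed implementation and not a method taken from Liu or Sather. The function name `prune_vector`, the magnitude-proportional probabilities, and the use of a uniformly random reference probability are all assumptions made for the sketch.

```python
import random


def prune_vector(tensor, start, n_keep, m_size, rng):
    """Keep n_keep of m_size consecutive values and zero the rest (illustrative)."""
    # Extract a vector whose size equals the second pruning parameter (m_size).
    vector = tensor[start:start + m_size]
    candidates = list(range(len(vector)))
    selected = []
    while len(selected) < n_keep and candidates:
        # Assumption: probabilities are proportional to absolute value.
        total = sum(abs(vector[i]) for i in candidates)
        if total == 0:
            # Every remaining value is zero, so the choice does not matter.
            selected.extend(candidates[: n_keep - len(selected)])
            break
        reference = rng.random()  # assumed reference probability in [0, 1)
        cumulative = 0.0
        chosen = candidates[-1]  # fallback guards against float round-off
        for i in candidates:
            cumulative += abs(vector[i]) / total
            if reference < cumulative:
                chosen = i
                break
        selected.append(chosen)
        candidates.remove(chosen)
    # Modify the tensor: unselected elements of the vector are set to zero.
    pruned = list(tensor)
    for i in range(len(vector)):
        if i not in selected:
            pruned[start + i] = 0.0
    return pruned
```

With `n_keep = 2` and `m_size = 4`, two of every four extracted values survive; values outside the extracted vector are left untouched.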


	Regarding claim 2, the combination of Liu in view of Sather teaches all of the limitations of claim 1, and the combination further teaches wherein determining the one or more pruning probabilities for the vector comprises: 
determining a total value of the vector by accumulating values of the elements in the vector (Sather, Col. 34 lines 26-44 further teaches the probabilistic projection can also be extended to compute the probability that a given layer will have a multiple of C.sub.F (the circuit-specific constant used in evaluating the compute time for a layer). Because the filters need to make a coordinated decision between layers, these probabilities can be complex to compute. Some embodiments first compute the distribution for the sparsity (i.e., total weight count) constraint while ignoring the compute-time constraint to determine β and ϕ.sup.global. These embodiments then sort the filters in each layer in order of increasing E.sub.l,k and, traversing this sorted order, group filters in each layer into sets of filters needed to decrease the rank to mC.sub.F for m=ceil(rank/64)−1, . . . , 3, 2, 1. Group chemical potentials for all layers are computed in order to satisfy the compute-time constraint. This process then divides the group chemical potential for each layer by C.sub.F, divides again by n.sub.l, and assigns that as the per-layer chemical potentials. The computation of filter probabilities is then repeated with the fixed per-layer chemical potentials included.); and
 determining a pruning probability of an element in the vector based on a value of the element and the total value of the vector (Sather, Col. 21 lines 63-67 and Col. 22 lines 1-6 teaches the probability distribution for each filter is the fraction of the time that the ADMM projection step assigns the filter to the pruned state. In general, a filter is more likely to be pruned if its corresponding scale is smaller. In addition, other factors accounted for in the projection step described below (e.g., if pruning the filter has a large effect on compute time, or if the filter has a larger number of weights). Some embodiments determine the probability distribution by assuming that it is a maximum-entropy distribution subject to constraints on a total pruned “energy”, the number of pruned weights, and the pruned rank for each layer. The details of both the non-probabilistic and probabilistic ADMM formulations are described in further detail below.; Sather Col. 24 lines 5-23 further teaches as noted above, some embodiments use a probabilistic projection when enforcing these constraints, rather than an absolute projection (so that filters do not oscillate between pruned and not-pruned states). In this case, the projection stage identifies the probability that a filter is assigned to a pruned or unpruned state. The probability distribution for each filter is the fraction of the time that the projection would assign the filter to the pruned state. In general, a filter is more likely to be pruned if its corresponding scale is smaller. In addition, filters with larger numbers of weights are more likely to be pruned (to reduce the weight count) as are filters for which pruning would reduce the number of filters in a layer below the number C.sub.F and therefore reduce compute time. 
As mentioned, some embodiments determine the probability distribution by assuming that it is a maximum-entropy distribution subject to constraints on a total pruned “energy”, the number of pruned weights, and the pruned rank for each layer.; Sather, Col. 34 lines 26-44 further teaches the probabilistic projection can also be extended to compute the probability that a given layer will have a multiple of C.sub.F (the circuit-specific constant used in evaluating the compute time for a layer). Because the filters need to make a coordinated decision between layers, these probabilities can be complex to compute. Some embodiments first compute the distribution for the sparsity (i.e., total weight count) constraint while ignoring the compute-time constraint to determine β and ϕ.sup.global. These embodiments then sort the filters in each layer in order of increasing E.sub.l,k and, traversing this sorted order, group filters in each layer into sets of filters needed to decrease the rank to mC.sub.F for m=ceil(rank/64)−1, . . . , 3, 2, 1. Group chemical potentials for all layers are computed in order to satisfy the compute-time constraint. This process then divides the group chemical potential for each layer by C.sub.F, divides again by n.sub.l, and assigns that as the per-layer chemical potentials. The computation of filter probabilities is then repeated with the fixed per-layer chemical potentials included.).
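
As a point of comparison for the claim 2 limitations, a probability derived from an element's value and the accumulated total of the vector might be computed as in the sketch below. It is not Sather's maximum-entropy formulation. The name `pruning_probabilities`, the use of absolute values, and the uniform fallback for an all-zero vector are assumptions.

```python
def pruning_probabilities(vector):
    """Return one probability per element, proportional to its magnitude."""
    # Total value of the vector, accumulated over its elements
    # (absolute values are an assumption so that probabilities are non-negative).
    total = sum(abs(v) for v in vector)
    if total == 0:
        # Assumed fallback: a uniform distribution when every element is zero.
        return [1.0 / len(vector)] * len(vector)
    # Each probability depends on the element's value and the total.
    return [abs(v) / total for v in vector]
```

For example, `pruning_probabilities([1.0, -3.0, 4.0])` yields probabilities of 1/8, 3/8, and 4/8.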
	
Regarding claim 11, 
	Claim 11 recites the same and/or analogous limitations as claim 1 above. Therefore, claim 11 is rejected under the same rationale and motivation as claim 1.
	Liu further teaches one or more non-transitory computer-readable media storing instructions executable to perform operations for compressing a deep neural network (DNN) (Liu, Col. 2, lines 1-10 teaches implementations of the present disclosure provide benefits and/or solve one or more of the foregoing or other problems in the art with systems, non-transitory computer-readable media, and methods that provide a technical improvement over existing systems by accurately and efficiently utilizing automated neural network architecture pruning.)

Regarding claim 12, 
	Claim 12 recites the same and/or analogous limitations as claim 2 above. Therefore, claim 12 is rejected under the same rationale and motivation as claim 2.

Regarding claim 21, 
	Claim 21 recites the same and/or analogous limitations as claim 1 above. Therefore, claim 21 is rejected under the same rationale and motivation as claim 1.
	Liu further recites an apparatus, comprising: a computer processor for executing computer program instructions; and a non-transitory computer-readable memory storing computer program instructions executable by the computer processor to perform operations (Liu, Col. 2, lines 1-10 teaches implementations of the present disclosure provide benefits and/or solve one or more of the foregoing or other problems in the art with systems, non-transitory computer-readable media, and methods that provide a technical improvement over existing systems by accurately and efficiently utilizing automated neural network architecture pruning.; Liu Col. 24, lines 54-64 further teaches  each of the components 606-628 of the neural network pruning system 604 can include software, hardware, or both. For example, the components 606-628 can include one or more instructions stored on a computer-readable storage medium and executable by processors of one or more computing devices, such as a client device (e.g., a mobile client device) or server device. When executed by the one or more processors, the computer-executable instructions of the neural network pruning system 604 can cause a computing device to perform the feature learning methods described herein.)

Regarding claim 22, 
	Claim 22 recites the same and/or analogous limitations as claim 2 above. Therefore, claim 22 is rejected under the same rationale and motivation as claim 2.


Claims 3, 4, 13, 14, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Sather, as applied to claim 1, and further in view of Zhao, et al., “Variational Convolutional Neural Pruning”, (2019)

	Regarding claim 3, the combination of Liu in view of Sather teaches all of the limitations of claim 1, however the combination does not distinctly disclose wherein selecting the one or more elements in the vector based on the one or more pruning probabilities and the reference probability comprises: selecting the one or more elements based on a comparison of at least one of the one or more pruning probabilities and the reference probability.
	Nevertheless, Zhao teaches wherein selecting the one or more elements in the vector based on the one or more pruning probabilities and the reference probability comprises: selecting the one or more elements based on a comparison of at least one of the one or more pruning probabilities and the reference probability (Zhao, pg. 2783, Section 3.3 teaches our goal is to fine-tune the learnable parameters with the object function. In order to remove channels, the approximate distribution should be sparse. Then the inefficient channels can be determined easily. Namely, we eliminate channels based on mean [i.e., the reference probability] and variance [i.e., variance is inherently a comparison of values with respect to the mean] of the distribution of channel saliency….The selected prior distribution has sparse property which can encourage parameters towards zero. According to this, we can straightforwardly prune ineffective layers based on channel saliency.; Zhao, Section 3.4 teaches based above section, obtained channel saliency γ obeys a gaussian distribution. Consider to the centrality property of gaussian, samples distribute around the expectation. When the expectation μ is close to zero and variance is small, the probability of variable γ [i.e., the saliency – as in the pruning probability] is close to zero. Based on this idea, we eliminate redundant channels when the optimized parameters are less than thresholds, i.e., (μ, σ) < (τ, θ) ).
	
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the neural network pruning system, as taught by Liu in view of Sather, to further include a variational Bayesian scheme for pruning convolutional neural networks, as taught by Zhao, as it can be straightforwardly inserted into off-the-shelf deep learning packages, without any special network design. This module can achieve significant reduction in network size and computational savings. (Zhao, Abstract)

	Regarding claim 4, the combination of Liu in view of Sather and Zhao teaches all of the limitations of claim 3, and the combination further teaches wherein selecting the one or more elements based on the comparison of at least one of the one or more pruning probabilities and the reference probability comprises: determining whether a pruning probability of an element is greater than the reference probability; and selecting the element based on a determination that the pruning probability of the element is greater than the reference probability (Zhao, pg. 2783, Section 3.3 teaches our goal is to fine-tune the learnable parameters with the object function. In order to remove channels, the approximate distribution should be sparse. Then the inefficient channels can be determined easily. Namely, we eliminate channels based on mean [i.e., the reference probability] and variance [i.e., a comparison of values with respect to the mean] of the distribution of channel saliency…. The selected prior distribution has sparse property which can encourage parameters towards zero. According to this, we can straightforwardly prune ineffective layers based on channel saliency.; Zhao, pg. 2784, Section 3.4 teaches we optimize ELBO with the KL-divergence mentioned above to obtain the distribution of channels salience γ, where γ ∼ q(γ|φ = (μ, σ)). Then we remove redundant channel based on the following criterion. Based above section, obtained channel saliency γ obeys a gaussian distribution. Consider to the centrality property of gaussian, samples distribute around the expectation. When the expectation μ is close to zero and variance is small, the probability of variable γ is close to zero. Based on this idea, we eliminate redundant channels when the optimized parameters are less than thresholds, i.e., (μ, σ) < (τ, θ). [Note: here the stated threshold has been understood as teaching the pruning probability is greater than a reference probability]). 
	Motivation to combine same as stated in claim 3.
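
The threshold criterion quoted from Zhao, Section 3.4, i.e., (μ, σ) < (τ, θ), can be illustrated with the short sketch below. It is a minimal reading of the quoted criterion, not Zhao's published code. The function name `redundant_channels` and the use of |μ| (so that means close to zero on either side qualify) are assumptions.

```python
def redundant_channels(means, stds, tau, theta):
    """Return indices of channels whose mean and deviation are both below threshold."""
    redundant = []
    for k, (mu, sigma) in enumerate(zip(means, stds)):
        # Assumption: "mu close to zero" is tested on the absolute value of mu.
        if abs(mu) < tau and sigma < theta:
            redundant.append(k)
    return redundant
```

A channel is flagged only when both parameters fall below their thresholds, so a channel with a near-zero mean but a large deviation is retained.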

Regarding claim 13, 
	Claim 13 recites the same and/or analogous limitations as claim 3 above. Therefore, claim 13 is rejected under the same rationale and motivation as claim 3.


Regarding claim 14, 
	Claim 14 recites the same and/or analogous limitations as claim 4 above. Therefore, claim 14 is rejected under the same rationale and motivation as claim 4.


	Regarding claim 23,
	Claim 23 recites the same and/or analogous limitations as claim 3 above. Therefore, claim 23 is rejected under the same rationale and motivation as claim 3.



Claims 5-6 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Sather and Zhao, as applied to claim 4, and further in view of Ye et al., “Rethinking the Smaller-Norm-Less Informative Assumption in Channel Pruning of Convolution Layers”, (Feb. 2018)
	Regarding claim 5, the combination of Liu in view of Sather and Zhao teaches all of the limitations of claim 4, however the combination does not distinctly disclose modifying the deep learning tensor further by increasing an absolute value of the selected element.
Nevertheless, Ye teaches modifying the deep learning tensor further by increasing an absolute value of the selected element (Ye, pg. 6-7, Algorithm 4.2, teaches alpha is less than 1 and we divide the weights by alpha – Note: alpha is being understood as the pruning probability; Ye, pg. 6 further teaches if the model is pretrained, check the average magnitude of γs in the network, choose α such that the magnitude of rescaled γl is around 100µλlρ. We found as long as one choose those parameters in the right range of magnitudes, the optimization progress is enough robust. – Note: magnitude is considered an absolute value).
	Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the neural network pruning system, as taught by Liu in view of Sather and Zhao, to further include the algorithm, as taught by Ye, as the approach is mathematically appealing from an optimization perspective and easy to reproduce. (Ye, Abstract and see also pg. 2 section 1)

Regarding claim 6, the combination of Liu in view of Sather, Zhao, and Ye teaches all of the limitations of claim 5, and the combination further teaches: 
	wherein increasing the absolute value of the element comprises: dividing the absolute value of the element by the pruning probability of the element (Ye, pg. 6, Algorithm 4.2 teaching γ as a pruning related value, and wherein we scale gamma by alpha and the weights by 1/alpha – note: alpha is being understood as the pruning probability).
	Motivation to combine same as claim 5.
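
The arithmetic behind the claim 6 limitation can be illustrated as follows. The sketch treats the claimed pruning probability as the probability that an element is selected and kept, which is an assumption about the claim language. The function names are hypothetical, and the sketch is not taken from Ye's Algorithm 4.2.

```python
def rescale_kept(value, keep_probability):
    """Divide a kept element by its probability, which increases its magnitude."""
    return value / keep_probability


def expected_value(value, keep_probability):
    """Expected result when the element is kept with keep_probability, else zeroed."""
    kept = keep_probability * rescale_kept(value, keep_probability)
    zeroed = (1.0 - keep_probability) * 0.0
    return kept + zeroed
```

Because the probability is at most 1, the division never shrinks the element's absolute value, and under the stated assumption the expected value of the pruned element equals the original value.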

Regarding claim 15,
	Claim 15 recites the same and/or analogous limitations as claim 5 above. Therefore, claim 15 is rejected under the same rationale and motivation as claim 5.

	Regarding claim 16, 
	Claim 16 recites the same and/or analogous limitations as claim 6 above. Therefore, claim 16 is rejected under the same rationale and motivation as claim 6.

	

Claims 7, 8, 9, 17, 18, 19, 24, and 25 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Sather, as applied to claim 1, and further in view of Kharaghani et al. (U.S. Patent No. 10776697, filed Apr. 18, 2017 and published Sep. 15, 2020)

	Regarding claim 7, the combination of Liu in view of Sather teaches all of the limitations of claim 1, however, the combination does not distinctly disclose wherein selecting the one or more elements in the vector comprises: forming a first subvector and a second subvector from the vector, the first subvector comprising at least two elements in the vector, the second subvector comprising at least two other elements in the vector; selecting a first element from the first subvector based on pruning probabilities of the at least two elements and the reference probability; and selecting a second element from the second subvector based on pruning probabilities of the at least two other elements and the reference probability. 
	Nevertheless, Kharaghani teaches wherein selecting the one or more elements in the vector comprises:
forming a first subvector and a second subvector from the vector, the first subvector comprising at least two elements in the vector, the second subvector comprising at least two other elements in the vector (Kharaghani, Col. 1, lines 66-67 and Col. 2, lines 1-15 teaches a method for training a neural network, the neural network comprising at least one layer comprising a plurality of input nodes, a plurality of output nodes, and a plurality of connections for connecting each one of the plurality of input nodes to each one of the plurality of output nodes. The method comprises pseudo-randomly selecting a subset of the plurality of connections, each connection of the plurality of connections having associated therewith a weight parameter and a probability of being retained in the neural network, generating output data by feeding input data over the subset of connections, computing an error between the generated output data and desired output data, and for at least one connection in the subset of connections, determining a contribution of the weight parameter to the error and updating the probability of being retained in the neural network accordingly.; Kharaghani, Col. 7, lines 25-44 teaches a subset of connections is randomly (or pseudo-randomly) chosen (step 404) with probability p (referred to herein as probability p(t)). For this purpose, in one embodiment, a binary mask matrix M(t) is randomly generated to encode the connection information such that a random number of interconnection weights are kept active while others are masked out (i.e. omitted or ignored) during propagation. The binary mask matrix is generated so as to comprise a random number of elements, which are set to zero, and a random number of elements, which are set to one. The mask matrix is generated during the training stage by randomly selecting matrix element values over a suitable distribution, such as a Gaussian distribution, a Bernoulli distribution, or the like. 
It should be understood that a pseudo-random distribution may also apply. As a result, a different mask matrix is generated at every iteration of the training process and applied to the interconnection weights, thereby instantiating a different connectivity.); 
selecting a first element from the first subvector based on pruning probabilities of the at least two elements and the reference probability (Kharaghani, Col. 7 lines 66-67 and Col. 8 lines 1-10 teaches Element-wise comparison of matrices P and R is then performed. In particular, each element R[i,j] of the matrix R is compared to each element P[i,j] of the probability matrix P to determine if P[i,j]>R[i,j]. The elements of the binary mask matrix M are then generated accordingly by setting M[i,j] to one if P[i,j]>R[i,j] and setting M[i,j] to zero otherwise. As discussed above, when a given mask matrix element M[i,j] is set to one, the corresponding connection is retained (i.e. included in the given iteration of the training process), whereas the connection is temporarily removed otherwise.; Kharaghani, Col. 7, lines 23-44 teaches a subset of connections is randomly (or pseudo-randomly) chosen (step 404) with probability p (referred to herein as probability p(t)). For this purpose, in one embodiment, a binary mask matrix M(t) is randomly generated to encode the connection information such that a random number of interconnection weights are kept active while others are masked out (i.e. omitted or ignored) during propagation. The binary mask matrix is generated so as to comprise a random number of elements, which are set to zero, and a random number of elements, which are set to one. The mask matrix is generated during the training stage by randomly selecting matrix element values over a suitable distribution, such as a Gaussian distribution, a Bernoulli distribution, or the like. It should be understood that a pseudo-random distribution may also apply. As a result, a different mask matrix is generated at every iteration of the training process and applied to the interconnection weights, thereby instantiating a different connectivity.); and 
selecting a second element from the second subvector based on pruning probabilities of the at least two other elements and the reference probability (Kharaghani, Col. 7 lines 66-67 and Col. 8 lines 1-10 teaches Element-wise comparison of matrices P and R is then performed. In particular, each element R[i,j] of the matrix R is compared to each element P[i,j] of the probability matrix P to determine if P[i,j]>R[i,j]. The elements of the binary mask matrix M are then generated accordingly by setting M[i,j] to one if P[i,j]>R[i,j] and setting M[i,j] to zero otherwise. As discussed above, when a given mask matrix element M[i,j] is set to one, the corresponding connection is retained (i.e. included in the given iteration of the training process), whereas the connection is temporarily removed otherwise.; Kharaghani, Col. 7, lines 25-44 teaches a subset of connections is randomly (or pseudo-randomly) chosen (step 404) with probability p (referred to herein as probability p(t)). For this purpose, in one embodiment, a binary mask matrix M(t) is randomly generated to encode the connection information such that a random number of interconnection weights are kept active while others are masked out (i.e. omitted or ignored) during propagation. The binary mask matrix is generated so as to comprise a random number of elements, which are set to zero, and a random number of elements, which are set to one. The mask matrix is generated during the training stage by randomly selecting matrix element values over a suitable distribution, such as a Gaussian distribution, a Bernoulli distribution, or the like. It should be understood that a pseudo-random distribution may also apply. As a result, a different mask matrix is generated at every iteration of the training process and applied to the interconnection weights, thereby instantiating a different connectivity.).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the neural network pruning system, as taught by Liu in view of Sather, to further include the initialization and selection of each connection’s probability and the pruning of a network based on learned connection probabilities, as taught by Kharaghani, in order to reduce the number of connection parameters in consecutive network layers, leading to a decrease in network complexity and over-fitting. (Kharaghani, (50))
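For illustration only, the element-wise comparison quoted from Kharaghani above (M[i,j] set to one if P[i,j]>R[i,j], zero otherwise) can be sketched as follows. This is a minimal sketch and not code from the reference: the function names `bernoulli_mask` and `apply_mask`, the example matrices, and the assumption that R is drawn uniformly from [0, 1) are all hypothetical.

```python
import random


def bernoulli_mask(P, rng):
    """Build a binary mask M with M[i][j] = 1 where P[i][j] > R[i][j], else 0.

    Assumption: each R[i][j] is an independent uniform draw from [0, 1),
    so a connection with probability P[i][j] is retained at about that rate.
    """
    mask = []
    for row in P:
        R_row = [rng.random() for _ in row]
        mask.append([1 if p > r else 0 for p, r in zip(row, R_row)])
    return mask


def apply_mask(W, M):
    """Zero out (temporarily remove) connections whose mask entry is 0."""
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(W, M)]


# Hypothetical 2x2 example: probability 1.0 always keeps a connection,
# probability 0.0 never keeps it.
P = [[1.0, 0.0], [0.5, 0.5]]
W = [[0.2, -0.7], [1.5, 0.3]]
rng = random.Random(0)
M = bernoulli_mask(P, rng)
W_masked = apply_mask(W, M)
```

Calling `bernoulli_mask` again with the same `rng` generally yields a different mask, which corresponds to the quoted statement that a different mask matrix is generated at every training iteration.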

	Regarding claim 8, the combination of Liu in view of Sather and Kharaghani teaches all of the limitations of claim 7, and the combination further teaches wherein the at least two elements have a first value and a second value, and the at least two other elements have values between the first value and the second value (Kharaghani, Weight matrix W given by (7) in Col. 8, line 45 teaches a matrix with two column vectors and two row vectors, each with at least two elements. That is, it has a first column vector with two elements and a second vector with two other elements having values between the first value and the second value, as claimed.).
	Motivation to combine same as stated in claim 7.

Regarding claim 9, the combination of Liu in view of Sather teaches all of the limitations of claim 1; however, the combination does not distinctly disclose wherein selecting the one or more elements in the vector comprises: selecting a first element from the elements in the vector based on the one or more pruning probabilities and the reference probability; determining one or more new pruning probabilities based on other elements than the first element in the vector; and selecting a second element from the other elements based on the one or more new pruning probabilities and the reference probability.

Nevertheless, Kharaghani teaches wherein selecting the one or more elements in the vector comprises: selecting a first element from the elements in the vector based on the one or more pruning probabilities and the reference probability; determining one or more new pruning probabilities based on other elements than the first element in the vector; and selecting a second element from the other elements based on the one or more new pruning probabilities and the reference probability (Kharaghani, Col. 7, lines 66-67 and Col. 8, lines 1-10 teaches Element-wise comparison of matrices P and R is then performed. In particular, each element R[i,j] of the matrix R is compared to each element P[i,j] of the probability matrix P to determine if P[i,j]>R[i,j]. The elements of the binary mask matrix M are then generated accordingly by setting M[i,j] to one if P[i,j]>R[i,j] and setting M[i,j] to zero otherwise. As discussed above, when a given mask matrix element M[i,j] is set to one, the corresponding connection is retained (i.e. included in the given iteration of the training process), whereas the connection is temporarily removed otherwise. [Note: probabilities of matrix R which are compared with those of matrix P have been understood as the reference probabilities]).
Before the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art to have modified the neural network pruning system, as taught by Liu in view of Sather, to further include the initialization and selection of each connection’s probability and the pruning of a network based on learned connection probabilities, as taught by Kharaghani, in order to reduce the number of connection parameters in consecutive network layers, leading to a decrease in network complexity and over-fitting. (Kharaghani, (50))
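For illustration only, the three steps recited in claim 9 (select a first element, determine new probabilities from the remaining elements, select a second element) can be sketched as sequential sampling. This is a minimal sketch of one possible reading of the claim language, not the applicant's or Kharaghani's implementation. The names `probabilities`, `pick`, and `select_two`, the choice of probabilities proportional to element magnitude, and the use of a fresh uniform draw as the reference probability at each step are all assumptions.

```python
import random


def probabilities(values):
    """Assumed rule: probability proportional to magnitude (uniform if all zero)."""
    total = sum(abs(v) for v in values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [abs(v) / total for v in values]


def pick(probs, reference):
    """Return the first index whose cumulative probability exceeds the reference."""
    cumulative = 0.0
    for idx, p in enumerate(probs):
        cumulative += p
        if reference < cumulative:
            return idx
    return len(probs) - 1  # guard against floating-point round-off


def select_two(vector, rng):
    """Select two distinct positions, recomputing probabilities after the first."""
    remaining = list(range(len(vector)))
    chosen = []
    for _ in range(2):
        # New probabilities are computed only from elements not yet selected.
        probs = probabilities([vector[i] for i in remaining])
        k = pick(probs, rng.random())
        chosen.append(remaining.pop(k))
    return chosen


vector = [0.5, -1.0, 0.25, 2.0]  # hypothetical weight vector
first, second = select_two(vector, random.Random(0))
```

The point of the sketch is the middle step: the second selection uses probabilities recomputed over the elements left after the first selection, rather than the original probabilities.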




Regarding claim 17,
	Claim 17 recites the same and/or analogous limitations as claim 7 above. Therefore, claim 17 is rejected under the same rationale and motivation as claim 7.

Regarding claim 18, 
	Claim 18 recites the same and/or analogous limitations as claim 8 above. Therefore, claim 18 is rejected under the same rationale and motivation as claim 8.

Regarding claim 19,
	Claim 19 recites the same and/or analogous limitations as claim 9 above. Therefore, claim 19 is rejected under the same rationale and motivation as claim 9.

Regarding claim 24,
	Claim 24 recites the same and/or analogous limitations as claim 7 above. Therefore, claim 24 is rejected under the same rationale and motivation as claim 7.

Regarding claim 25,
	Claim 25 recites the same and/or analogous limitations as claim 9 above. Therefore, claim 25 is rejected under the same rationale and motivation as claim 9.


Claims 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Sather, as applied to claim 1, and further in view of Shen et al. (US 20220292360 A1, filed Mar. 15, 2021 and published Sep. 15, 2022).

	Regarding claim 10, the combination of Liu in view of Sather teaches all of the limitations of claim 1, however, the combination does not distinctly disclose wherein the first pruning parameter is an integer that is equal to or greater than one, and the second pruning parameter is an integer that is greater than the first pruning parameter.
	Nevertheless, Shen teaches wherein the first pruning parameter is an integer that is equal to or greater than one, and the second pruning parameter is an integer that is greater than the first pruning parameter (Shen, [0065] teaches In at least one embodiment, a prune ratio 106, denoted by α, is a numerical value indicating a ratio of a number of neurons (e.g., nodes) of a neural network (e.g., a neural network 102) that are to be removed to a total number of neurons of said neural network, and is implemented using a data type such as an integer, floating-point number, character, string, and/or variations thereof. In at least one embodiment, for example, a prune ratio with a value of 0.3 indicates that 30% of neurons of a neural network are to be pruned, resulting in 70% of said neurons of said neural network remaining after one or more pruning processes. In at least one embodiment, a prune ratio 106 is any suitable value from a range of [0, 1] [note: here teaching the first pruning parameter equal to 1]. In at least one embodiment, a prune ratio 106 is any suitable value from any suitable range of values. [note: here teaching the second pruning parameter is greater than 1 as it can be any range of values]).
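For illustration only, the prune-ratio arithmetic quoted from Shen [0065] (a ratio of 0.3 removes 30% of neurons and leaves 70%) can be sketched as below. This is a minimal sketch: the function name `neurons_after_pruning`, the rounding rule, and the restriction to the [0, 1] range of the quoted embodiment are assumptions, not details taken from the reference.

```python
def neurons_after_pruning(total_neurons, prune_ratio):
    """Return (removed, remaining) neuron counts for a prune ratio in [0, 1].

    Assumption: the number of removed neurons is rounded to the nearest integer.
    """
    if not 0.0 <= prune_ratio <= 1.0:
        raise ValueError("this sketch only covers the [0, 1] embodiment")
    removed = round(total_neurons * prune_ratio)
    return removed, total_neurons - removed


# Hypothetical network of 1000 neurons with the 0.3 ratio from the quoted example.
removed, remaining = neurons_after_pruning(1000, 0.3)
```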

	Regarding claim 20,
	Claim 20 recites the same and/or analogous limitations as claim 10 above. Therefore, claim 20 is rejected under the same rationale and motivation as claim 10.


	Conclusion
The following prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Srinivas et al. (US 20220245457 A1) disclosing methods and devices for neural network pruning. Embodiments may include receiving as an input a weight tensor for a neural network, increasing a level of sparsity of the weight tensor generating a sparse weight tensor, updating the neural network using the sparse weight tensor generating an updated weight tensor, decreasing a level of sparsity of the updated weight tensor generating a dense weight tensor, increasing the level of sparsity of the dense weight tensor generating a final sparse weight tensor, and using the neural network with the final sparse weight tensor to generate inferences. Some embodiments may include increasing a level of sparsity of a first sparse weight tensor generating a second sparse weight tensor, updating the neural network using the second sparse weight tensor generating a second updated weight tensor, and decreasing the level of sparsity of the second updated weight tensor.

Miret et al. (US 20220092425 A1) disclosing an apparatus to compress DNNs using filter pruning on a per-group basis.

Zhang et al. (US 20190205759 A1) disclosing a method and apparatus for compressing a neural network are provided. A specific embodiment of the method includes: acquiring a to-be-compressed trained neural network; selecting at least one layer from layers of the neural network as a to-be-compressed layer; performing following processing steps sequentially on each of the to-be-compressed layers in descending order of the number of level of the to-be-compressed layer: determining a pruning ratio based on a total number of parameters included in the to-be-compressed layer, selecting a parameter for pruning from the parameters included in the to-be-compressed layer based on the pruning ratio and a parameter value threshold, and training the pruned neural network based on a preset training sample using a machine learning method; and determining the neural network obtained after performing the processing steps on the selected at least one to-be-compressed layer as a compressed neural network, and storing the compressed neural network.


Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEATRIZ RAMIREZ BRAVO whose telephone number is 571-272-2156. The examiner can normally be reached Mon. - Fri., 7:30 a.m.-5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, USMAAN SAEED can be reached at 571-272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/B.R.B./
Examiner, Art Unit 2146

/USMAAN SAEED/
Supervisory Patent Examiner, Art Unit 2146
Prosecution Timeline

Feb 06, 2023
Application Filed
May 19, 2026
Non-Final Rejection mailed — §101, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/535,427
Patent 12632791
SYSTEM AND METHOD FOR CONFIGURING AN ARTIFICIAL INTELLIGENCE PIPELINE
2y 5m to grant Granted May 19, 2026
18/822,390
Patent 12632704
NEURAL PROCESSING UNIT INCLUDING POST-PROCESSING UNIT
1y 8m to grant Granted May 19, 2026
18/957,896
Patent 12626134
METHOD AND APPARATUS FOR LIGHTWEIGHTING OF ARTIFICIAL INTELLIGENCE MODEL
1y 5m to grant Granted May 12, 2026
16/949,809
Patent 12619904
APPARATUS AND METHOD FOR PREDICTING TRANSFORMER STATE IN CONSIDERATION OF WHETHER OIL FILTERING IS PERFORMED
5y 5m to grant Granted May 05, 2026
16/645,425
Patent 12586348
FEATURE FUSION FOR MULTI-MODAL MACHINE LEARNING ANALYSIS
6y 0m to grant Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 63%
With Interview (+29.4%): 93%
Median Time to Grant: 4y 6m (~1y 3m remaining)
PTA Risk: Low
Based on 98 resolved cases by this examiner. Grant probability derived from career allowance rate.