Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action has been issued in response to Applicant's Communication of application S/N 18/272,856 filed on July 18, 2023. Claims 1-19 are currently pending in the application.
Priority
Acknowledgment is made of applicant's claim for priority to 371 PCT/US2022/0136426, filed on 1/25/2022, and provisional application PRO 63/145,675, filed on 2/4/2021.
Claim Objections
Claims 1, 9-11 and 12 are objected to because of the following informalities:
Claims 1 and 12 recite PRC-NPTN without defining the term.
Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 9 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Desai et al. (US 2021/0342580) Published on Nov. 4, 2021 in view of Pal et al. (Learning Non-Parametric Invariances from Data with Permanent Random Connectomes) Published on August 14, 2020.
As per Claim 1, A method for reducing the complexity of a neural network using PRC- NPTN layers, the method comprising:
training a network in accordance with a training protocol; (See Para. [0057], describing the training process of a neural network; as taught by Desai)
pruning the trained network to remove a subset of the filters in the network; (See Para. [0011]–[0013], [0058]–[0061], [0063]–[0064]: the method identifies highly correlated filter pairs and removes one filter of each pair, and includes iterative removal and retraining; as taught by Desai)
and fine-tuning the network by re-applying the training protocol; (See Para. [0012], [0062], "the pruned network is retrained after the iterative removal of the filter pairs" and "retrained from scratch"; as taught by Desai)
Desai fails to teach the neural network having a hyperparameter G indicating a number of filters connected to each input channel and a hyperparameter CMP indicating a number of channel max pooling units and the training of a PRC-NPTN network.
On the other hand, Pal et al. teaches the neural network having a hyperparameter G indicating a number of filters connected to each input channel; (See page 6, Fig. 1 description in para. 2, "The PRC-NPTN layer consists of a set of Nin × G filters ... each of the Nin input channels connects to |G| filters"; as taught by Pal)
and a hyperparameter CMP indicating a number of channel max pooling units; (See page 6, Fig. 1 description in para. 2, "A number of channel max pooling units randomly select a fixed number of activation maps to pool over. This is parameterized by Channel Max Pool (CMP). ... pooling supports once initialized do not change through training or testing"; as taught by Pal)
training of a PRC-NPTN network; (See page 6, section 3, lines 1-8, describing the training of a PRC-NPTN network; as taught by Pal)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Desai, by including the teachings of Pal because it would be reasonable to perform structured pruning within each group (ensuring at least CMP members remain per pooling unit) and then fine-tune, thereby maintaining PRC-NPTN’s functionality and invariance properties.
Claim 12 recites similar limitations to claim 1 and is rejected under the same rationale.
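For context, the PRC-NPTN structure mapped above (Pal, page 6: each of the Nin input channels connects to G filters, and CMP units pool over fixed random subsets of activation maps that do not change after initialization) can be sketched in plain Python. The function name and list-based representation are illustrative only, not from either reference:

```python
import random

def build_prc_nptn_connectome(n_in, G, CMP, seed=0):
    """Sketch of a PRC-NPTN layer's fixed random wiring (per Pal):
    each of the n_in input channels connects to G filters, and each
    channel max pooling (CMP) unit pools over a fixed, randomly chosen
    set of CMP activation maps. The wiring is drawn once and is then
    permanent through training and testing."""
    rng = random.Random(seed)
    # n_in * G filters in total; filter (c, g) reads input channel c
    filters = [(c, g) for c in range(n_in) for g in range(G)]
    # each pooling unit selects CMP activation maps at random;
    # the assignment is fixed after initialization
    n_pool_units = (n_in * G) // CMP
    shuffled = filters[:]
    rng.shuffle(shuffled)
    pool_units = [shuffled[i * CMP:(i + 1) * CMP]
                  for i in range(n_pool_units)]
    return filters, pool_units
```

Because the wiring is seeded, rebuilding with the same seed reproduces the same permanent connectome.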
As per Claim 9, The method of claim 1: the combination of Desai and Pal teaches wherein G and CMP of the network are selected based on an application of the network and computing resources available to the network; (See page 6, para. 2, describing the selection of G filters connected to each channel in the network; also see para. 1, describing how CMP = |G|. Given that Pal et al. show that CMP and G materially affect speed and memory (Fig. 5, Table 3), a user would select G/CMP in view of the application's accuracy needs and the available compute/memory to achieve predictable efficiency/accuracy trade-offs; as taught by Pal)
Claims 2-4, 6 and 13-15 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Desai et al. (US 2021/0342580) Published on Nov. 4, 2021 in view of Pal et al. (Learning Non-Parametric Invariances from Data with Permanent Random Connectomes) Published on August 14, 2020 and further in view of Li et al. (PRUNING FILTERS FOR EFFICIENT CONVNETS) Published on March 10, 2017.
As per Claim 2, The method of claim 1 the combination of Desai and Pal fails to teach wherein pruning the trained network comprises applying L1 pruning to the network.
On the other hand Li teaches wherein pruning the trained network comprises applying L1 pruning to the network; (See section 4.1 describing the L1-norm filter pruning; as taught by Li)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Desai and Pal, by including the teachings of Li relating to the use of L1 norms to prune filters, because it would be reasonable to improve the speed of filter processing (see page 3, para. 1; as taught by Li).
As per Claim 3, The method of claim 2 the combination of Desai, Pal and Li teaches wherein the filters are divided into a top segment which are retained in the network and a bottom segment which are removed from the network. (See page 3, sections 3 and 3.1, describing the ranking of filters and pruning them; as taught by Li)
As per Claim 4, The method of claim 3 the combination of Desai, Pal and Li teaches wherein a filter is placed into the bottom segment if the L1 norm of its activation response is in a lower percentage of the total number of filters; (See page 3, section 3.1, describing the ranking of filters and pruning the smallest filters: "This value gives an expectation of the magnitude of the output feature map. Filters with smaller kernel weights tend to produce feature maps with weak activations as compared to the other filters in that layer. Figure 2(a) illustrates the distribution of filters' absolute weights sum for each convolutional layer in a VGG-16 network trained on the CIFAR-10 dataset, where the distribution varies significantly across layers. We find that pruning the smallest filters works better in comparison with pruning the same number of random or largest filters"; as taught by Li)
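The L1-norm criterion cited from Li (section 3.1) can be sketched in plain Python: rank filters by the sum of their absolute kernel weights and place the smallest fraction in the bottom segment. The function name, the flat-list filter representation, and `prune_fraction` are illustrative assumptions, not taken from the reference:

```python
def split_filters_by_l1(filters, prune_fraction):
    """Sketch of Li's L1-norm criterion: rank filters by the sum of
    their absolute kernel weights and place the smallest-scoring
    prune_fraction of them in the bottom segment (to be removed);
    the remaining filters form the top segment (retained)."""
    # score each filter by the L1 norm of its weights
    scores = [(sum(abs(w) for w in f), i) for i, f in enumerate(filters)]
    scores.sort()  # smallest L1 norms first
    n_bottom = int(len(filters) * prune_fraction)
    bottom = {i for _, i in scores[:n_bottom]}
    top = [i for i in range(len(filters)) if i not in bottom]
    return top, sorted(bottom)
```

With `prune_fraction = 0.5`, the half of the filters with the weakest expected activations land in the bottom segment, matching Li's observation that pruning the smallest filters outperforms pruning random or largest filters.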
As per Claim 6, The method of claim 3 the combination of Desai, Pal and Li teaches further comprising: removing from the network or deactivating those filters which have been placed in the bottom segment; (See page 3, section 3.1, describing the ranking of filters and pruning the smallest filters: "This value gives an expectation of the magnitude of the output feature map. Filters with smaller kernel weights tend to produce feature maps with weak activations as compared to the other filters in that layer. Figure 2(a) illustrates the distribution of filters' absolute weights sum for each convolutional layer in a VGG-16 network trained on the CIFAR-10 dataset, where the distribution varies significantly across layers. We find that pruning the smallest filters works better in comparison with pruning the same number of random or largest filters"; as taught by Li)
Claims 13-15 and 17 recite similar limitations to claims 2-4 and 6 and are rejected under the same rationale.
Claims 5, 7, 8, 16, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Desai et al. (US 2021/0342580) Published on Nov. 4, 2021 in view of Pal et al. (Learning Non-Parametric Invariances from Data with Permanent Random Connectomes) Published on August 14, 2020, in view of Li et al. (PRUNING FILTERS FOR EFFICIENT CONVNETS) Published on March 10, 2017 and further in view of Ramachandran et al. (US 2020/0364573) Published on Nov 19, 2020.
As per Claim 5, The method of claim 4 the combination of Desai, Pal and Li teaches wherein a percentage of the total number of filters are to be placed in the top segment and a percentage of the total number of filters are to be placed in the bottom segment; (See page 3, sections 3 and 3.1, describing the ranking of filters and pruning them; as taught by Li)
On the other hand the combination of Desai, Pal and Li fails to teach the percentage is controlled by a pruning parameter indicating a percentage of the total number of filters; (See para. [0103], "The pruning threshold T indicates the number (expressed here as percentage) of filters in a layer that can be pruned away from the model without creating an unacceptable impact on the accuracy of the layer. In some implementations, OSLP performs relatively well for higher threshold values. Accordingly, some implementations use a T value of 50% (or approximately 50%) for both sparse and dense CNNs"; as taught by Ramachandran)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Desai, Pal and Li, by including the teachings of Ramachandran relating to the use of percentage-based pruning of filters, because the data suggests that correctly selected threshold percentages would improve performance (see para. [0087]; as taught by Ramachandran).
As per Claim 7, The method of claim 5 wherein pruning the network further comprises: iteratively reducing the pruning parameter and re-pruning the network such that a greater percentage of the filters are removed at each iteration until a desired trade-off between accuracy of the network and the number of remaining filters is reached; (See para. [0050] and [0051], "In IMP approaches, a network model is pruned one layer at a time. In some implementations, after each layer is pruned, the model is fine-tuned. This is because in some cases, pruning a layer leads to information loss and degradation of the accuracy of the CNN. Fine-tuning in this context refers to adjusting the weights of the unpruned filters to regain the accuracy (or some of the accuracy) of the CNN. In IMP, pruning of initial layers (i.e., layers closer to the input of the CNN) requires fewer epochs of fine-tuning, whereas pruning of deeper layers (i.e., layers closer to the output of the CNN) require more epochs of fine-tuning. In some implementations, IMP is cumbersome for deeper models where the number of fine-tuning epochs required to regain the accuracy (or an acceptable degree of accuracy) is unacceptably high"; as taught by Ramachandran)
As per Claim 8, The method of claim 5 wherein pruning the network further comprises: iteratively applying the pruning parameter and re-pruning the network such that additional filters are removed at each iteration until a desired trade-off between accuracy of the network and the number of remaining filters is reached; (See para. [0050] and [0051], "In IMP approaches, a network model is pruned one layer at a time. In some implementations, after each layer is pruned, the model is fine-tuned. This is because in some cases, pruning a layer leads to information loss and degradation of the accuracy of the CNN. Fine-tuning in this context refers to adjusting the weights of the unpruned filters to regain the accuracy (or some of the accuracy) of the CNN. In IMP, pruning of initial layers (i.e., layers closer to the input of the CNN) requires fewer epochs of fine-tuning, whereas pruning of deeper layers (i.e., layers closer to the output of the CNN) require more epochs of fine-tuning. In some implementations, IMP is cumbersome for deeper models where the number of fine-tuning epochs required to regain the accuracy (or an acceptable degree of accuracy) is unacceptably high"; as taught by Ramachandran)
Claims 16, 18 and 19 recite similar limitations to claims 5, 7 and 8 and are rejected under the same rationale.
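The iterative prune-and-fine-tune loop described in the rejections of claims 7 and 8 can be sketched in plain Python. The callbacks `prune_step` and `evaluate` are hypothetical stand-ins for the real pruning and fine-tuning machinery (not from the references), and the loop simply continues until the accuracy trade-off point is reached:

```python
def iterative_prune(evaluate, prune_step, threshold=0.9, max_iters=10):
    """Sketch of iterative pruning: at each iteration remove additional
    filters (prune_step), then fine-tune and measure accuracy (evaluate),
    stopping once accuracy falls to the desired trade-off threshold or
    the iteration budget is exhausted. Both callbacks are hypothetical
    stubs for the actual training/pruning pipeline."""
    history = []
    for it in range(max_iters):
        prune_step(it)       # remove a further batch of filters
        acc = evaluate()     # fine-tune and measure accuracy (stubbed)
        history.append(acc)
        if acc < threshold:  # desired trade-off reached; stop pruning
            break
    return history
```

In claim 7 the pruning parameter itself would shrink inside `prune_step` each iteration (removing a greater percentage), while in claim 8 the same parameter is re-applied so that additional filters are removed each pass; the stopping condition is the same in both.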
Claims 10 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Desai et al. (US 2021/0342580) Published on Nov. 4, 2021 in view of Pal et al. (Learning Non-Parametric Invariances from Data with Permanent Random Connectomes) Published on August 14, 2020 and further in view of Sandler et al. (MobileNetV2: Inverted Residuals and Linear Bottlenecks) Published on March 21, 2019.
As per Claim 10, The method of claim 9 the combination of Desai and Pal fails to teach wherein a higher G and a lower CMP are selected in computing-rich environments.
On the other hand Sandler teaches wherein a higher G and a lower CMP are selected in computing-rich environments; (See section 2, stating the goal of operating under mobile device resource constraints and improving efficiency compared to V1; as taught by Sandler)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Desai and Pal, by including the teachings of Sandler relating to the balancing of system resources using resource-aware scaling, because it would be reasonable to operate efficiently under the available resource constraints (see section 2; as taught by Sandler).
As per Claim 11, The method of claim 9 the combination of Desai, Pal and Sandler teaches wherein a lower G and a higher CMP are selected for computing-constrained environments; (See section 2, stating the goal of operating under mobile device resource constraints and improving efficiency compared to V1; as taught by Sandler)
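The resource-aware selection addressed in claims 10 and 11 (higher G / lower CMP when compute is plentiful, the reverse when it is constrained) can be sketched as a simple heuristic. The function name, the `resource_score` input, and the concrete G/CMP values are illustrative assumptions only, not from any of the cited references:

```python
def select_g_cmp(resource_score):
    """Hypothetical heuristic mirroring claims 10-11: a computing-rich
    environment (high resource_score) selects a higher G and a lower
    CMP, while a computing-constrained environment selects a lower G
    and a higher CMP. The numeric values are illustrative only."""
    if resource_score >= 0.5:        # computing-rich environment
        return {"G": 8, "CMP": 2}
    return {"G": 2, "CMP": 8}        # computing-constrained environment
```

In practice the selection would also weigh the application's accuracy needs, consistent with the Fig. 5 / Table 3 trade-offs cited from Pal in the rejection of claim 9.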
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHERIEF BADAWI whose telephone number is (571)272-9782. The examiner can normally be reached Monday - Friday, 8:00am - 5:30pm, Alt Friday, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Cordelia Zecher can be reached on 571-272-7771. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/SHERIEF BADAWI/Supervisory Patent Examiner, Art Unit 2169