Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1–2 and 6 are rejected under 35 U.S.C. § 102(a)(1) as being anticipated by US 20220092425 A1 (Miret et al.).

Regarding claim 1, Miret et al. teaches a method of compressing a neural network model (NNM) ([0017] Embodiments of the present invention relate to a DNN compressing method that uses a learnable framework and underlying structures of DNNs to prune filters. The DNN compressing method includes expressing the underlying workload of a DNN as a sequential graph representation, passing the sequential graph representation through a trained graph neural network (GNN), which outputs group-wise pruning ratios, and pruning filters on a per-group basis based on the group-wise pruning ratios.), the method comprising:

receiving, by a first computing device, a predefined pruning ratio and one or more device configuration of a second computing device deploying the NNM (pars. 42-44: [0042] … the distributer 270 receives a request for a DNN from a client device 220 through the network 240. The request may include a description of a problem that the client device 220 needs to solve. The request may also include information of the client device 220, such as information describing available computing resource on the client device.
The information describing available computing resource on the client device 220 can be information indicating network bandwidth, information indicating available memory size, information indicating processing power of the client device 220, and so on. … The distributer 270 receives requests from client devices including available computing resources (memory, processing power) and instructs the DNN system 250 to generate/compress a DNN accordingly), wherein the NNM comprises a plurality of layers in a first sequence (pars. 28-36: [0028] FIG. 1 illustrates an architecture of an example DNN 100 … The DNN 100 includes a sequence of layers comprising a plurality of convolutional layers 110 (individually referred to as “convolutional layer 110”), a plurality of pooling layers 120 (individually referred to as “pooling layer 120”), and a plurality of fully connected layers 130 (individually referred to as “fully connected layer 130”). In other embodiments, the DNN 100 may include fewer, more, or different layers; figs. 1 & 5 show sequential layer architecture; pars. 75-76 describe nodes representing layers in sequence. … Description of DNN architecture with convolutional, pooling, fully connected layers arranged sequentially);

determining, by the first computing device, filter contribution information and position wise contribution information of each of the plurality of layers based on a total number of the plurality of layers in the NNM, a total number of the plurality of filters in the NNM, and a number of filters in each of the plurality of layers (pars. 64-69: The graph pooling module 420 groups DNN hidden layers and generates pruning ratios for layer groups based on sequential graph representations and the GNN 440. For example, the graph pooling module 420 inputs a sequential graph representation of a DNN into the GNN 440.
The GNN 440 has been trained, e.g., by the GNN training module 460, to receive sequential graph representations and output layer groups and pruning ratios of layer groups. … The graph pooling is a learnable process that analyzes relationships between nodes (e.g., based on connectivity or dependency between the layers represented by the nodes) and clusters associated nodes into a group. The group may include multiple neighboring hidden layers in the DNN. The layer group may also include an activation between two neighboring hidden layers. The group will be provided to the pruning ratio model as an input, and the pruning ratio model outputs a pruning ratio for the group, as opposed to a pruning ratio for each layer. That way, the output (pruning ratios) of the GNN is reduced to facilitate filter pruning on a per-group basis. … In some embodiments, the graph pooling model analyzes dependency between layers. For example, a convolutional layer can depend on a preceding convolutional layer (i.e., another convolutional layer in the DNN that precedes the convolutional layer), as pruning a filter in the preceding convolutional layer can cause a filter size change in the convolutional layer. After detecting such a dependency, the graph pooling model can cluster the two convolutional layers into one group. In another example, the graph pooling model clusters two convolutional layers into one group where the outputs of two convolutional layers are concatenated. In such a case, pruning a filter in one of the convolutional layers will require the same filter in the other convolutional layer to be pruned, which results in further compression. The group may include other convolutional layers that depend on one or both of the two convolutional layers. … The pruning ratio represents a desired sparsity level of the hidden layers in the group. Examples of the pruning ratio include 5%, 10%, 15%, 20%, or other percentages.
As the pruning ratio corresponds to a layer group that includes multiple hidden layers, the number of filter groups are reduced, compared with technologies that prune filters hidden layer by hidden layer. More information about graph pooling is described below in conjunction with FIG. 6. … Graph pooling module clusters layers into groups based on sequential graph representation; pruning ratio determined for each group. … The filter pruning module 430 prunes filters in DNNs based on pruning ratios output from the GNN 440. The filter pruning module 430 accesses the filters of the hidden layers in a layer group and ranks the filters. In some embodiments, the filter pruning module 430 ranks the filters based on the magnitudes of the weights of the filters. … The filter pruning module 430 ranks the filters based on the absolute magnitude sum of each of the filters. For instance, a filter having a larger absolute magnitude sum is ranked higher. In other embodiments, the filter pruning module 430 may rank the filters in different ways. … Further, the filter pruning module 430 selects a subset of the filters based on the pruning ratio. For instance, in embodiments where the pruning ratio is 10%, the filter pruning module 430 selects 10% of the filters based on the ranking, e.g., the 10% filters that have lower absolute magnitude sum than the remaining 90% filters. … More information about filter pruning is described below in conjunction with FIG. 7. … Ranking filters based on magnitude of weights);

determining, by the first computing device, a layer score based on a type of layer for each of the plurality of layers and a predefined scoring criteria (pars. 63-69: … the graph generation module 410 identifies the hidden layers and activations in a trained DNN. For each hidden layer, the graph generation module 410 generates a graph representation of the hidden layer (“node”).
For instance, the graph generation module 410 identifies one or more attributes of the hidden layer. Example attributes include size of input feature map, size of output feature map, size of filter, operation identity, other attributes of the hidden layer, or some combination thereof. … The graph pooling module 420 groups DNN hidden layers and generates pruning ratios for layer groups based on sequential graph representations and the GNN 440. … Graph features include layer attributes (type, size of input/output feature maps, kernel size) which are used by the GNN to determine pruning ratios);

determining, by the first computing device, a pruning control parameter of each of the plurality of layers based on the layer score, the filter contribution information and the position wise contribution information of the corresponding layers (pars. 63-69: … the graph generation module 410 identifies one or more attributes of the hidden layer. Example attributes include size of input feature map, size of output feature map, size of filter, operation identity, other attributes of the hidden layer, or some combination thereof. … The graph pooling module 420 groups DNN hidden layers and generates pruning ratios for layer groups based on sequential graph representations and the GNN 440. … the graph pooling model analyzes dependency between layers. For example, a convolutional layer can depend on a preceding convolutional layer (i.e., another convolutional layer in the DNN that precedes the convolutional layer), as pruning a filter in the preceding convolutional layer can cause a filter size change in the convolutional layer. … [0068] In some embodiments, the graph pooling module 420 also inputs an evaluation metric into the GNN 440 in addition to the sequential graph representation. The evaluation metric includes a target accuracy of the DNN after filter pruning based on pruning ratio.
The evaluation metric may also include other measures, such as time for running the GNN, available resources (e.g., computing power, memory, etc.) for running the GNN, sparsity level, and so on. … The pruning ratio may be a number representing a percentage of filters to be pruned from the hidden layers in the group. The pruning ratio may range from zero to one. The pruning ratio represents a desired sparsity level of the hidden layers in the group. Examples of the pruning ratio include 5%, 10%, 15%, 20%, or other percentages. … More information about graph pooling is described below in conjunction with FIG. 6. … GNN outputs pruning ratio per group using the aggregated node/layer attributes (layer score equivalent) and group context (position-wise contribution));

determining, by the first computing device, a layer-wise pruning rate of each of the plurality of layers based on the pruning control parameter and the pre-defined pruning ratio (pars. 69-71: [0069] The pruning ratio may be a number representing a percentage of filters to be pruned from the hidden layers in the group. The pruning ratio may range from zero to one. The pruning ratio represents a desired sparsity level of the hidden layers in the group. Examples of the pruning ratio include 5%, 10%, 15%, 20%, or other percentages. As the pruning ratio corresponds to a layer group that includes multiple hidden layers, the number of filter groups are reduced, compared with technologies that prune filters hidden layer by hidden layer. More information about graph pooling is described below in conjunction with FIG. 6. … Further, the filter pruning module 430 selects a subset of the filters based on the pruning ratio. For instance, in embodiments where the pruning ratio is 10%, the filter pruning module 430 selects 10% of the filters based on the ranking, e.g., the 10% filters that have lower absolute magnitude sum than the remaining 90% filters.
The filter pruning module 430 may set the magnitudes of the weights of the selected filters to zero. As a result of the filter pruning, the filter pruning module 430 increases sparsity in the hidden layers and reduces the size of the hidden layers, i.e., compresses the hidden layers. In some embodiments, the filter pruning module 430 may prune filters for other layer groups. In an embodiment, the filter pruning module 430 prunes filters for all the layer groups. More information about filter pruning is described below in conjunction with FIG. 7. … Pruning ratio applied to ranked filters in each group/layer); and

compressing, by the first computing device, the NNM based on the layer-wise pruning rate (pars. 72-82: The DNN updating module 450 updates trained DNNs with compressed hidden layers. In an example, the DNN updating module 450 replaces the hidden layers in a trained DNN with the compressed hidden layers that were generated by the filter pruning module 430 by pruning the filters of the hidden layers. … In some embodiments, the GNN training module 460 trains the GNN 440 by using techniques described above in conjunction with the training module 320 in FIG. 3. … DNN updating module replaces original layers with compressed layers containing pruned filters).

Regarding claim 2, Miret teaches the method of claim 1, wherein the determination of the filter contribution information comprises: determining, by the first computing device, a filter contribution score of each of the plurality of layers based on a ratio of the number of filters in a corresponding layer and the total number of filters in the NNM (pars. 70-71: Ranking filters in each layer based on magnitude; selection proportional to pruning ratio, which requires knowledge of total filters and per-layer filter counts. … figs. 1 & 5 show sequential layer architecture and repeatedly describe a sequence of layers and activations; pars. 75-76 describe nodes representing layers in sequence).
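For context, the magnitude-based ranking and selection that Miret describes in the passages cited above (rank filters by the sum of the absolute magnitudes of their weights, select the lowest-ranked fraction per the pruning ratio, and set the selected filters' weights to zero) can be illustrated with a minimal sketch. All function and variable names below are hypothetical, chosen only to mirror the quoted steps; this is not code from the reference.

```python
import numpy as np

def prune_filters_by_magnitude(conv_weights: np.ndarray, pruning_ratio: float) -> np.ndarray:
    """Zero out the lowest-ranked filters of one convolutional layer.

    conv_weights: array of shape (num_filters, in_channels, kh, kw).
    pruning_ratio: fraction of filters to prune, e.g. 0.10 for 10%.
    Mirrors the quoted steps: rank filters by the sum of the absolute
    magnitudes of their weights, then set the weights of the bottom
    fraction to zero (increasing sparsity rather than removing layers).
    """
    num_filters = conv_weights.shape[0]
    # Sum of absolute weight magnitudes per filter (the ranking score).
    scores = np.abs(conv_weights).reshape(num_filters, -1).sum(axis=1)
    # Indices of the filters with the smallest scores get pruned.
    num_to_prune = int(num_filters * pruning_ratio)
    prune_idx = np.argsort(scores)[:num_to_prune]
    pruned = conv_weights.copy()
    pruned[prune_idx] = 0.0  # zero the selected filters' weights
    return pruned

# Example: 10 filters of shape (3, 3, 3) and a 10% pruning ratio,
# so exactly one filter's weights are set to zero.
rng = np.random.default_rng(0)
w = rng.normal(size=(10, 3, 3, 3))
w_pruned = prune_filters_by_magnitude(w, 0.10)
num_zeroed = int((np.abs(w_pruned).reshape(10, -1).sum(axis=1) == 0).sum())
print(num_zeroed)  # 1
```

Note that, consistent with the quoted embodiment, pruning here means zeroing weights in place (sparsification), not resizing the layer; the output tensor keeps its original shape.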
Interpretation is consistent with spec pars. 34, 44-46 & fig. 3 VGG example/plurality of layers in first sequence.

Regarding claim 6, Miret teaches the method of claim 1, wherein the compression of the NNM comprises: determining, by the first computing device, a first number of filters to be pruned in the plurality of layers based on the predefined pruning ratio (pars. 69-70: The pruning ratio may be a number representing a percentage of filters to be pruned from the hidden layers in the group. The pruning ratio may range from zero to one. The pruning ratio represents a desired sparsity level of the hidden layers in the group. Examples of the pruning ratio include 5%, 10%, 15%, 20%, or other percentages. As the pruning ratio corresponds to a layer group that includes multiple hidden layers, the number of filter groups are reduced, compared with technologies that prune filters hidden layer by hidden layer. More information about graph pooling is described below in conjunction with FIG. 6. … The filter pruning module 430 prunes filters in DNNs based on pruning ratios output from the GNN 440. The filter pruning module 430 accesses the filters of the hidden layers in a layer group and ranks the filters. In some embodiments, the filter pruning module 430 ranks the filters based on the magnitudes of the weights of the filters. In an embodiment, the filter pruning module 430 uses the following algorithm to perform filter magnitude ranking. … Teaches deriving how many filters to prune from specified pruning ratios and selecting exact per-layer filters via ranking); and determining, by the first computing device, a second number of filters to be pruned in each of the plurality of layers based on the layer-wise pruning rate and the first number of filters of each of the plurality of layers (pars. 69-72 and fig. 7: … The filter pruning module 430 uses the algorithm to determine a sum of absolute magnitudes of a filter.
The filter pruning module 430 ranks the filters based on the absolute magnitude sum of each of the filters. For instance, a filter having a larger absolute magnitude sum is ranked higher. In other embodiments, the filter pruning module 430 may rank the filters in different ways. … The filter pruning module 430 selects a subset of the filters based on the pruning ratio. For instance, in embodiments where the pruning ratio is 10%, the filter pruning module 430 selects 10% of the filters based on the ranking, e.g., the 10% filters that have lower absolute magnitude sum than the remaining 90% filters. … Pruning ratio (global target) used to select percentage of filters per group/layer; ranking determines which specific filters to prune).

Regarding system claims 7-8 & 12 and CRM claims 13-14 and 18, the claims recite similar limitations as method claims 1-2 and 6 and are rejected based on the same rationale as claims 1-2 and 6.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.
Patentability shall not be negated by the manner in which the invention was made.

Claims 3-5, 9-11 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over US 20220092425 A1 (Miret et al.) in view of US 20230419984 A1 (Uhle et al.).

Regarding claim 3, Miret et al. teaches the method of claim 1, wherein the determination of the position wise contribution information comprises: creating, by the first computing device, a first layer group, a second layer group and a third layer group of the plurality of layers, wherein each of the first layer group, the second layer group and the third layer group comprises an equal number of layers based on the first sequence (pars. 64-69 and FIG. 6: The graph pooling is a learnable process that analyzes relationships between nodes (e.g., based on connectivity or dependency between the layers represented by the nodes) and clusters associated nodes into a group. The group may include multiple neighboring hidden layers in the DNN. The layer group may also include an activation between two neighboring hidden layers. The group will be provided to the pruning ratio model as an input, and the pruning ratio model outputs a pruning ratio for the group, as opposed to a pruning ratio for each layer. That way, the output (pruning ratios) of the GNN is reduced to facilitate filter pruning on a per-group basis. … The graph pooling model analyzes dependency between layers. For example, a convolutional layer can depend on a preceding convolutional layer (i.e., another convolutional layer in the DNN that precedes the convolutional layer), as pruning a filter in the preceding convolutional layer can cause a filter size change in the convolutional layer. After detecting such a dependency, the graph pooling model can cluster the two convolutional layers into one group. In another example, the graph pooling model clusters two convolutional layers into one group where the outputs of two convolutional layers are concatenated.
In such a case, pruning a filter in one of the convolutional layers will require the same filter in the other convolutional layer to be pruned, which results in further compression. The group may include other convolutional layers that depend on one or both of the two convolutional layers. The group may also include activations or other types of layers associated with the two convolutional layers, such as an activation between the two convolutional layers, a pooling layer between the two convolutional layers, etc. … Graph pooling clusters layers into groups; pruning ratio determined per group using group-level attributes; group score equivalent produced by GNN); determining, by the first computing device, a group score of each of the first layer group, the second layer group and the third layer group based on a cumulative filter contribution score of each layer in the first layer group, the second layer group and the third layer group respectively and a predefined weight of each of the first layer group, the second layer group and the third layer group (pars. 69-70: The filter pruning module 430 prunes filters in DNNs based on pruning ratios output from the GNN 440. The filter pruning module 430 accesses the filters of the hidden layers in a layer group and ranks the filters. In some embodiments, the filter pruning module 430 ranks the filters based on the magnitudes of the weights of the filters.); and determining, by the first computing device, a layer-wise position score of each of the plurality of the layers based on the group score of the corresponding layer group to which the layer corresponds ([0064]–[0069], FIG. 6: Graph pooling clusters layers into groups; pruning ratio determined per group using group-level attributes; group score equivalent produced by GNN).

Miret et al. fails to teach, however Uhle et al. teaches, network layers partitioned into groups each having an equal number of layers (see par.
128: VGG (an abbreviation of Visual Geometry Group at the University of Oxford) is a DNN with CLs with small convolutional filters of shape (3×3), stride of one and padding such that the input and output shape of each layer are equal.).

It would have been obvious to one of ordinary skill in the art before the effective filing date to apply Uhle et al.'s equal-sized group partitioning to Miret's group-wise pruning framework as a known technique for structuring layer groups, thereby simplifying the grouping step and enabling stable, balanced pruning decisions across the depth of the network. Equal-sized partitioning is a well-understood way to balance capacity and computational load among groups, and its application to Miret's group-wise pruning would predictably yield a manageable pruning schedule with reduced hyperparameter burden.

Regarding claim 4, Miret and Uhle teach the method of claim 3, and Miret further teaches wherein the determination of the layer-wise position score comprises: sorting layers in each of the layer groups based on the layer score, the filter contribution score, and a second sequence of layers in each of the layer groups (pars. 63-69 and 75-78: teaches constructing node features (attributes include operation identity/type, input/output sizes, kernel size) which are inputs to the GNN and its pruning ratio model; these node features are used by the learned model to produce group pruning ratios), and upon sorting, clustering layers in each of the layer groups into a predefined number of clusters based on a predefined ratio of a cumulative layer score for the corresponding layer group (pars. 63-69 and 75-78: GNN effectively computes per-layer/group numeric signals from layer type and attributes, … via learned/derived features), wherein the layer-wise position score is determined based on the predefined number of clusters, a number of layers in each cluster and the group score of the corresponding layer group (pars.
64-69: Graph pooling model clusters nodes/layers; pruning ratio model processes group-level attributes; ranking within group based on filter magnitude).

Regarding claim 5, Miret and Uhle teach the method of claim 3, and Miret further teaches wherein the pruning control parameter is determined based on an average of the layer-wise position score and the filter contribution score of each of the layers in the first layer group, the second layer group and the third layer group (pars. 64-69: discusses pruning control as embodied by the GNN pruning ratio output and subsequent per-group application (GNN outputs group pruning ratios using node features including layer type, position, connectivity); par. 69: discusses that the pruning ratio for a group determines the percentage to prune; pars. 70-71: discusses that, within a group, actual per-layer filter removals are derived by ranking and selecting the bottom fraction per the pruning ratio (filter ranking and selection): GNN integrates group-level (position) and per-layer/per-filter ranking (contribution) to output a pruning ratio, equivalent to combining position score and contribution score.). Interpretation is consistent with applicant's disclosure pars. 39-41, 66-69 and fig. 14.

Regarding system claims 9-11 and CRM claims 15-17, the claims recite similar limitations as claims 3-5 and are rejected based on the same rationale as claims 3-5.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

US 20220237465 A1: PERFORMING INFERENCE AND SIGNAL-TO-NOISE RATIO BASED PRUNING TO TRAIN SPARSE NEURAL NETWORK ARCHITECTURES
US 20210264278 A1: NEURAL NETWORK ARCHITECTURE PRUNING

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ELENI A SHIFERAW whose telephone number is (571)272-3867. The examiner can normally be reached 7-3:30 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /ELENI A SHIFERAW/ Supervisory Patent Examiner, Art Unit 2497