DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 01/10/2025 and 01/16/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Election/Restrictions
Applicant’s election without traverse of Group III in the reply filed on 01/05/2026 is acknowledged. Claims 9-13 are pending and under examination; non-elected claims 1-8 and 14-20 are withdrawn from further consideration.
Office Action Summary
Claim(s) 1-8 and 14-20 is/are withdrawn.
Claim(s) 9-13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vaze et al (Low-Memory CNNs Enabling Real-Time Ultrasound Segmentation Towards Mobile Deployment) in view of Ronneberger et al (U-Net: Convolutional Networks for Biomedical Image Segmentation).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 9-13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Vaze et al (Low-Memory CNNs Enabling Real-Time Ultrasound Segmentation Towards Mobile Deployment) in view of Ronneberger et al (U-Net: Convolutional Networks for Biomedical Image Segmentation).
Regarding claim(s) 9, Vaze teaches a convolutional neural network training method, wherein the method comprises:
obtaining a training set, wherein the training set comprises at least one training image (Page 1061, III. Dataset, 3rd Paragraph – 4th Paragraph: “The curated dataset, which we use to train the models in this work, consists of 2323 images […] The 2323 2D slices in the dataset were divided in an 80/20 split (1858/465 images) for training and testing”); and
training a first bottleneck structure layer in a first convolutional neural network based on the training set, and a first feature extraction layer and a first feature output layer in the first convolutional neural network (read as “Original U-Net”), to obtain a second convolutional neural network (read as “Thin CNNs”), wherein the second convolutional neural network comprises the first feature extraction layer, a second bottleneck structure layer, and the first feature output layer (Figure 1; Figure 3; Page 1062, Section B. Original U-Net; Page 1062-1063, Section C. Thin CNNs: “[…] it can be seen that by halving the number of filters in every convolutional layer (thereby halving both N and M for most layers in the architecture) the number of network parameters quarters […]”; and Page 1065, Section F. Knowledge Distillation, 3rd Paragraph: “which uses a larger teacher model (the thin, regular convolution network) to supervise the training of a lighter student network (the thin, separable convolution network)”);
a feature compression layer in the second bottleneck structure layer is used to compress the first feature map, to obtain a second feature map (Figure 1; Figure 3; Equation (3); and Page 1062-1063, Section C. Thin CNNs: “[…] it can be seen that by halving the number of filters in every convolutional layer (thereby halving both N and M for most layers in the architecture) the number of network parameters quarters […]”); and
a channel quantity of the second feature map is less than a channel quantity of the first feature map (Figure 1; Figure 3; Equation (3); and Page 1062-1063, Section C. Thin CNNs: “[…] it can be seen that by halving the number of filters in every convolutional layer (thereby halving both N and M for most layers in the architecture) the number of network parameters quarters […]”).
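As a purely illustrative check of the cited Thin CNNs passage, the following Python sketch counts the weights of a standard convolutional layer as K x K x N x M, where K is the kernel size, N the number of input channels, and M the number of output channels. The function name and the example sizes (K = 3, N = 64, M = 128) are assumptions chosen for illustration and are not taken from Vaze; biases are ignored.

```python
def conv_weight_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard 2D convolution (assumed formula K*K*N*M, biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


# Hypothetical layer sizes, chosen only for illustration.
full = conv_weight_count(3, 64, 128)  # N = 64, M = 128
thin = conv_weight_count(3, 32, 64)   # both N and M halved

print(full, thin, full // thin)
```

Under these assumptions, halving both N and M reduces the weight count by a factor of four, which is consistent with the quoted statement that the number of network parameters quarters.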
Vaze fails to teach wherein the first feature extraction layer is used to perform feature extraction on a to-be-processed image, to obtain a first feature map. However, Ronneberger teaches wherein the first feature extraction layer is used to perform feature extraction on a to-be-processed image, to obtain a first feature map (Figure 1; and Page 4, Chapter 2. Network Architecture: “[…] repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for downsampling […]”).
Vaze teaches a convolutional neural network training method in which a convolutional neural network is trained using a training set of images to obtain feature maps and to reduce memory and computational requirements by employing a bottleneck structure having a reduced number of feature channels, including training a student network derived from a teacher network to reproduce intermediate feature representations of the teacher (See Sections IV-B and IV-F). Ronneberger teaches a well-known encoder–decoder convolutional neural network architecture (U-Net) comprising feature extraction layers that generate feature maps, a bottleneck structure formed by successive down-sampling operations, and reconstruction layers that restore feature maps to a higher resolution (See Figure 1 and Section 2).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine Vaze and Ronneberger. The combination would have predictably resulted in a trained convolutional neural network comprising feature extraction layers and a bottleneck structure layer that compresses a first feature map into a second feature map having a reduced channel quantity, thereby yielding no more than the expected result of training an efficient convolutional neural network with reduced memory usage. This motivation for the combination of Vaze and Ronneberger is supported by KSR exemplary rationale (G): some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. See MPEP 2141 (III).
Regarding claim(s) 10, Vaze as modified by Ronneberger teaches the method according to claim 9, where Vaze teaches wherein the training a first bottleneck structure layer in a first convolutional neural network based on the training set, and a first feature extraction layer and a first feature output layer in the first convolutional neural network, to obtain a second convolutional neural network comprises:
inputting the training set into a third convolutional neural network (read as “teacher network (U-Net)”), to obtain a first set, wherein the third convolutional neural network comprises a second feature extraction layer and a second feature output layer, a parameter of the first feature extraction layer is the same as a parameter of the second feature extraction layer, a parameter of the first feature output layer is the same as a parameter of the second feature output layer, the first set comprises a fourth feature map (read as “teacher feature map”), and the fourth feature map is obtained after the second feature extraction layer and the second feature output layer are used to perform feature extraction on the training image (Figure 1; Figure 3; Equation (5); and Page 1065-1066, Section F. Knowledge Distillation: “[…] we create a distillation loss using the teacher network’s intermediate activations фs(xi), incentivising the student network to recreate the feature maps of the larger architecture […] Intuitively, each term inside the summation takes the mean L1-distance between the sth activation of the teacher network, фs(xi), and the corresponding student activation, ф̂s(xi), weighted by a parameter ws. In this work, S contains the final convolutional output in each network block”, Examiner’s Note: the teacher network is considered as third convolutional neural network that also comprises feature extraction and feature output layers, operating with the same parameters as the student network);
inputting the training set into the first convolutional neural network (read as “student CNN”), to obtain a second set, wherein the second set comprises a fifth feature map (read as “student network feature map after bottleneck and output”), and the fifth feature map is obtained after the first bottleneck structure layer and the first feature output layer are used to perform feature reconstruction and processing on the second feature map (Figure 1; Figure 3; Equation (5); Page 1062-1063, Section C. Thin CNNs: “[…] it can be seen that by halving the number of filters in every convolutional layer (thereby halving both N and M for most layers in the architecture) the number of network parameters quarters […]”; and Page 1065-1066, Section F. Knowledge Distillation: “[…] we create a distillation loss using the teacher network’s intermediate activations фs(xi), incentivising the student network to recreate the feature maps of the larger architecture […] Intuitively, each term inside the summation takes the mean L1-distance between the sth activation of the teacher network, фs(xi), and the corresponding student activation, ф̂s(xi), weighted by a parameter ws. In this work, S contains the final convolutional output in each network block”);
calculating a loss function based on the fourth feature map in the first set and the fifth feature map in the second set (Page 1065, Equation (5)); and
updating a parameter of the first bottleneck structure layer according to the loss function, to obtain the second bottleneck structure layer and obtain the second convolutional neural network (Page 1065-1066, Section F. Knowledge Distillation: “[…] we create a distillation loss using the teacher network’s intermediate activations фs(xi), incentivising the student network to recreate the feature maps of the larger architecture […] Intuitively, each term inside the summation takes the mean L1-distance between the sth activation of the teacher network, фs(xi), and the corresponding student activation, ф̂s(xi), weighted by a parameter ws. In this work, S contains the final convolutional output in each network block”, Examiner’s Note: student network and teacher network, both are CNNs, wherein the teacher network supervises the student network by considering the losses of each model to optimize the student network in relation with the blocks considered as bottleneck).
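For illustration only, the distillation term described in the quoted passage (a weighted sum, over the supervised layers S, of the mean L1-distance between teacher and student activations) can be sketched in Python as follows. This is a minimal sketch for a single image xi and does not reproduce any batch averaging or other normalisation that Equation (5) of Vaze may contain. The function names, the representation of each feature map as a flattened list of numbers, and the example values are all assumptions that do not appear in the reference.

```python
from typing import Sequence


def mean_l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized, flattened feature maps."""
    if len(a) != len(b):
        raise ValueError("feature maps must have the same number of elements")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def distillation_loss(teacher_acts, student_acts, weights) -> float:
    """Weighted sum, over supervised layers, of the teacher/student mean L1-distance."""
    return sum(
        w * mean_l1(t, s)
        for t, s, w in zip(teacher_acts, student_acts, weights)
    )


# Hypothetical activations for two supervised layers (flattened feature maps).
teacher = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0]]
student = [[0.0, 0.0, 0.0, 0.0], [2.0, 2.0]]
weights = [0.5, 1.0]  # one assumed weight ws per supervised layer

loss = distillation_loss(teacher, student, weights)
print(loss)
```

In a training loop this value would be minimised with respect to the student network's parameters only, with the teacher held fixed; that usage is likewise an assumption about how such a term is typically applied.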
Regarding claim(s) 11, Vaze as modified by Ronneberger teaches the method according to claim 10, wherein the inputting the training set into the first convolutional neural network, to obtain a second set comprises:
where Ronneberger teaches performing feature extraction on the training image using the first feature extraction layer, to obtain the first feature map (Figure 1; and Page 4, Chapter 2. Network Architecture: “[…] repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for downsampling […]”);
where Vaze teaches compressing the first feature map using a feature compression layer comprised in the first bottleneck structure layer, to obtain a sixth feature map, wherein a channel quantity of the sixth feature map is less than the channel quantity of the first feature map (Figure 1; Figure 3; Table II; and Page 1062-1063, Section C. Thin CNNs: “[…] it can be seen that by halving the number of filters in every convolutional layer (thereby halving both N and M for most layers in the architecture) the number of network parameters quarters […]”);
where Ronneberger teaches reconstructing the sixth feature map using a feature reconstruction layer comprised in the first bottleneck structure layer, to obtain a third feature map, wherein a channel quantity of the third feature map is the same as the channel quantity of the first feature map (Figure 1; and Page 4, Chapter 2. Network Architecture: “Every step in the expansive path consists of an upsampling of the feature map followed by a 2x2 convolution (“up-convolution") that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU”); and
processing the third feature map using the second feature output layer, to obtain the fifth feature map comprised in the second set (Figure 1; and Page 4, Chapter 2. Network Architecture: “[…] At the final layer a 1x1 convolution is used to map each 64-component feature vector to the desired number of classes. In total the network has 23 convolutional layers”).
Regarding claim(s) 12, Vaze as modified by Ronneberger teaches the method according to claim 11, where Vaze teaches wherein the calculating a loss function based on the fourth feature map (read as “teacher feature map”) in the first set and the fifth feature map (read as “student feature map”) in the second set comprises:
obtaining a first distance between the fourth feature map and the fifth feature map (Equation (5); and Page 1065-1066, Section F. Knowledge Distillation: “Here, Δ(·) refers to the mean absolute difference (L1-distance) and S is the set of all layers supervised by the distillation loss. Intuitively, each term inside the summation takes the mean L1-distance between the sth activation of the teacher network, фs(xi), and the corresponding student activation, ф̂s(xi), weighted by a parameter ws”);
where Ronneberger teaches obtaining a second distance between the first feature map (Figure 1; and Page 4, Chapter 2. Network Architecture: “[…] repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for downsampling […]”) and the third feature map (Figure 1; and Page 4, Chapter 2. Network Architecture: “Every step in the expansive path consists of an upsampling of the feature map followed by a 2x2 convolution (“up-convolution") that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU”); and
where Vaze teaches calculating the loss function based on the first distance and the second distance (Page 1065, Section F. Knowledge Distillation, 3rd Paragraph: “The distillation method involves optimising the student network with respect to the sum of two losses: a hard target loss (L1, defined by Equation (1)); and a distillation loss, which encodes information on the teacher’s internal representation”).
Regarding claim(s) 13, Vaze as modified by Ronneberger teaches the method according to claim 9, where Ronneberger teaches wherein a resolution of the first feature map is W x H, a resolution of the second feature map is W' x H', and W' x H' < W x H (Figure 1; and Page 4, Chapter 2. Network Architecture: “[…] repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for downsampling […]”).
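As an illustrative aid, the following Python sketch traces how the spatial size of a feature map changes through one contracting step of the kind quoted from Ronneberger: two unpadded 3x3 convolutions followed by a 2x2 max pooling operation with stride 2. The function names and the 572 x 572 example input are assumptions introduced for this sketch; a stride of 1 is assumed for the convolutions.

```python
def conv_unpadded(size: int, kernel: int = 3) -> int:
    """Output size of an unpadded, stride-1 convolution along one dimension."""
    return size - (kernel - 1)


def max_pool(size: int, window: int = 2, stride: int = 2) -> int:
    """Output size of max pooling along one dimension."""
    return (size - window) // stride + 1


def contracting_step(size: int) -> int:
    """Two unpadded 3x3 convolutions followed by 2x2 max pooling with stride 2."""
    return max_pool(conv_unpadded(conv_unpadded(size)))


w = h = 572  # hypothetical input resolution W x H
w_out, h_out = contracting_step(w), contracting_step(h)
print(w_out, h_out)
```

Under these assumptions the resolution after the step, W' x H', is strictly smaller than the input resolution W x H, which is the relationship the rejection relies on.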
Relevant Prior Art Directed to State of Art
Hill et al (US 2020/0104692 A1) is relevant prior art not applied in the rejection(s) above. Hill discloses a method of exploiting activation sparsity in deep neural networks, comprising: retrieving an activation tensor and a weight tensor, where the activation tensor is a sparse activation tensor; generating a compressed activation tensor comprising non-zero activations of the activation tensor, where the compressed activation tensor has fewer columns than the activation tensor; and processing the compressed activation tensor and the weight tensor to generate an output tensor.
Annapureddy et al (US 2016/0217369 A1) is relevant prior art not applied in the rejection(s) above. Annapureddy discloses a method of compressing a neural network, comprising: replacing at least one layer in the neural network with a plurality of compressed layers to produce a compressed neural network; inserting nonlinearity between compressed layers of the compressed network; and fine-tuning the compressed network by updating weight values in at least one of the compressed layers.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONGBONG NAH whose telephone number is (571) 272-1361. The examiner can normally be reached M - F: 9:00 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ONEAL MISTRY, can be reached at 313-446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JONGBONG NAH/Examiner, Art Unit 2674
/ONEAL R MISTRY/Supervisory Patent Examiner, Art Unit 2674