DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 04/09/2024, 04/25/2024, and 11/19/2024 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Office Action Summary
Claims 1-2, 4, 7-8, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mo et al (Frequency Attention Network: Blind Noise Removal for Real Images) in view of Yamamoto et al (US 2019/0378014 A1).
Claims 3, 9, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Mo et al (Frequency Attention Network: Blind Noise Removal for Real Images) in view of Yamamoto et al (US 2019/0378014 A1), further in view of Kang et al (KR 20200037700 A; See translation provided by Examiner).
Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Mo et al (Frequency Attention Network: Blind Noise Removal for Real Images) in view of Yamamoto et al (US 2019/0378014 A1) and Kang et al (KR 20200037700 A; See translation provided by Examiner), further in view of Yin et al (US 2023/0073835 A1).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Mo et al (Frequency Attention Network: Blind Noise Removal for Real Images) in view of Yamamoto et al (US 2019/0378014 A1), further in view of Yin et al (US 2023/0073835 A1).
Claims 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Mo et al (Frequency Attention Network: Blind Noise Removal for Real Images) in view of Yamamoto et al (US 2019/0378014 A1) and Kang et al (KR 20200037700 A; See translation provided by Examiner), further in view of Price et al (US 2022/0198671 A1).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 4, 7-8, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Mo et al (Frequency Attention Network: Blind Noise Removal for Real Images) in view of Yamamoto et al (US 2019/0378014 A1).
Regarding claim(s) 1, 19, and 20, Mo teaches an information processing apparatus comprising:
at least one processor (Figure 1; and Page 4, Chapter 3.1, Network Architecture); and
a memory coupled to the at least one processor, the memory storing instructions that, when executed by the at least one processor (Figure 1; and Page 4, Chapter 3.1, Network Architecture), cause the at least one processor to:
input an image to a first neural network in which an attention mechanism that performs image processing for improving image quality is included (Figure 1; and Page 4, Chapter 3.1, Network Architecture; Abstract: “In this work we propose a method for blind image denoising that combines frequency domain analysis and attention mechanism, named frequency attention network (FAN) […] spatial and channel mechanisms are employed to enhance feature maps at different scales for capturing contextual information”).
Mo fails to teach to detect a redundant attention mechanism by determining whether or not a weight of attention processing generated in a process of the image processing is active; acquire a second neural network by deleting the redundant attention mechanism detected by the detection from the first neural network; and perform machine learning with the second neural network.
However, Yamamoto teaches to detect a redundant attention mechanism by determining whether or not a weight (read as “output feature value”) of attention processing generated in a process of the image processing is active (Paragraph [0054]: “[…] The attention layer 11-1 computes a feature value (output feature value) corresponding to each of the plural channels […]”; Paragraph [0071]: “The computation section 12-1 multiplies the feature values (input feature values) input from a processing layer 21-1 by the feature values (output feature values) output from the attention layer 11-1 on a per-channel basis”; and Paragraph [0075]: “The channel selection section 14-1 selects as a redundant channel a channel satisfying a predetermined relationship between the output feature values computed by the attention layer 11-1 after the learning processing has been performed by the first learning unit 16, and a predetermined threshold value”);
acquire a second neural network by deleting the redundant attention mechanism detected by the detection from the first neural network (Figure 10; and Paragraph [0077]: “The deletion section 15-1 deletes redundant channels from the processing layer 21-1 corresponding to the attention layer 11-1. The deletion section 15-1 thereby reduces the number of channels of the processing layer 21-1 that correspond to the attention layer 11-1 (namely, changes from channels of a first number of channels to channels of a second number of channels)”); and
perform machine learning with the second neural network (Figure 10; and Paragraph [0079]: “The second learning unit 18 is connected to the neural network 20, and learning processing is performed on the neural network 20 after redundant channel deletion by the deletion section 15-1 and the deletion section 15-2.”).
Mo teaches a neural network including an attention mechanism for improving image quality. Specifically, Mo discloses a Frequency Attention Network (FAN) that combines frequency domain analysis and attention mechanisms for blind image denoising and illustrates the attention-based architecture in Figure 1. Yamamoto teaches an attention layer configured to compute an output feature value corresponding to each channel. Yamamoto further teaches selecting a redundant channel based on a relationship between output feature values and a predetermined threshold, and optionally based on statistical values such as averages or deviations. Lastly, Yamamoto discloses deleting the redundant channel using a deletion section and performing learning after deletion using a second learning unit.
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mo and Yamamoto before the effective filing date of the claimed invention. The motivation for this combination of references would have been to reduce the number of channels in a neural network by selecting redundant channels based on output feature values and deleting such redundant channels (Yamamoto, Paragraph [0075] – [0077]). This motivation for the combination of Mo and Yamamoto is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
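By way of illustration only, the following minimal Python sketch shows the general shape of the threshold-based redundancy detection, channel deletion, and relearning flow that Yamamoto describes; the function names, array shapes, statistic, and threshold value are assumptions for illustration and are not taken from the reference.

    import numpy as np

    def redundant_channels(attention_outputs, threshold=0.05):
        # attention_outputs: (num_inputs, num_channels) output feature values of an
        # attention layer, collected after the first learning phase. A statistic over
        # inputs (here, the average; cf. Yamamoto [0076]) suppresses the dependency
        # on any single input.
        stats = attention_outputs.mean(axis=0)
        # Channels whose statistic falls below the threshold satisfy the
        # "predetermined relationship" and are selected as redundant ([0075]).
        return np.where(stats < threshold)[0]

    def delete_channels(layer_weights, redundant):
        # Drop redundant output channels of a (out_channels, ...) weight array,
        # changing a first number of channels to a second number ([0077]).
        keep = [c for c in range(layer_weights.shape[0]) if c not in set(redundant)]
        return layer_weights[keep]

    # After deletion, learning is performed again on the reduced network
    # (Yamamoto's second learning unit, [0079]); the training loop is omitted here.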
Regarding claim(s) 2, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, where Yamamoto teaches wherein, in the detection, a statistic of a weight value (read as “output feature value”) of the attention processing is calculated (Paragraph [0054]: “[…] The attention layer 11-1 computes a feature value (output feature value) corresponding to each of the plural channels […]”; and Paragraph [0076]: “The statistic referred to here is a expressed as a multi-level continuous value, and is, for example, an average value and deviation, central value, or the like found from at least two items of input data”), and it is determined whether or not a weight of the attention processing is active according to the statistic (Paragraph [0075]: “The channel selection section 14-1 selects as a redundant channel a channel satisfying a predetermined relationship between the output feature values computed by the attention layer 11-1 after the learning processing has been performed by the first learning unit 16, and a predetermined threshold value”).
Regarding claim(s) 4, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, where Yamamoto teaches wherein, in the detection, it is detected that the attention mechanism in which there are more weights of the inactive attention processing than the weight of the active attention processing, among weights of the attention processing acquired for each attention mechanism in a case in which one or more of the images are given, is redundant (Paragraph [0075]: “The channel selection section 14-1 selects as a redundant channel a channel satisfying a predetermined relationship between the output feature values computed by the attention layer 11-1 after the learning processing has been performed by the first learning unit 16, and a predetermined threshold value”; and Paragraph [0076]: “The statistic referred to here is a expressed as a multi-level continuous value, and is, for example, an average value and deviation, central value, or the like found from at least two items of input data. The output feature values change depending on the input data to the attention layer, and finding a statistic thereof enables this dependency to be suppressed”).
Regarding claim(s) 7, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, where Mo teaches wherein the attention mechanism incorporated in the first neural network includes an attention mechanism that generates a weight in a spatial direction of an input feature amount (Page 7, Chapter 3.3 Spatial-Channel Attention Block (SCAB), 1st Paragraph: “we used a Spatial-Channel Attention Block to extract the features in the convolutional stream […] which we can use spatial attention mechanism to refine features map […] Meanwhile, we apply channel attention mechanism […] Spatial attention is used to extract the inter-spatial relationship of images”; and Page 8, 2nd Paragraph: “spatial attention mechanism can re-weight the feature map according to the location of the features and help the network learn where to be paid attention”).
Regarding claim(s) 8, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, where Mo teaches wherein the attention mechanism incorporated in the first neural network includes an attention mechanism that generates a weight in a channel direction of an input feature amount (Page 7, Chapter 3.3 Spatial-Channel Attention Block (SCAB), 1st Paragraph: “we used a Spatial-Channel Attention Block to extract the features in the convolutional stream […] which we can use spatial attention mechanism to refine features map […] Meanwhile, we apply channel attention mechanism […] Spatial attention is used to extract the inter-spatial relationship of images”; and Page 7, Chapter 3.3 Spatial-Channel Attention Block (SCAB), 3rd Paragraph: “Channel attention utilizes the squeeze and excitation operation to enhance the main features of the feature map based on the inter-channel relationship […]”).
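By way of illustration only, a minimal numpy sketch of squeeze-and-excitation style channel attention and of spatial re-weighting of the general kind Mo describes for the Spatial-Channel Attention Block; the learned convolutions are replaced by simple stand-ins, and the actual SCAB layers differ.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def channel_attention(fmap, w1, w2):
        # fmap: (C, H, W); w1: (C // r, C); w2: (C, C // r) for reduction ratio r.
        squeeze = fmap.mean(axis=(1, 2))                    # global average pool
        excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0))  # per-channel weights
        return fmap * excite[:, None, None]                 # re-weight each channel

    def spatial_attention(fmap):
        # Per-location statistics stand in for a learned spatial-attention
        # convolution; the map re-weights the feature map by location.
        attn = sigmoid(fmap.mean(axis=0) + fmap.max(axis=0))  # (H, W)
        return fmap * attn[None, :, :]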
Regarding claim(s) 16, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, where Mo teaches wherein the memory storing further instructions that, when executed by the at least one processor (Figure 1; and Page 4, Chapter 3.1, Network Architecture), cause the at least one processor to:
generate a plurality of feature amounts by the first and second neural networks, and restore the plurality of feature amounts as an image of a desired image processing execution result (Figure 1; Page 4, Chapter 3.1, Network Architecture; Abstract: “In this work we propose a method for blind image denoising that combines frequency domain analysis and attention mechanism, named frequency attention network (FAN) […] spatial and channel mechanisms are employed to enhance feature maps at different scales for capturing contextual information”; and Page 2, 1st Paragraph: “data-driven deep convolutional neural networks(CNNs) are increasingly applied on the image denoising task in that CNN can extract high-dimensional features of images and utilize them to restore clean images”).
Regarding claim(s) 17, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, where Mo teaches wherein the memory storing further instructions that, when executed by the at least one processor (Figure 1; and Page 4, Chapter 3.1, Network Architecture), cause the at least one processor to:
generate feature amounts of a plurality of resolutions by the first and second neural networks, and restore the feature amounts of the plurality of resolutions as an image of a desired image processing execution result (Figure 1; Page 4, Chapter 3.1, Network Architecture; Abstract: “In this work we propose a method for blind image denoising that combines frequency domain analysis and attention mechanism, named frequency attention network (FAN) […] spatial and channel mechanisms are employed to enhance feature maps at different scales for capturing contextual information”; and Page 2, 1st Paragraph: “data-driven deep convolutional neural networks(CNNs) are increasingly applied on the image denoising task in that CNN can extract high-dimensional features of images and utilize them to restore clean images”).
Regarding claim(s) 18, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, where Yamamoto teaches wherein, in the detection, activation determination for weights of a plurality of types of attention processing is performed (Paragraph [0054]: “[…] The attention layer 11-1 computes a feature value (output feature value) corresponding to each of the plural channels […]”; Paragraph [0071]: “The computation section 12-1 multiplies the feature values (input feature values) input from a processing layer 21-1 by the feature values (output feature values) output from the attention layer 11-1 on a per-channel basis”; and Paragraph [0075]: “The channel selection section 14-1 selects as a redundant channel a channel satisfying a predetermined relationship between the output feature values computed by the attention layer 11-1 after the learning processing has been performed by the first learning unit 16, and a predetermined threshold value”), and an attention mechanism determined to be inactive in any activation determination method is detected as a redundant attention mechanism (Figure 10; Paragraph [0077]: “The deletion section 15-1 deletes redundant channels from the processing layer 21-1 corresponding to the attention layer 11-1. The deletion section 15-1 thereby reduces the number of channels of the processing layer 21-1 that correspond to the attention layer 11-1 (namely, changes from channels of a first number of channels to channels of a second number of channels)”; and Paragraph [0079]: “The second learning unit 18 is connected to the neural network 20, and learning processing is performed on the neural network 20 after redundant channel deletion by the deletion section 15-1 and the deletion section 15-2”).
Claims 3, 9, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Mo et al (Frequency Attention Network: Blind Noise Removal for Real Images) in view of Yamamoto et al (US 2019/0378014 A1), further in view of Kang et al (KR 20200037700 A; See translation provided by Examiner).
Regarding claim(s) 3, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, where Yamamoto teaches wherein, in the detection, (Paragraph [0075]: “The channel selection section 14-1 selects as a redundant channel a channel satisfying a predetermined relationship between the output feature values computed by the attention layer 11-1 after the learning processing has been performed by the first learning unit 16, and a predetermined threshold value”).
Mo and Yamamoto fail to teach wherein, in the detection, a variance value of a weight of the attention processing is calculated.
However, Kang teaches wherein, in the detection, a variance value of a weight of the attention processing is calculated (Equation 3; and Paragraph [0064]: “Where μω is the weighted average, σω is the weighted standard deviation, and g(ωk|μω, σω) is the distribution of weights”), and it is determined that the attention processing in which the variance value is equal to or less than a predetermined threshold is not active (Equations 4-5; Paragraph [0064]: “if the value of the weight is very small, the contribution of calculating the weight is considered not to be important, so the weight can be reduced during training. From this point of view, in the present invention, the potential for a small weight value can be analyzed because it cannot be determined that it is not stable just because the weight value is small”; and Paragraph [0070]: “The pruning unit 220 determines if the amount of information for ωn, A[ωn] calculated using Equation 5 is less than the threshold calculated using Equation 4, the weight is determined as an insignificant weight during training and the ωn Can be deleted”).
Yamamoto teaches detecting redundant channels in a neural network based on statistics of feature values generated by an attention layer and selecting redundant channels according to a relationship between the feature values and a threshold. Kang further teaches evaluating neural network weights using statistical measures including weighted average and weighted standard deviation of weights and determining whether weights should be removed by comparing the statistical values with a threshold.
Therefore, a person having ordinary skill in the art would have been motivated to apply the statistical weight evaluation technique of Kang when performing the redundancy detection process taught by Yamamoto in the neural network of Mo in order to more accurately determine whether weights are active or inactive and thereby improve pruning decisions in the neural network. The motivation for this combination of references would have been to improve the reliability of detecting redundant components in a neural network by using statistical evaluation of weights when determining redundancy, as taught by Kang. This motivation for the combination of Mo, Yamamoto, and Kang is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
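By way of illustration only, a minimal Python sketch of statistical weight evaluation of the kind Kang describes; reading Equation 5 as an information amount A[ω] = -log g(ω|μω, σω) with a Gaussian g is an assumption, and the plain mean and standard deviation below are simplifications of the weighted versions in the reference.

    import numpy as np

    def information_amount(weights):
        # mu_w and sigma_w (cf. Kang [0064]); plain statistics stand in for the
        # weighted average and weighted standard deviation of the reference.
        mu = weights.mean()
        sigma = weights.std() + 1e-12
        # g(w | mu, sigma): Gaussian density modeling the distribution of weights.
        g = np.exp(-((weights - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))
        return -np.log(g + 1e-12)  # assumed form of Kang's Equation 5

    def keep_mask(weights, threshold):
        # Weights whose information amount falls below the threshold are deemed
        # insignificant during training and can be deleted (cf. Kang [0070]).
        return information_amount(weights) >= threshold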
Regarding claim(s) 9, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, where Mo teaches wherein, in the detection, a weight of the attention processing acquired in a case in which a frequency chart is given as an input of the first neural network is divided into regions for each frequency band (Abstract: “We adopt wavelet transform to convert images from spatial domain to frequency domain with more sparse features to utilize spectral information and structure information”; and Page 6, Chapter 3.2 Wavelet Transform, 1st Paragraph: “Wavelet transformation of one image decomposes the image into different sub-bands based on the frequency information and the processing of the medium and the high frequency sub-bands can result in noise removal”).
Mo and Yamamoto fail to teach wherein, in the detection, a representative value of the weight is calculated for each of the divided regions.
However, Kang teaches calculating representative statistical values of weights (Equations 3-5; Paragraph [0064]: “Where μω is the weighted average, σω is the weighted standard deviation, and g(ωk|μω, σω) is the distribution of weights”; Paragraph [0064]: “if the value of the weight is very small, the contribution of calculating the weight is considered not to be important, so the weight can be reduced during training. From this point of view, in the present invention, the potential for a small weight value can be analyzed because it cannot be determined that it is not stable just because the weight value is small”; and Paragraph [0070]: “The pruning unit 220 determines if the amount of information for ωn, A[ωn] calculated using Equation 5 is less than the threshold calculated using Equation 4, the weight is determined as an insignificant weight during training and the ωn Can be deleted”).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mo, Yamamoto, and Kang before the effective filing date of the claimed invention. The motivation for this combination of references would have been to evaluate the importance of attention processing across different frequency components of image data by computing representative statistics of attention weights for each frequency band, thereby enabling efficient analysis and optimization of neural network attention mechanisms. This motivation for the combination of Mo, Yamamoto, and Kang is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
Regarding claim(s) 10, Mo as modified by Yamamoto and Kang teaches the information processing apparatus according to Claim 9, where Mo teaches wherein, in the detection, a difference value of representative values of weights between regions divided for each frequency band is calculated (Abstract: “We adopt wavelet transform to convert images from spatial domain to frequency domain with more sparse features to utilize spectral information and structure information”; and Page 6, Chapter 3.2 Wavelet Transform, 1st Paragraph: “Wavelet transformation of one image decomposes the image into different sub-bands based on the frequency information and the processing of the medium and the high frequency sub-bands can result in noise removal”), and where Kang teaches in a case in which the difference value is equal to or less than a predetermined threshold, it is determined that a weight of the attention processing is inactive (Equation 3-5; Paragraph [0064]: “Where μω is the weighted average, σω is the weighted standard deviation, and g(ωk|μω, σω) is the distribution of weights”; Paragraph [0064]: “if the value of the weight is very small, the contribution of calculating the weight is considered not to be important, so the weight can be reduced during training. From this point of view, in the present invention, the potential for a small weight value can be analyzed because it cannot be determined that it is not stable just because the weight value is small”; and Paragraph [0070]: “The pruning unit 220 determines if the amount of information for ωn, A[ωn] calculated using Equation 5 is less than the threshold calculated using Equation 4, the weight is determined as an insignificant weight during training and the ωn Can be deleted”).
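By way of illustration only, a minimal Python sketch pairing a one-level Haar split (a simple stand-in for the wavelet transform Mo adopts) with the per-band representative values and difference test discussed for claims 9 and 10; the band layout, representative statistic, and threshold are assumptions.

    import numpy as np

    def haar_subbands(x):
        # One-level 2-D Haar decomposition of an (H, W) array with even H, W
        # into LL / LH / HL / HH frequency sub-bands.
        a, b = x[0::2, 0::2], x[0::2, 1::2]
        c, d = x[1::2, 0::2], x[1::2, 1::2]
        return {"LL": (a + b + c + d) / 4.0, "LH": (a - b + c - d) / 4.0,
                "HL": (a + b - c - d) / 4.0, "HH": (a - b - c + d) / 4.0}

    def bands_inactive(weight_map, threshold=0.01):
        # Representative (mean absolute) attention weight per frequency band.
        reps = {k: float(np.abs(v).mean()) for k, v in haar_subbands(weight_map).items()}
        # If representative values barely differ between bands (difference at or
        # below the threshold), the attention processing is treated as inactive.
        return (max(reps.values()) - min(reps.values())) <= threshold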
Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Mo et al (Frequency Attention Network: Blind Noise Removal for Real Images) in view of Yamamoto et al (US 2019/0378014 A1) and Kang et al (KR 20200037700 A; See translation provided by Examiner), further in view of Yin et al (US 2023/0073835 A1).
Regarding claim(s) 5, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, where Yamamoto teaches selecting redundant channels by comparison with a predetermined threshold (Paragraph [0006]: “The channel selection section is configured to select, as a redundant channel, a channel satisfying a predetermined relationship between the output feature values computed by the attention layer after the learning processing has been performed and a predetermined threshold value”; and Paragraph [0010]: “The channel selection section may be configured to select as the redundant channel a channel in which the output feature value is below the predetermined threshold value”).
Mo and Yamamoto fail to teach wherein, in the detection, an average value of variances of weights of attention processing acquired in a case in which one or more of the images are given is calculated.
However, Kang teaches wherein, in the detection, an average value of variances of weights of attention processing is calculated (Equation 3; and Paragraph [0064]: “Where μω is the weighted average, σω is the weighted standard deviation, and g(ωk|μω, σω) is the distribution of weights”), and the attention mechanism that generates a weight of attention processing in which the average value is less than a predetermined threshold is detected as redundant (Equations 4-5; Paragraph [0064]: “if the value of the weight is very small, the contribution of calculating the weight is considered not to be important, so the weight can be reduced during training. From this point of view, in the present invention, the potential for a small weight value can be analyzed because it cannot be determined that it is not stable just because the weight value is small”; and Paragraph [0070]: “The pruning unit 220 determines if the amount of information for ωn, A[ωn] calculated using Equation 5 is less than the threshold calculated using Equation 4, the weight is determined as an insignificant weight during training and the ωn Can be deleted”).
Mo teaches an image processing neural network employing attention processing for improving image quality. Additionally, Yamamoto teaches evaluating attention-related feature values and selecting redundant channels by comparing the values with a predetermined threshold, thereby enabling redundant components of a neural network to be detected and removed. Furthermore, Kang teaches calculating statistical measures of neural network weights, including weighted averages and dispersion measures (e.g., standard deviation), in order to determine whether particular weights contain sufficient information to be retained in the neural network.
Therefore, it would have been obvious for a person having ordinary skill in the art to apply the statistical evaluation technique of Kang to the attention processing weights in the neural network of Mo, and to employ the threshold-based redundancy detection technique of Yamamoto, in order to determine whether attention mechanisms generating weights having statistical values below a predetermined threshold should be considered redundant and removed. The motivation for this combination of references would have been to improve the efficiency of neural network processing by identifying and removing redundant attention mechanisms based on statistical evaluation of attention weights, as taught by Kang. This motivation for the combination of Mo, Yamamoto, and Kang is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
Mo, Yamamoto, and Kang fail to teach wherein the weights of attention processing are acquired in a case in which one or more of the images are given.
However, Yin teaches acquiring outputs of attention processing in a case in which one or more of the images are given (Figure 1; Paragraph [0022]: “As shown in FIG. 1, step 110 includes accessing a batch B of a plurality of images, wherein each image in the batch is part of a training set of images used to train a vision transformer that includes a plurality of attention heads”; Paragraph [0024]: “determining, based on the determined similarities of step 120, an importance score for each attention head”; and Paragraph [0029]: “the magnitude of weights in attention heads (e.g., the magnitude of the dense weights that are part of the trained vision transformer) may be used directly as the importance scores […]”), and the attention mechanism whose importance score indicates redundancy is detected and pruned (Abstract: “determining, for each attention head A, a similarity between (1) the output of the attention head evaluated using each image in the batch and the (2) output of each attention head evaluated using each image in the batch. The method further includes determining, based on the determined similarities, an importance score for each attention head; and pruning, based on the importance scores, one or more attention heads from the vision transformer”; and Paragraph [0025]: “Step 140 of the example method of FIG. 1 includes pruning, based on the importance scores, one or more attention heads from the vision transformer. For example, once the importance score for each state is obtained via calculating the stationary distribution, the corresponding attention heads can be ranked”).
Mo teaches an image processing neural network employing attention processing for improving image quality. Additionally, Yamamoto teaches evaluating attention-related feature values and selecting redundant channels by comparing the values with a predetermined threshold, thereby enabling redundant components of a neural network to be detected and removed. Furthermore, Kang teaches calculating statistical measures of neural network weights, including weighted averages and dispersion measures (e.g., standard deviation), in order to determine whether particular weights contain sufficient information to be retained in the neural network. Yin further teaches evaluating attention heads of a neural network using a batch of images and determining importance scores for the attention heads based on outputs obtained from the images, thereby enabling attention heads determined to be unnecessary to be pruned.
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mo, Yamamoto, Kang, and Yin before the effective filing date of the claimed invention. The motivation for this combination of references would have been to determine the importance of attention mechanisms using statistical values of weights obtained from multiple images and to prune attention components determined to be redundant based on a predetermined threshold in order to improve neural network processing efficiency. This motivation for the combination of Mo, Yamamoto, Kang, and Yin is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
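By way of illustration only, a minimal Python sketch of one way to turn pairwise similarities between attention-head outputs over a batch of images into scores via a stationary distribution, in the general spirit of Yin’s Figure 1; the exact construction in Yin is more involved, and all names here are hypothetical.

    import numpy as np

    def head_scores(head_outputs, iters=1000):
        # head_outputs: (num_heads, batch, dim) output of each attention head
        # evaluated on each image in the batch.
        h = head_outputs.shape[0]
        flat = head_outputs.reshape(h, -1)
        flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
        sim = np.abs(flat @ flat.T)               # head-to-head similarities
        p = sim / sim.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
        pi = np.full(h, 1.0 / h)
        for _ in range(iters):                    # power iteration to the
            pi = pi @ p                           # stationary distribution
        return pi                                 # scores used to rank heads

    # Pruning then drops the lowest-ranked heads (cf. Yin [0025]); how the
    # similarity chain maps to importance in Yin differs from this sketch.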
Regarding claim(s) 15, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, but does not specifically teach wherein, in the detection, a redundant attention mechanism is detected based on a statistic of a weight of the attention processing and a processing speed of the attention processing.
However, Kang teaches wherein, in the detection, a redundant attention mechanism is detected based on a statistic of a weight of the attention processing (Equation 3-5; Paragraph [0064]: “Where μω is the weighted average, σω is the weighted standard deviation, and g(ωk|μω, σω) is the distribution of weights”; Paragraph [0064]: “if the value of the weight is very small, the contribution of calculating the weight is considered not to be important, so the weight can be reduced during training. From this point of view, in the present invention, the potential for a small weight value can be analyzed because it cannot be determined that it is not stable just because the weight value is small”; and Paragraph [0070]: “The pruning unit 220 determines if the amount of information for ωn, A[ωn] calculated using Equation 5 is less than the threshold calculated using Equation 4, the weight is determined as an insignificant weight during training and the ωn Can be deleted”).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mo, Yamamoto, and Kang before the effective filing date of the claimed invention. The motivation for this combination of references would have been to improve the reliability of detecting redundant components in a neural network by using statistical evaluation of weights when determining redundancy, as taught by Kang. This motivation for the combination of Mo, Yamamoto, and Kang is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
Mo, Yamamoto, and Kang fail to teach wherein, in the detection, a redundant attention mechanism is detected based on a processing speed of the attention processing.
However, Yin teaches wherein, in the detection, redundant attention heads are pruned in a manner that reduces model size and complexity and thereby improves processing speed (Paragraph [0010]: “The resulting pruned model drastically reduces the model size and complexity in a configurable manner (i.e., how much to prune the model can be specified) while retaining model performance, e.g., by identifying and pruning the parameters that are relatively unimportant to model performance”; and Paragraph [0025]: “Step 140 of the example method of FIG. 1 includes pruning, based on the importance scores, one or more attention heads from the vision transformer. For example, once the importance score for each state is obtained via calculating the stationary distribution, the corresponding attention heads can be ranked”).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mo, Yamamoto, Kang, and Yin before the effective filing date of the claimed invention. The motivation for this combination of references would have been that removing redundant attention channels improves computational efficiency, thereby improving the processing speed of the attention processing. This motivation for the combination of Mo, Yamamoto, Kang, and Yin is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
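By way of illustration only, a minimal Python sketch of a detection test that combines a weight statistic with a measured processing speed, as recited in claim 15; neither reference shows this exact test, so every name and threshold here is an assumption.

    import time
    import numpy as np

    def is_redundant(attn_weights, run_attention, x, stat_thresh, time_thresh_s):
        # Flag a mechanism as redundant when its weight statistic is low
        # (statistically inactive) and its measured processing time is large
        # enough that deleting it would meaningfully speed up inference.
        stat = float(np.abs(attn_weights).std())
        t0 = time.perf_counter()
        run_attention(x)                 # time one forward pass of the mechanism
        elapsed = time.perf_counter() - t0
        return stat <= stat_thresh and elapsed >= time_thresh_s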
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Mo et al (Frequency Attention Network: Blind Noise Removal for Real Images) in view of Yamamoto et al (US 2019/0378014 A1), further in view of Yin et al (US 2023/0073835 A1).
Regarding claim(s) 6, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, but does not specifically teach wherein the image input to the first neural network includes at least one of a frequency chart indicating a change in a frequency band in an image, a character chart in which characters are written in an image, an object image in which a specific object whose image quality is desired to be improved is reflected in an image, and a color chart in which regions are divided for each color in an image.
However, Yin teaches wherein the image input to the first neural network includes at least one of a frequency chart indicating a change in a frequency band in an image, a character chart in which characters are written in an image, an object image in which a specific object whose image quality is desired to be improved is reflected in an image, and a color chart in which regions are divided for each color in an image (Figure 1; Paragraph [0022]: “As shown in FIG. 1, step 110 includes accessing a batch B of a plurality of images, wherein each image in the batch is part of a training set of images used to train a vision transformer that includes a plurality of attention heads”; and Paragraph [0008]: “Vision transformers can be used for many computer-vision tasks such as image classification, object detection, super-resolution, video classification, and semantic segmentation. A vision transformer may take an input image and divide the image into regions, or patches. Each patch may be associated with a position value identifying its position in the image”).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mo, Yamamoto, and Yin before the effective filing date of the claimed invention. The motivation for this combination of references would have been to use various images containing objects as input images for the neural network when performing attention-based processing and analysis of the neural network, as taught by Yin, in order to perform computer-vision tasks such as image recognition and object detection. This motivation for the combination of Mo, Yamamoto, and Yin is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
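By way of illustration only, a minimal Python sketch of the patch division Yin describes for vision transformers, pairing each patch with a position value; the patch size and array shapes are assumptions.

    import numpy as np

    def to_patches(img, p):
        # Divide an (H, W, C) image into non-overlapping p x p patches (H and W
        # divisible by p in this simplified sketch) and associate each patch with
        # a position value identifying its place in the image (cf. Yin [0008]).
        h, w, c = img.shape
        patches = (img.reshape(h // p, p, w // p, p, c)
                      .transpose(0, 2, 1, 3, 4)
                      .reshape(-1, p, p, c))
        positions = np.arange(patches.shape[0])
        return patches, positions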
Claims 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Mo et al (Frequency Attention Network: Blind Noise Removal for Real Images) in view of Yamamoto et al (US 2019/0378014 A1) and Kang et al (KR 20200037700 A; See translation provided by Examiner), further in view of Price et al (US 2022/0198671 A1).
Regarding claim(s) 11, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, where Mo teaches wherein, in the detection, a weight of the attention processing acquired in a case in which a character chart or an object image is given as an input of a neural network (Page 2, 1st Paragraph: “data-driven deep convolutional neural networks(CNNs) are increasingly applied on the image denoising task in that CNN can extract high-dimensional features of images and utilize them to restore clean images”; Page 7, Chapter 3.3 Spatial-Channel Attention Block (SCAB), 1st Paragraph: “we used a Spatial-Channel Attention Block to extract the features in the convolutional stream […] which we can use spatial attention mechanism to refine features map […] Meanwhile, we apply channel attention mechanism […] Spatial attention is used to extract the inter-spatial relationship of images”; and Page 8, 2nd Paragraph: “spatial attention mechanism can re-weight the feature map according to the location of the features and help the network learn where to be paid attention”).
Mo and Yamamoto fail to teach wherein, in the detection, the weight of the attention processing so acquired is divided into a character region or an object region, and a background region.
However, Price teaches wherein, in the detection, an image is divided into a character region or an object region, and a background region (Paragraph [0017]: “the object segmentation system provides this initial object segmentation for display via a user interface and receives (via the user interface) object user indicators such as positive or negative clicks indicating foreground or background pixels. The object segmentation system processes these object user indicators together with the initial object segmentation to generate an improved object segmentation”; and Paragraph [0050]: “an object segmentation can include a segmentation boundary (e.g., a boundary line or curve indicating an edge of one or more objects) or a binary segmentation mask (e.g., a selection that definitively includes a first set of pixels and definitively excludes a second set of pixels as corresponding to an object)”).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mo, Yamamoto, and Price before the effective filing date of the claimed invention. The motivation for this combination of references would have been to apply the segmentation approach of Price to the neural network image processing framework of Mo in order to distinguish object regions from background regions when processing images. This motivation for the combination of Mo, Yamamoto, and Price is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
Mo, Yamamoto, and Price fail to teach wherein a representative value of the weight is calculated for each of the divided regions.
However, Kang teaches calculating representative statistical values of weights (Equation 3-5; Paragraph [0064]: “Where μω is the weighted average, σω is the weighted standard deviation, and g(ωk|μω, σω) is the distribution of weights”; Paragraph [0064]: “if the value of the weight is very small, the contribution of calculating the weight is considered not to be important, so the weight can be reduced during training. From this point of view, in the present invention, the potential for a small weight value can be analyzed because it cannot be determined that it is not stable just because the weight value is small”; and Paragraph [0070]: “The pruning unit 220 determines if the amount of information for ωn, A[ωn] calculated using Equation 5 is less than the threshold calculated using Equation 4, the weight is determined as an insignificant weight during training and the ωn Can be deleted”).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mo, Yamamoto, Price, and Kang before the effective filing date of the claimed invention. The motivation for this combination of references would have been to apply the segmentation technique of Price to the neural network image processing system of Mo in order to distinguish object regions from background regions when processing images, and to further apply the statistical evaluation of attention weights taught by Kang to calculate representative values of attention weights for the respective regions. This motivation for the combination of Mo, Yamamoto, Price, and Kang is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
Regarding claim(s) 12, Mo as modified by Yamamoto, Price, and Kang teaches the information processing apparatus according to Claim 11, where Kang teaches wherein, in the detection, a difference value of representative values of weights between (Equation 3-5; Paragraph [0064]: “Where μω is the weighted average, σω is the weighted standard deviation, and g(ωk|μω, σω) is the distribution of weights”; Paragraph [0064]: “if the value of the weight is very small, the contribution of calculating the weight is considered not to be important, so the weight can be reduced during training. From this point of view, in the present invention, the potential for a small weight value can be analyzed because it cannot be determined that it is not stable just because the weight value is small”; and Paragraph [0070]: “The pruning unit 220 determines if the amount of information for ωn, A[ωn] calculated using Equation 5 is less than the threshold calculated using Equation 4, the weight is determined as an insignificant weight during training and the ωn Can be deleted”) where Price teaches the character region or the object region, and the background region (Paragraph [0017]: “the object segmentation system provides this initial object segmentation for display via a user interface and receives (via the user interface) object user indicators such as positive or negative clicks indicating foreground or background pixels. The object segmentation system processes these object user indicators together with the initial object segmentation to generate an improved object segmentation”; and Paragraph [0050]: “an object segmentation can include a segmentation boundary (e.g., a boundary line or curve indicating an edge of one or more objects) or a binary segmentation mask (e.g., a selection that definitively includes a first set of pixels and definitively excludes a second set of pixels as corresponding to an object)”) is calculated, and where Kang teaches in a case in which the difference value is equal to or less than a predetermined threshold, it is determined that the weight of the attention processing is inactive (Equation 3-5; Paragraph [0064]: “Where μω is the weighted average, σω is the weighted standard deviation, and g(ωk|μω, σω) is the distribution of weights”; Paragraph [0064]: “if the value of the weight is very small, the contribution of calculating the weight is considered not to be important, so the weight can be reduced during training. From this point of view, in the present invention, the potential for a small weight value can be analyzed because it cannot be determined that it is not stable just because the weight value is small”; and Paragraph [0070]: “The pruning unit 220 determines if the amount of information for ωn, A[ωn] calculated using Equation 5 is less than the threshold calculated using Equation 4, the weight is determined as an insignificant weight during training and the ωn Can be deleted”).
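By way of illustration only, a minimal Python sketch of the claim 12 style test in which a representative attention weight inside a segmented character or object region is compared against the background region, with a small difference treated as inactive attention; the mask source, statistic, and threshold are assumptions.

    import numpy as np

    def region_difference_inactive(weight_map, fg_mask, threshold=0.01):
        # fg_mask: boolean (H, W) mask of the character/object region, e.g. a
        # binary segmentation mask of the kind Price describes ([0050]).
        fg_rep = float(np.abs(weight_map[fg_mask]).mean())   # foreground mean
        bg_rep = float(np.abs(weight_map[~fg_mask]).mean())  # background mean
        # A difference at or below the threshold marks the attention as inactive.
        return abs(fg_rep - bg_rep) <= threshold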
Regarding claim(s) 13, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, where Mo teaches wherein, in the detection, (Page 2, 1st Paragraph: “data-driven deep convolutional neural networks(CNNs) are increasingly applied on the image denoising task in that CNN can extract high-dimensional features of images and utilize them to restore clean images”).
Mo and Yamamoto fail to teach wherein, in the detection, a region is divided for each color and a representative value of the weight is calculated for each region.
However, Price teaches wherein, in the detection, a region is divided for each color (Paragraph [0017]: “the object segmentation system provides this initial object segmentation for display via a user interface and receives (via the user interface) object user indicators such as positive or negative clicks indicating foreground or background pixels. The object segmentation system processes these object user indicators together with the initial object segmentation to generate an improved object segmentation”; and Paragraph [0050]: “an object segmentation can include a segmentation boundary (e.g., a boundary line or curve indicating an edge of one or more objects) or a binary segmentation mask (e.g., a selection that definitively includes a first set of pixels and definitively excludes a second set of pixels as corresponding to an object)”).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mo, Yamamoto, and Price before the effective filing date of the claimed invention. The motivation for this combination of references would have been to apply the image segmentation technique of Price to the neural network image processing framework of Mo in order to divide an input image, such as a color chart, into multiple regions corresponding to different colors. This motivation for the combination of Mo, Yamamoto, and Price is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
Mo, Yamamoto, and Price fail to teach wherein a representative value of the weight is calculated for each of the divided regions.
However, Kang teaches calculating representative statistical values of weights (Equation 3-5; Paragraph [0064]: “Where μω is the weighted average, σω is the weighted standard deviation, and g(ωk|μω, σω) is the distribution of weights”; Paragraph [0064]: “if the value of the weight is very small, the contribution of calculating the weight is considered not to be important, so the weight can be reduced during training. From this point of view, in the present invention, the potential for a small weight value can be analyzed because it cannot be determined that it is not stable just because the weight value is small”; and Paragraph [0070]: “The pruning unit 220 determines if the amount of information for ωn, A[ωn] calculated using Equation 5 is less than the threshold calculated using Equation 4, the weight is determined as an insignificant weight during training and the ωn Can be deleted”).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mo, Yamamoto, Price, and Kang before the effective filing date of the claimed invention. The motivation for this combination of references would have been to apply Kang’s statistical analysis of attention weights to the regions obtained by the segmentation technique of Price in the neural-network-based image processing system of Mo in order to calculate representative values of attention weights for each region of the image. This motivation for the combination of Mo, Yamamoto, Price, and Kang is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
Regarding claim(s) 14, Mo as modified by Yamamoto teaches the information processing apparatus according to Claim 1, where Yamamoto teaches selecting redundant channels by comparison with a predetermined threshold in the detection (Paragraph [0075]: “The channel selection section 14-1 selects as a redundant channel a channel satisfying a predetermined relationship between the output feature values computed by the attention layer 11-1 after the learning processing has been performed by the first learning unit 16, and a predetermined threshold value”).
Mo and Yamamoto fail to teach wherein, in the detection, a difference value of representative values of weights between regions divided for each color is calculated.
However, Price teaches wherein, in the detection, an image is divided into regions (Paragraph [0017]: “the object segmentation system provides this initial object segmentation for display via a user interface and receives (via the user interface) object user indicators such as positive or negative clicks indicating foreground or background pixels. The object segmentation system processes these object user indicators together with the initial object segmentation to generate an improved object segmentation”; and Paragraph [0050]: “an object segmentation can include a segmentation boundary (e.g., a boundary line or curve indicating an edge of one or more objects) or a binary segmentation mask (e.g., a selection that definitively includes a first set of pixels and definitively excludes a second set of pixels as corresponding to an object)”).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mo, Yamamoto, and Price before the effective filing date of the claimed invention. The motivation for this combination of references would have been to apply the image segmentation technique of Price to the neural network image processing framework of Mo in order to divide an input image, such as a color chart, into multiple regions corresponding to different colors. This motivation for the combination of Mo, Yamamoto, and Price is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
Mo, Yamamoto, and Price fail to teach wherein, in the detection, a difference value of representative values of weights between the regions is calculated.
However, Kang teaches wherein, in the detection, a difference value of representative values of weights between regions is calculated (Equation 3; and Paragraph [0064]: “Where μω is the weighted average, σω is the weighted standard deviation, and g(ωk|μω, σω) is the distribution of weights”), and it is determined that a weight of attention processing is inactive in a case in which the difference value is equal to or less than a predetermined threshold (Equations 4-5; Paragraph [0064]: “if the value of the weight is very small, the contribution of calculating the weight is considered not to be important, so the weight can be reduced during training. From this point of view, in the present invention, the potential for a small weight value can be analyzed because it cannot be determined that it is not stable just because the weight value is small”; and Paragraph [0070]: “The pruning unit 220 determines if the amount of information for ωn, A[ωn] calculated using Equation 5 is less than the threshold calculated using Equation 4, the weight is determined as an insignificant weight during training and the ωn Can be deleted”).
Therefore, it would have been obvious to one of ordinary skill in the art to combine Mo, Yamamoto, Price, and Kang before the effective filing date of the claimed invention. The motivation for this combination of references would have been to apply Kang’s statistical evaluation of attention weights to the regions obtained by the segmentation technique of Price when performing the redundancy detection process taught by Yamamoto in the neural-network-based image processing system of Mo in order to determine whether the attention weights associated with the regions are inactive based on a comparison of representative values. This motivation for the combination of Mo, Yamamoto, Price, and Kang is supported by KSR exemplary rationale (G) Some teaching, suggestion, or motivation in the prior art that would have led one of ordinary skill to modify the prior art reference or to combine prior art reference teachings to arrive at the claimed invention. MPEP 2141 (III).
Relevant Prior Art Directed to State of Art
Salah (US 2024/0202532 A1) is relevant prior art not applied in the rejections above. Salah discloses an information processing apparatus comprising: an acquisition unit configured to acquire a plurality of modalities associated with an object and information identifying the object; a feature generation unit configured to generate feature values for each of the plurality of modalities; a deriving unit configured to derive weights corresponding to each of the plurality of modalities based on the feature values for each of the plurality of modalities and information identifying the object; and a prediction unit configured to predict an attribute of the object from a concatenated value of the feature values for each of the plurality of modalities, weighted by the corresponding weights.
Sorakado (US 2024/0169202 A1) is relevant prior art not applied in the rejections above. Sorakado discloses an information processing apparatus performing inference or learning using a neural network, the information processing apparatus comprising: one or more processors; and one or more memories that store a computer-readable instruction that, when executed by the one or more processors, configures the information processing apparatus to: generate an attention map from input data; perform a nonlinear transformation on the input data; obtain, based on the generated attention map and an output obtained based on the nonlinear transformation on the input data, a feature amount map having a channel dimension for storing an element vector and one or more spatial dimensions; and perform an inference or learning process based on the obtained feature amount map.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JONGBONG NAH whose telephone number is (571) 272-1361. The examiner can normally be reached M - F: 9:00 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ONEAL MISTRY can be reached on 313-446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/JONGBONG NAH/Examiner, Art Unit 2674
/ONEAL R MISTRY/Supervisory Patent Examiner, Art Unit 2674