Detailed Action
This action is in response to the Application filed 02/22/2022, in which:
Claims 1, 9, and 15 are independent claims.
Claims 1-20 are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The title of the invention is not descriptive. A new title is required that is clearly indicative of the invention to which the claims are directed.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 9-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. Independent Claim 9 does not fall within at least one of the four categories of patent-eligible subject matter because the claim is directed to signals per se. Claim 9 recites a “computer-readable storage medium”. The specification fails to provide a special definition of the claimed medium that would exclude a signal; thus, the broadest reasonable interpretation (BRI) of the recited “computer-readable storage medium” encompasses signals per se. Accordingly, the recited “computer-readable storage medium” is signals per se and is not a “process”, “machine”, “manufacture”, or “composition of matter” as defined in 35 U.S.C. 101. Dependent Claims 10-14, which depend from Claim 9, are rejected for the same reasons.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1:
Subject Matter Eligibility Analysis Step 1:
Claim 1 recites a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 1 further recites performing learning of a neural network so as to reduce a value of a first loss function representing a correlation between channels in feature vectors output from at least one of intermediate layers and a final layer in the neural network … (which is a mathematical relationship between variables and/or numbers using a mathematical formula). Claim 1 thus recites an abstract idea (that falls into the “mathematical concepts” group of abstract ideas).
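For context only, the following is a minimal illustrative sketch, in Python, of the kind of mathematical operation this limitation recites: a loss whose value measures the correlation between channels of the feature vectors output by an intermediate or final layer, computed over a batch of training data. The function name, tensor shapes, and Pearson-style formulation are the examiner's assumptions for illustration and are not taken from the application.

import torch

def channel_correlation_loss(features: torch.Tensor) -> torch.Tensor:
    # features: (batch, channels) feature vectors output from an intermediate or
    # final layer for a plurality of pieces of training data.
    x = features - features.mean(dim=0, keepdim=True)      # center each channel over the batch
    cov = x.t() @ x / (features.shape[0] - 1)              # channel-by-channel covariance
    std = cov.diagonal().clamp_min(1e-8).sqrt()
    corr = cov / (std[:, None] * std[None, :])              # channel correlation matrix
    off_diag = corr - torch.diag(corr.diagonal())           # keep only cross-channel terms
    return (off_diag ** 2).mean()                           # the "first loss" value to be reduced

Performing learning so as to reduce this value would then amount to ordinary gradient descent on it (loss.backward() followed by an optimizer step).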
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the additional elements recited consist of:
A learning method to be performed by a computer, the learning method comprising (which is restricting the abstract idea to a Particular Technological Environment, by MPEP 2106.05(h))
… to which a plurality of pieces of training data has been input (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g))
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements recited, alone or in combination, do not provide significantly more than the abstract idea itself. The first additional element merely restricts the abstract idea to a particular technological environment (MPEP 2106.05(h)), which cannot provide significantly more. The second additional element falls within MPEP 2106.05(d) as well-understood, routine, and conventional activity of receiving or transmitting data over a network (MPEP 2106.05(d)(II): buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014)). Thus, the claim is subject-matter ineligible.
Regarding Claim 2:
Subject Matter Eligibility Analysis Step 1:
Dependent Claim 2 recites the method of Claim 1. Claim 1 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 2 further recites the method comprising:
deriving a value of a second loss function and a value of a third loss function, the value of the second loss function being derived based on the first feature vector and representing a correlation between the annotation information given to the supervised training data set and output information … , the output information corresponding to the annotation information, the value of the third loss function being a value of the first loss function representing a correlation between channels in the second feature vectors … (which is a mathematical relationship between variables and/or numbers using a mathematical formula)
… the learning of the neural network is performed so as to reduce the value of the second loss function and the value of the third loss function (which is a mathematical relationship between variables and/or numbers using a mathematical formula)
Claim 2 thus recites an abstract idea (that falls into the “mathematical concepts” group of abstract ideas).
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the additional elements recited consist of:
inputting a supervised training data set and an unsupervised training data set to the neural network as the plurality of pieces of training data, the supervised training data set including a plurality of pieces of supervised training data given annotation information, the unsupervised training data set including a plurality of pieces of unsupervised training data not given the annotation information (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g))
acquiring first feature vectors and second feature vectors, the first feature vectors being the feature vectors output from the neural network by inputting the supervised training data set, the second feature vectors being feature vectors output from the neural network by inputting the unsupervised training data set (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g))
… obtained from the neural network by inputting the supervised training data set … (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g))
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements recited, alone or in combination, do not provide significantly more than the abstract idea itself. The additional elements identified above fall within MPEP 2106.05(d) as well-understood, routine, and conventional activities of receiving or transmitting data over a network (MPEP 2106.05(d)(II): buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014)). Thus, the claim is subject-matter ineligible.
Regarding Claim 3:
Subject Matter Eligibility Analysis Step 1:
Dependent Claim 3 recites the method of Claim 1. Claim 1 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 3 further recites the method comprising:
deriving a value of a second loss function, a value of a fourth loss function, and a value of a third loss function, the value of a second loss function being derived based on the first feature vectors and representing a correlation between the annotation information given to the supervised training data set and output information … , the output information corresponding to the annotation information, the value of the fourth loss function being a value of the first loss function representing a correlation between channels in the first feature vectors, the value of the third loss function being a value of the first loss function representing a correlation between channels in the second feature vectors … (which is a mathematical relationship between variables and/or numbers using a mathematical formula)
… the learning of the neural network is performed so as to reduce the value of the second loss function, the value of the third loss function, and the value of the fourth loss function (which is a mathematical relationship between variables and/or numbers using a mathematical formula)
Claim 3 thus recites an abstract idea (that falls into the “mathematical concepts” group of abstract ideas).
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the additional elements recited consist of:
inputting a supervised training data set and an unsupervised training data set to the neural network as the plurality of pieces of training data, the supervised training data set including a plurality of pieces of supervised training data given annotation information, the unsupervised training data set including a plurality of pieces of unsupervised training data not given the annotation information (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g))
acquiring first feature vectors that are the feature vectors output from the neural network by inputting the supervised training data set and a second feature vectors that are the feature vectors output from the neural network by inputting the unsupervised training data set (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g))
… obtained from the neural network by inputting the supervised training data set … (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g))
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements recited, alone or in combination, do not provide significantly more than the abstract idea itself. The additional elements identified above fall within MPEP 2106.05(d) as well-understood, routine, and conventional activities of receiving or transmitting data over a network (MPEP 2106.05(d)(II): buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014)). Thus, the claim is subject-matter ineligible.
Regarding Claim 4:
Subject Matter Eligibility Analysis Step 1:
Dependent Claim 4 recites the method of Claim 1. Claim 1 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 4 further recites the method comprising:
deriving a value of a second loss function and a value of a fourth loss function, the value of the second loss function being derived based on the first feature vectors and representing a correlation between the annotation information given to the supervised training data set and output information … , the output information corresponding to the annotation information, the value of the fourth loss function being a value of the first loss function representing a correlation between channels in the first feature vectors … (which is a mathematical relationship between variables and/or numbers using a mathematical formula)
… the learning of the neural network is performed so as to reduce the value of the second loss function and the value of the fourth loss function (which is a mathematical relationship between variables and/or numbers using a mathematical formula)
Claim 4 thus recites an abstract idea (that falls into the “mathematical concepts” group of abstract ideas).
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the additional elements recited consist of:
inputting a supervised training data set to the neural network as the plurality of pieces of training data, the supervised training data set including a plurality of pieces of supervised training data given annotation information (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g))
acquiring first feature vectors that are the feature vectors output from the neural network by inputting the supervised training data set (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g))
… obtained from the neural network by inputting the supervised training data set … (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g))
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements recited, alone or in combination, do not provide significantly more than the abstract idea itself. The additional elements identified above fall within MPEP 2106.05(d) as well-understood, routine, and conventional activities of receiving or transmitting data over a network (MPEP 2106.05(d)(II): buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014)). Thus, the claim is subject-matter ineligible.
Regarding Claim 5:
Subject Matter Eligibility Analysis Step 1:
Dependent Claim 5 recites the method of Claim 1. Claim 1 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 5 further recites wherein a correlation coefficient is used for calculation of the value of the first loss function (which is a mathematical relationship between variables and/or numbers using a mathematical formula). Claim 5 thus recites an abstract idea (that falls into the “mathematical concepts” group of abstract ideas).
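For illustration, one standard correlation coefficient that could be used in such a calculation is the Pearson coefficient between channels c and d, computed over N pieces of training data; this is offered as a common definition, not as the application's own:

r_{cd} = \frac{\sum_{n=1}^{N} (z_{n,c} - \bar{z}_c)(z_{n,d} - \bar{z}_d)}{\sqrt{\sum_{n=1}^{N} (z_{n,c} - \bar{z}_c)^2}\,\sqrt{\sum_{n=1}^{N} (z_{n,d} - \bar{z}_d)^2}}

where z_{n,c} denotes channel c of the feature vector for the n-th training sample and \bar{z}_c is its mean over the batch; a value of the first loss function may then aggregate, for example, r_{cd}^2 over channel pairs c \neq d.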
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because there are no new additional elements recited.
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because there are no new additional elements recited. The judicial exception alone does not provide significantly more than the abstract idea itself. Thus, the claim is subject-matter ineligible.
Regarding Claim 6:
Subject Matter Eligibility Analysis Step 1:
Dependent Claim 6 recites the method of Claim 1. Claim 1 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 6 does not recite any additional abstract ideas and only inherits the abstract ideas from Claim 1. Claim 6 thus recites an abstract idea (that falls into the “mathematical concepts” group of abstract ideas).
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the sole additional element consists of: wherein the plurality of pieces of training data includes a plurality of groups each including a plurality of supervised training data sets and a plurality of groups each including a plurality of unsupervised training data sets (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the new sole additional element recited, alone or in combination, does not provide significantly more than the abstract idea itself. The additional element falls within MPEP 2106.05(d) as well-understood, routine and conventional activities of receiving or transmitting data over a network (MPEP 2106.05(d)(II): buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014)). Thus, the claim is subject-matter ineligible.
Regarding Claim 7:
Subject Matter Eligibility Analysis Step 1:
Dependent Claim 7 recites the method of Claim 1. Claim 1 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 7 does not recite any additional abstract ideas and only inherits the abstract ideas from Claim 1. Claim 7 thus recites an abstract idea (that falls into the “mathematical concepts” group of abstract ideas).
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the additional elements recited consist of:
receiving an input of a learning condition including at least one of a network structure of the neural network as a target of the learning, the training data to be used in the learning, and a description of setting to be used at a time of the learning, (which is insignificant extra-solution activity of data gathering, by MPEP 2106.05(g))
wherein at the performing the learning, the learning of the neural network is performed in accordance with the learning condition having been received (which amounts to no more than instructions to “apply” the abstract idea on a computer, by MPEP 2106.05(f))
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements recited, alone or in combination, do not provide significantly more than the abstract idea itself. The first additional element falls within MPEP 2106.05(d) as well-understood, routine, and conventional activity of receiving or transmitting data over a network (MPEP 2106.05(d)(II): buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014)). The second additional element merely applies the abstract idea on a computer (MPEP 2106.05(f)), which cannot provide significantly more. Thus, the claim is subject-matter ineligible.
Regarding Claim 8:
Subject Matter Eligibility Analysis Step 1:
Dependent Claim 8 recites the method of Claim 7. Claim 7 is a method, thus a process, one of the four statutory categories of patentable subject matter.
Subject Matter Eligibility Analysis Step 2A Prong 1:
However, Claim 8 does not recite any additional abstract ideas and only inherits the abstract ideas from Claim 7. Claim 8 thus recites an abstract idea (that falls into the “mathematical concepts” group of abstract ideas).
Subject Matter Eligibility Analysis Step 2A Prong 2:
This judicial exception is not integrated into a practical application because the sole additional element consists of: displaying a display screen including at least one of a learning progress state of the neural network and a content of change recommendation for the learning condition depending on the learning progress state (which is insignificant extra-solution activity of mere data output, by MPEP 2106.05(g)).
Subject Matter Eligibility Analysis Step 2B:
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the new sole additional element recited, alone or in combination, does not provide significantly more than the abstract idea itself. The additional element falls within MPEP 2106.05(d) as well-understood, routine and conventional activities of presenting offers and gathering statistics (MPEP 2106.05(d)(II): OIP Techs., 788 F.3d at 1362-63, 115 USPQ2d at 1092-93). Thus, the claim is subject-matter ineligible.
Regarding Claims 9-14:
Claims 9-14 incorporate substantively all the limitations of Claims 1-4 and 7-8 in a computer program product (thus, a manufacture) and further recite a computer-readable medium including programmed instructions, the instructions causing a computer to execute the method (these limitations amount to no more than instructions to “apply” the abstract idea on a computer, by MPEP 2106.05(f)). These additional elements do not integrate the abstract idea into a practical application and, alone or in combination, are not sufficient to amount to significantly more than the judicial exception. Thus, Claims 9-14 are rejected for the reasons set forth in the rejections of Claims 1-4 and 7-8, respectively.
Regarding Claims 15-20:
Claims 15-20 incorporate substantively all the limitations of Claims 1-4 and 7-8 in a learning apparatus (thus, a machine) and further recite one or more hardware processors configured to perform the method (these limitations amount to no more than instructions to “apply” the abstract idea on a computer, by MPEP 2106.05(f)). These additional elements do not integrate the abstract idea into a practical application and, alone or in combination, are not sufficient to amount to significantly more than the judicial exception. Thus, Claims 15-20 are rejected for the reasons set forth in the rejections of Claims 1-4 and 7-8, respectively.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1-7, 9-13, and 15-19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Wu et al., “Weakly Semi-Supervised Deep Learning for Multi-Label Image Annotation”, hereinafter “Wu”.
Regarding Claim 1:
Wu teaches:
A learning method to be performed by a computer, the learning method comprising
(Wu, Page 110, Column 1, Paragraph 2, “… we propose an approach named weakly semi-supervised deep learning (WeSed) for multi-label annotation”; Page 114, Column 2, Paragraph 2, “Since we mainly focus on the loss layer, we use the widely used basic architecture in Alexnet as the feature learning model (the CNN components in Fig. 2)”. WeSed is the proposed deep learning method to be performed by a computer utilizing Alexnet).
performing learning of a neural network so as to reduce a value of a first loss function representing a correlation between channels in feature vectors output from at least one of intermediate layers and a final layer in the neural network to which a plurality of pieces of training data has been input.
(Wu, Page 114, Fig. 2, Paragraph 4, “The overall objective loss we optimize is given as follows (as shown in Fig. 2):
[Wu's overall objective loss (Equation 17), reproduced in the original as image media_image1.png]
”. Fig. 2 shows a feature learning model using the CNN architecture to reduce the overall objective loss of Equation 17, which is interpreted as a value of a first loss function. Equation 17 and Fig. 2 show an overall objective loss function (containing multiple loss functions) that represents a correlation between channels, interpreted as the elements of the feature vectors, because the loss functions use semantic similarity and ranking weights to represent a correlation between elements of the feature vector outputs (f(xi), f(xj), f(xk)). The feature vectors are output from the final layer (the fully connected layer), and Fig. 2 shows the plurality of pieces of training data being input and passed through the intermediate layers of the CNN architecture).
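Because the cited equation appears only as an image above, the following schematic indicates, in the examiner's paraphrase (not a verbatim reproduction of Wu's Equation 17), the structure relied upon: an overall objective combining a supervised pairwise ranking loss over the weakly labeled images with a weighted triplet similarity loss over triplets drawn from labeled and unlabeled images,

L_{\text{overall}} = L_{\text{W2PR}}\big(f(x_i), f(x_j)\big) + \lambda \, L_{\text{TS}}\big(f(x_i), f(x_j), f(x_k)\big),

where the weighting factor \lambda is included only for illustration.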
Regarding Claim 2:
Wu teaches the method according to Claim 1. Wu further teaches:
inputting a supervised training data set and an unsupervised training data set to the neural network as the plurality of pieces of training data,
(Wu, Page 114, Fig. 2. The training set in Fig. 2 shows the inputting of a supervised training data set (the weakly labeled image set Iw) and an unsupervised training data set (the unlabeled image set Iu) to the neural network).
the supervised training data set including a plurality of pieces of supervised training data given annotation information,
(Wu, Page 114, Fig. 2. The weakly labeled image set (interpreted as the supervised training data) contains given annotation information, where the labels in the experiment are the type of given annotation information within the multi-label image annotation learning of the WeSed method. For example, in Fig. 2, xi = [beach, clouds, sky] (where the dashed boxes notate missing labels) and xj = [beach, clouds, ocean, sky]; thus, the weakly labeled images are supervised training data given the annotation information).
the unsupervised training data set including a plurality of pieces of unsupervised training data not given the annotation information;
(Wu, Page 114, Fig. 2. The unlabeled image set (interpreted as the unsupervised training data) does not contain any given annotation information (interpreted as labels by the examiner). For example, in Fig. 2, xk notates only [sky, person] within the dashed boxes, which denote missing labels; thus, the unlabeled images are unsupervised training data not given the annotation information).
acquiring first feature vectors and second feature vectors,
(Wu, Page 114, Fig. 2. The fully connected layer (the output of the CNN) in Fig. 2 (f(xi), f(xj), f(xk)) shows the acquiring of the first and second feature vectors).
the first feature vectors being the feature vectors output from the neural network by inputting the supervised training data set,
(Wu, Page 114, Fig. 2. In Fig. 2, f(xi) and f(xj) represent the first feature vectors, which are the outputs from the CNN (neural network) obtained by inputting the weakly labeled image set (supervised training data set)).
the second feature vectors being feature vectors output from the neural network by inputting the unsupervised training data set; and
(Wu, Page 114, Fig. 2. In Fig. 2, f(xk) represents the second feature vectors, which are the outputs from the CNN (neural network) obtained by inputting the unlabeled image set (unsupervised training data set)).
deriving a value of a second loss function and a value of a third loss function, the value of the second loss function being derived based on the first feature vector and representing a correlation between the annotation information given to the supervised training data set and output information obtained from the neural network by inputting the supervised training data set, the output information corresponding to the annotation information,
(Wu, Page 114, Fig. 2; Page 111, Column 2, Paragraph 3, “… we devise a weakly weighted pairwise ranking (W2PR) loss to optimize the top-k accuracy of multi-label image annotation …:
[Wu's W2PR loss equation, reproduced in the original as image media_image2.png]
”. Fig. 2 shows the W2PR loss layer, which is derived based on the first feature vectors (f(xi), f(xj)) and represents a correlation between the labels/annotation information given to the supervised training data set (the weakly labeled image set) and the corresponding output information obtained from the neural network. Thus, the loss layer provides a value of a second loss function because the supervised training set is the input and the loss is derived based on the first feature vectors).
the value of the third loss function being a value of the first loss function representing a correlation between channels in the second feature vectors,
(Wu, Page 114, Fig. 2; Page 113, Column 1, Paragraph 3, “Here, we expect the learnt features of images in a triplet to meet their relative semantic similarity defined by rsim. Therefore, we optimize the following objective:
[Wu's triplet similarity (TS) loss objective, reproduced in the original as image media_image3.png]
… We call the objective as triplet similarity (TS) loss”. Fig. 2 shows the TS loss layer, whose value is a value of the first loss function (the second half of Equation 17, the first loss function) and which correlates unlabeled and missing-label images. The TS loss uses semantic similarity to relate the first and second feature vectors, where the second feature vectors are notated f(xk). The loss function represents a correlation (rsim, i.e., semantic similarity) between the elements in the second feature vectors (f(xk)). Thus, the loss layer provides the value of the third loss function because it is a value of the first loss function representing the semantic similarity of the second feature vectors).
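As a point of reference only, a generic triplet-style similarity loss of the kind computed by a TS loss layer can be sketched as follows in Python; this is not Wu's Equation (11), and the cosine distance, margin, and use of rsim values as a weight are assumptions of the sketch.

import torch
import torch.nn.functional as F

def triplet_similarity_loss(f_i, f_j, f_k, rsim_ij, rsim_ik, margin=0.1):
    # f_i, f_j, f_k: feature vectors of an image triplet (e.g., from the fully connected layer).
    # rsim_ij, rsim_ik: relative semantic similarities of the pairs (i, j) and (i, k).
    d_ij = 1.0 - F.cosine_similarity(f_i, f_j, dim=-1)   # feature-space distance of pair (i, j)
    d_ik = 1.0 - F.cosine_similarity(f_i, f_k, dim=-1)   # feature-space distance of pair (i, k)
    weight = rsim_ij - rsim_ik                            # positive when (i, j) should be the closer pair
    # Penalize triplets whose feature-space ordering disagrees with their semantic similarity.
    return torch.clamp(weight * (d_ij - d_ik) + margin, min=0.0).mean()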
wherein at the performing the learning, the learning of the neural network is performed so as to reduce the value of the second loss function and the value of the third loss function.
(Wu, Page 114, Fig. 2, Paragraph 4, “The overall objective loss we optimize is given as follows (as shown in Fig. 2):
[Wu's overall objective loss (Equation 17), reproduced in the original as image media_image1.png]
”. The overall objective loss of the neural network contains a value of the second loss function and a value of the third loss function; thus, the learning method optimizes the overall loss so as to reduce both the value of the second loss function and the value of the third loss function in order to handle the supervised and unsupervised training data sets).
Regarding Claim 3:
Wu teaches the method according to Claim 1. Wu further teaches:
inputting a supervised training data set and an unsupervised training data set to the neural network as the plurality of pieces of training data,
(Wu, Page 114, Fig. 2. The training set in Fig. 2 shows the inputting of a supervised training data set (the weakly labeled image set Iw) and an unsupervised training data set (the unlabeled image set Iu) to the neural network).
the supervised training data set including a plurality of pieces of supervised training data given annotation information,
(Wu, Page 114, Fig. 2. The weakly labeled image set (interpreted as the supervised training data) contains given annotation information, where the labels in the experiment are the type of given annotation information within the multi-label image annotation learning of the WeSed method. For example, in Fig. 2, xi = [beach, clouds, sky] (where the dashed boxes notate missing labels) and xj = [beach, clouds, ocean, sky]; thus, the weakly labeled images are supervised training data given the annotation information).
the unsupervised training data set including a plurality of pieces of unsupervised training data not given the annotation information;
(Wu, Page 114, Fig. 2. The unlabeled image set (interpreted as the unsupervised training data) does not contain any given annotation information (interpreted as labels by the examiner). For example, in Fig. 2, xk notates only [sky, person] within the dashed boxes, which denote missing labels; thus, the unlabeled images are unsupervised training data not given the annotation information).
acquiring first feature vectors that are the feature vectors output from the neural network by inputting the supervised training data set and a second feature vectors that are the feature vectors output from the neural network by inputting the unsupervised training data set; and
(Wu, Page 114, Fig. 2. The fully connected layer (the output of the CNN) in Fig. 2 (f(xi), f(xj), f(xk)) shows the acquiring of the first and second feature vectors by inputting the supervised training data set (weakly labeled image training set) and the unsupervised training data set (unlabeled image training set), respectively).
deriving a value of a second loss function, a value of a fourth loss function, and a value of a third loss function, the value of a second loss function being derived based on the first feature vectors and representing a correlation between the annotation information given to the supervised training data set and output information obtained from the neural network by inputting the supervised training data set, the output information corresponding to the annotation information,
(Wu, Page 114, Fig. 2; Page 111, Column 2, Paragraph 3, “… we devise a weakly weighted pairwise ranking (W2PR) loss to optimize the top-k accuracy of multi-label image annotation …:
[Wu's W2PR loss equation, reproduced in the original as image media_image2.png]
”. Fig. 2 shows the W2PR loss layer, which is derived based on the first feature vectors (f(xi), f(xj)) and represents a correlation between the labels/annotation information given to the supervised training data set (the weakly labeled image set) and the corresponding output information obtained from the neural network. Thus, the loss layer provides a value of a second loss function because the supervised training set is the input and the loss is derived based on the first feature vectors).
the value of the fourth loss function being a value of the first loss function representing a correlation between channels in the first feature vectors,
(Wu, Page 114, Fig. 2; Page 113, Column 1, Paragraph 3, “Here, we expect the learnt features of images in a triplet to meet their relative semantic similarity defined by rsim. Therefore, we optimize the following objective:
[Wu's triplet similarity (TS) loss objective, reproduced in the original as image media_image3.png]
… We call the objective as triplet similarity (TS) loss”. Fig. 2 shows the TS loss layer, whose value is a value of the first loss function (the second half of Equation 17) and which correlates unlabeled and missing-label images (images not given the full annotation information). The loss function represents a correlation (rsim, i.e., semantic similarity) between the elements in the first feature vectors (f(xi), f(xj)). Thus, the loss layer provides the value of the fourth loss function, which is a value of the first loss function representing the semantic similarity of the first feature vectors).
the value of the third loss function being a value of the first loss function representing a correlation between channels in the second feature vectors,
(Wu, Page 114, Fig. 2; Page 113, Column 1, Paragraph 3, “Here, we expect the learnt features of images in a triplet to meet their relative semantic similarity defined by rsim. Therefore, we optimize the following objective:
[Wu's triplet similarity (TS) loss objective, reproduced in the original as image media_image3.png]
… We call the objective as triplet similarity (TS) loss”. Fig. 2 shows the TS loss layer, whose value is a value of the first loss function (the second half of Equation 17) and which correlates unlabeled and missing-label images (images not given the full annotation information). The loss function represents a correlation (rsim, i.e., semantic similarity) between the elements in the second feature vectors (f(xk)). Thus, the loss layer provides the value of the third loss function, which is a value of the first loss function representing the semantic similarity of the second feature vectors).
wherein at the performing the learning, the learning of the neural network is performed so as to reduce the value of the second loss function, the value of the third loss function, and the value of the fourth loss function.
(Wu, Page 114, Fig. 2, Paragraph 4, “The overall objective loss we optimize is given as follows (as shown in Fig. 2):
[Wu's overall objective loss (Equation 17), reproduced in the original as image media_image1.png]
”. The overall objective loss of the neural network contains a value of the second loss function, a value of the third loss function, and a value of the fourth loss function; thus, the learning method optimizes the overall loss so as to reduce the values of the second, third, and fourth loss functions in order to handle the supervised and unsupervised training data sets).
Regarding Claim 4:
Wu teaches the method according to Claim 1. Wu further teaches:
inputting a supervised training data set to the neural network as the plurality of pieces of training data,
(Wu, Page 114, Fig. 2. The training set in Fig. 2 shows the inputting of a supervised training data set (the weakly labeled image set Iw) and an unsupervised training data set (the unlabeled image set Iu) to the neural network).
the supervised training data set including a plurality of pieces of supervised training data given annotation information;
(Wu, Page 114, Fig. 2. The weakly labeled image set (interpreted as the supervised training data) contains given annotation information, where the labels in the experiment are the type of given annotation information within the multi-label image annotation learning of the WeSed method. For example, in Fig. 2, xi = [beach, clouds, sky] (where the dashed boxes notate missing labels) and xj = [beach, clouds, ocean, sky]; thus, the weakly labeled images are supervised training data given the annotation information).
acquiring first feature vectors that are the feature vectors output from the neural network by inputting the supervised training data set; and
(Wu, Page 114, Fig. 2. The fully connected layer (the output of the CNN) in Fig. 2 (f(xi), f(xj)) shows the acquiring of the first feature vectors by inputting the supervised training data set (weakly labeled image training set)).
deriving a value of a second loss function and a value of a fourth loss function, the value of the second loss function being derived based on the first feature vectors and representing a correlation between the annotation information given to the supervised training data set and output information obtained from the neural network by inputting the supervised training data set, the output information corresponding to the annotation information,
(Wu, Page 114, Fig. 2; Page 111, Column 2, Paragraph 3, “… we devise a weakly weighted pairwise ranking (W2PR) loss to optimize the top-k accuracy of multi-label image annotation …:
[Wu's W2PR loss equation, reproduced in the original as image media_image2.png]
”. Fig. 2 shows the W2PR loss layer, which is derived based on the first feature vectors (f(xi), f(xj)) and represents a correlation between the labels/annotation information given to the supervised training data set (the weakly labeled image set) and the corresponding output information obtained from the neural network. Thus, the loss layer provides a value of a second loss function because the supervised training set is the input and the loss is derived based on the first feature vectors).
the value of the fourth loss function being a value of the first loss function representing a correlation between channels in the first feature vectors,
(Wu, Page 114, Fig. 2; Page 113, Column 1, Paragraph 3, “Here, we expect the learnt features of images in a triplet to meet their relative semantic similarity defined by rsim. Therefore, we optimize the following objective:
[Wu's triplet similarity (TS) loss objective, reproduced in the original as image media_image3.png]
… We call the objective as triplet similarity (TS) loss”. Fig. 2 shows the TS loss layer, whose value is a value of the first loss function (the second half of Equation 17) and which correlates unlabeled and missing-label images (images not given the full annotation information). The loss function represents a correlation (rsim, i.e., semantic similarity) between the elements in the first feature vectors (f(xi), f(xj)). Thus, the loss layer provides the value of the fourth loss function, which is a value of the first loss function representing the semantic similarity of the first feature vectors).
wherein at the performing the learning, the learning of the neural network is performed so as to reduce the value of second loss function and the value of fourth loss function.
(Wu, Page 114, Fig. 2, Paragraph 4, “The overall objective loss we optimize is given as follows (as shown in Fig. 2):
[Wu's overall objective loss (Equation 17), reproduced in the original as image media_image1.png]
”. The overall objective loss of the neural network contains a value of the second loss function and a value of the fourth loss function; thus, the learning method optimizes the overall loss so as to reduce the values of the second and fourth loss functions in order to handle the supervised training data set).
Regarding Claim 5:
Wu teaches the method according to Claim 1. Wu further teaches:
wherein a correlation coefficient is used for calculation of the value of the first loss function.
(Wu, Page 113, Column 1, Paragraph 2. “Here, we expect the learnt features of images in a triplet to meet their relative semantic similarity defined by rsim. Therefore, we optimize the following objective:
[Wu's triplet similarity objective, reproduced in the original as image media_image4.png]
”. rsim is a correlation coefficient used for calculation of the value of the first loss function: the TS loss uses the relative semantic similarity rsim, and that semantic similarity is interpreted as a correlation coefficient).
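For context, one plausible way to realize such a similarity coefficient between the label sets of two images (the role rsim plays in the cited passage) is a normalized label overlap; Wu defines rsim precisely in the paper, so the Python sketch below is only the examiner's assumption for illustration.

def relative_semantic_similarity(labels_a, labels_b):
    # labels_a, labels_b: annotation label sets of two images.
    a, b = set(labels_a), set(labels_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)   # Jaccard-style coefficient in [0, 1]

# Using the labels shown in Wu's Fig. 2:
# relative_semantic_similarity(["beach", "clouds", "sky"], ["beach", "clouds", "ocean", "sky"]) == 0.75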
Regarding Claim 6:
Wu teaches the method according to Claim 1. Wu further teaches:
wherein the plurality of pieces of training data includes a plurality of groups each including a plurality of supervised training data sets and a plurality of groups each including a plurality of unsupervised training data sets.
(Wu, Page 114, Fig. 2; Page 115, Table 1. Fig. 2 shows the method handling the training data, which includes a plurality of groups of supervised training data sets and a plurality of groups of unsupervised training data sets, shown in detail in Table 1).
Regarding Claim 7:
Wu teaches the method according to Claim 1. Wu further teaches:
receiving an input of a learning condition including at least one of a network structure of the neural network as a target of the learning, the training data to be used in the learning, and a description of setting to be used at a time of the learning,
(Wu, Page 114, Fig. 2, Column 2, Paragraph 2, “In our network, an image triplet is first sampled from the training image set. Before feeding the images in triplets to the CNN, each image is resized into 256 × 256 … we mainly focus on the loss layer, we use the widely used basic architecture in Alexnet [18] as the feature learning model (the CNN components in Fig. 2) … the learnt features are fed into the triplet similarity loss layer which computes the gradient of Equation (11). If the label information is available, the learnt feature is fed into a ranking layer as the activation, and the output of the ranking layer is fed into the W2PR loss layer which computes the gradient of Equation (3). The optimization of the entire network is achieved with stochastic gradient method … parameter settings from the existing literature, the momentum is set to 0.9, and the batch size is set to 50. The learning rate for our model is set to 0.00002 at the start and we drop the learning rate after several epochs by a factor of 10”; Page 119, Column 1, Paragraph 2, “… we attempt to harness such images, i.e., the weakly labelled images and the unlabeled images, in a deep learning manner. To that end, we propose an approach named weakly semi-supervised deep learning (WeSed) for multi-label annotation. In WeSed, we devise a pairwise-ranking loss and a triplet-ranking loss to fine-tune the convolutional neural network with weakly labeled and unlabeled images. The pairwise-ranking loss is employed to handle the weakly labeled images and the triplet-ranking loss is conducted to address the problem that images are possibly most unlabeled”. The WeSed approach utilizes a convolutional neural network architecture (a network structure of the neural network as the target of the learning) to perform the learning for multi-label annotation. The model receives an input of a learning condition: the network structure (the AlexNet-based CNN), the training data to be used in the learning (the weakly labeled and unlabeled image sets), and a description of settings to be used at the time of the learning (the momentum, batch size, and learning rate). In particular, the learning rate is a description of a setting to be used at a time of the learning, and WeSed drops the learning rate by a factor of 10 after several epochs).
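For reference, the quoted settings correspond to a conventional stochastic gradient descent training configuration. A minimal Python sketch using the stated values follows; the placeholder network and the specific scheduler milestones are the examiner's assumptions, not details disclosed by Wu.

import torch
import torch.nn as nn

# Placeholder standing in for the AlexNet-style CNN feature learning model of Wu's Fig. 2.
model = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(96, 128),
)

# Quoted settings: momentum 0.9, batch size 50, initial learning rate 0.00002,
# learning rate dropped by a factor of 10 after several epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=2e-5, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 20], gamma=0.1)  # milestones illustrative
batch_size = 50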
wherein at the performing the learning, the learning of the neural network is performed in accordance with the learning condition having been received.
(Wu, Page 114, Fig. 2; Tables 4-10. Tables 4-10 report the performance of the model performing the learning (training) with different datasets, where the WeSed approach using the CNN of Fig. 2 is configured to perform multi-label image annotation in accordance with the learning condition that has been received).
Regarding Claims 9-13:
Claims 9-13 incorporate substantively all the limitations of Claims 1-4 and 7 in a computer program product and further recite a computer-readable medium including programmed instructions, the instructions causing a computer to execute the method (Page 114, Column 2, Paragraph 2, “Since we mainly focus on the loss layer, we use the widely used basic architecture in Alexnet as the feature learning model (the CNN components in Fig. 2)”. The WeSed methodology is implemented with the AlexNet CNN architecture for image annotation tasks, which implies that the process is performed on a computing device in which a computer-readable medium storing the programmed instructions is inherent); thus, Claims 9-13 are rejected for the reasons set forth in the rejections of Claims 1-4 and 7, respectively.
Regarding Claims 15-19:
Claims 15-19 incorporate substantively all the limitations of Claims 1-4 and 7 in a learning apparatus and further recite one or more hardware processors configured to perform the method (Page 114, Column 2, Paragraph 2, “Since we mainly focus on the loss layer, we use the widely used basic architecture in Alexnet as the feature learning model (the CNN components in Fig. 2)”. The WeSed methodology is implemented with the AlexNet CNN architecture for image annotation tasks, which implies that the process is performed on a computing device in which a hardware processor is inherent); thus, Claims 15-19 are rejected for the reasons set forth in the rejections of Claims 1-4 and 7, respectively.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 8, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Wu et al., “Weakly Semi-Supervised Deep Learning for Multi-Label Image Annotation” in view of Park et al., “HyperTendril: Visual Analytics for User-Driven Hyperparameter Optimization of Deep Neural Networks”.
Regarding Claim 8:
Wu teaches the method according to Claim 7. However, Wu does not explicitly disclose:
displaying a display screen including at least one of a learning progress state of the neural network and a content of change recommendation for the learning condition depending on the learning progress state.
However, Park explicitly teaches:
displaying a display screen including at least one of a learning progress state of the neural network and a content of change recommendation for the learning condition depending on the learning progress state.
(Park, Page 1, Fig. 1; Page 2, Column 1, Paragraph 2, “To this end, we propose HyperTendril (Fig. 1), a web-based visual analytics system that supports HyperOpt tasks, where users can effectively perform HyperOpt through an iterative and interactive tuning procedure, allowing them to fine-tune the optimal hyperparameters based on their domain knowledge and insights obtained from the previous results. In detail, HyperTendril helps the users progressively refine their search spaces by explicitly highlighting relevant hyperparameters and the promising ranges to explore further, based on a quantitative analysis on the objective model performance (e.g., a test accuracy) (Fig. 1(C))”. The interactive display screen of HyperTendril presents statistics and settings/configurations/parameters (content of change) that depend on the progress state of the training, and the display updates iteratively as training proceeds. Thus, the display screen includes a learning progress state of the neural network (interpreted by the examiner as the state of training) and a content of change recommendation for the learning condition (by highlighting relevant hyperparameters and promising ranges for the user to change depending on that progress state)).
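As a schematic only (hypothetical function and stopping heuristic, not Park's actual interface code), a display that reports a learning progress state together with a content of change recommendation for the learning condition might take the following form in Python.

def render_progress_screen(epoch, total_epochs, val_accuracy_history, learning_rate):
    # Learning progress state of the neural network.
    print(f"Epoch {epoch}/{total_epochs}  validation accuracy: {val_accuracy_history[-1]:.3f}")
    # Content of change recommendation for the learning condition, depending on the progress state:
    # if accuracy has stopped improving over the last three epochs, recommend lowering the learning rate.
    if len(val_accuracy_history) >= 3 and max(val_accuracy_history[-3:]) <= max(val_accuracy_history[:-3], default=0.0):
        print(f"Recommendation: reduce the learning rate from {learning_rate} to {learning_rate / 10}")

# Example: render_progress_screen(12, 30, [0.61, 0.64, 0.64, 0.64, 0.63], 2e-5)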
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the WeSed methodology of Wu for multi-label image annotation with Park’s interactive display screen to review and recommend changes based on the learning state. One having ordinary skill in the art would have been motivated to implement this combination before the effective filing date of the claimed invention because it provides more information to the user, alleviates tedious processes, and adds visual representations and recommendations (see Park, Page 2, Column 1, Paragraph 2, “In order to alleviate the pain of tedious processes, human intuition for AutoML results, such as the behavior of search algorithms, the effect of optimization algorithm setting, and the characteristics of hyperparameters, should be accompanied. Thus, effective and efficient human intervention is critical during the HyperOpt process, which necessitates a visual analytics system that can leverage human insights to steer the optimization process in a user-driven manner”).
Regarding Claim 14:
Claim 14 incorporates substantively all the limitations of Claim 8 in a computer