DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner’s Note
The claim objection to claim 29 has been withdrawn in light of the instant amendments to the claims.
Response to Arguments
Applicant's arguments filed 12/29/2025 ("Arguments/Remarks") have been fully considered but they are not persuasive.
Argument – 1 (pg. 14 – 15) Applicant contends: “In the present application, with respect to whether the present claims recite mathematical concepts, Applicant respectfully submits that the features of the present claims are more analogous to those of Example 38 than those of Example 41, wherein, while some of the features of the independent claims may be based on mathematical concepts, the mathematical concepts are not recited in the claims. Moreover, the October 2019 Guidance makes clear the present claims do not recite any of the mathematical relationships, mathematical formulas or equations, or mathematical calculations deemed as mathematical concepts... Accordingly, Applicant respectfully submits the present claims do not recite any mathematical concepts deemed to be abstract ideas.”
Regarding the above argument, the Examiner respectfully disagrees with the Applicant’s assertion that the present claims do not recite an abstract idea (mathematical concepts). In Example 38 of the 2019 Revised Patent Subject Matter Eligibility Guidance (2019 PEG), the claim does not recite any of the judicial exceptions. In contrast, the present claims do recite an abstract idea (mathematical concepts). For example, “generating a first intermediate vector by applying a first activation function to first nodes” and “generating a second intermediate vector by applying a second activation function to the second nodes” recite a mathematical concept because they involve applying an activation function, which is a mathematical function, to numerical values associated with neural network nodes. Thus, the limitations describe the manipulation of mathematical relationships and the performance of calculations on numerical data.
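For purposes of illustration only, the mathematical character of such a limitation can be sketched as follows; the sigmoid activation and the node values below are hypothetical and are not taken from the claims or the cited references:

```python
import math

def sigmoid(x: float) -> float:
    # A standard activation function: a mathematical formula mapping any real input into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical pre-activation values at the first nodes of an intermediate layer.
first_nodes = [-2.0, 0.0, 3.0]

# "Generating an intermediate vector by applying an activation function to nodes"
# is, in substance, an element-wise mathematical calculation on numerical values.
intermediate_vector = [sigmoid(v) for v in first_nodes]
```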
Argument – 2 (pg. 16) Applicant contends: “ii. The claims do not recite mental processes abstract ideas The January 2019 Guidance and the USPTO's October 2019 Update: Subject Matter Eligibility ("October 2019 Guidance") make clear that the present claims do not recite "mental processes." For instance, the October 2019 Guidance states "Claims do not recite a mental process when they do not contain limitations that can practically be performed in the human mind, for instance when the human mind is not equipped to perform the claim limitations." At 7.”
Regarding the above argument, the Examiner respectfully emphasizes that the rejected claims were not classified as reciting the abstract idea of a mental process, but rather a mathematical concept. Specifically, the limitations recite generating an intermediate vector by applying an activation function to nodes, which involves performing a mathematical function on numeric values represented as a vector. See the Claim Rejections - 35 USC § 101 section for the amended claim limitations.
Argument – 3 (pg. 16) Applicant contends: “i. The present claims use the claimed features in a manner that imposes a meaningful limit on the claimed features: A claim that integrates a judicial exception into a practical application will apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that the claim is more than a drafting effort designed to monopolize the judicial exception." January 2019 Guidance at 12, 13, 18 (emphasis added).”
Regarding the above argument, the Examiner respectfully notes that the recited limitations merely perform mathematical operations within a generic machine learning model. The claims do not recite additional elements that limit the mathematical concept or apply it in a particular technological manner beyond its routine use in processing data. Thus, the claims do not integrate the judicial exception into a practical application.
Argument – 4 (pg. 16 – 19) Applicant contends: “ii. The present claims recite features that reflect an improvement in the functioning of a computer, or an improvement to another technology or technical field: …
In evaluating whether the claimed as a whole integrates an alleged abstract idea into a practical application, the Office must "give weight to all additional elements, whether or not they are conventional," as "revised Step 2A specifically excludes consideration of whether the additional elements represent well-understood, routine, conventional activity." January 2019 Guidance at 19. Moreover, the present claims improve the technical fields of neural networks, neural network training, and spoof detection. For example, the published present application discloses: …”
Regarding the above argument, the Examiner respectfully notes that the cited paragraphs ([0085], [0086], [0130], [0137], [0139], [0148] and [0150]) describe improvements in a conclusory manner, such as faster spoofing detection, reduced errors, and improved accuracy through use of intermediate layers and additional activation functions, but these features are not reflected in the claim language. The claims do not recite using intermediate layer outputs, combining multiple classifier results, or applying activation functions to address uncertainty or overconfidence. Accordingly, the alleged improvements described in the cited paragraphs are not reflected in the rejected claims. Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
Argument – 5 (pg. 22) Applicant contends: “As shown, Zhou only reveals that the activation function is typically composed of a sigmoid function with a value range of 0 to 1 and a hyperbolic tangent function with a value range of -1 to 1. Zhou's disclosure differs not only from "the dynamic range of the second activation function is limited to [0, 1]," but also from "the peak value of the second activation function is fixed to '1'." Accordingly, Zhou fails to disclose, teach, or suggest "wherein the peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1," as recited in independent claim 1.”
Regarding the above argument, Applicant’s arguments with respect to the amended claim(s) have been considered but are moot because they are directed to amended claim limitations that were not previously examined. Rejections addressing the amended claim limitations are set forth in the current Office action.
Argument – 6 (pg. 23) Applicant contends: “It is further respectfully submitted that neither Shirahata, Narayanan, Diamant, nor any combination thereof discloses, teaches, or suggests each and every claimed feature of independent claims 10, 16, 21, and 27…
As shown, while the Office Action asserts that Shirahata's ReLU1 discloses the claimed "first activation function" and ReLU2 discloses the claimed "second activation function," ReLU1 and ReLU2 are the same activation function (Equation 2) merely applied to different inputs at different stages of the neural network. Accordingly, Shirahata fails to disclose, teach, or suggest "a second activation function different from the first activation function," as recited in independent claim 10.”
Regarding the above argument, the Examiner respectfully notes that Shirahata (¶[0034]) identifies two distinct activation layers, ReLU1 and ReLU2, which, while sharing a common functional form, are mathematically and operationally distinct functional units defined by their unique hierarchical positions (L) in the neural network. While they share the same activation function (σ), equation (2) confirms that they are parameterized by different layer-specific biases (bL). Furthermore, equation (1) demonstrates that each layer processes different input data, as the convolution operation at each stage is performed on the output of the preceding layer (yL-1). Thus, although the mathematical logic remains constant, the distinct inputs and trainable parameters show that ReLU1 and ReLU2 operate as separate (different) sequential steps.
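A minimal sketch of this structure (with hypothetical weights, biases, and inputs; the elementwise form below merely stands in for the convolution of equation (1)) illustrates how two layers sharing the same functional form σ remain distinct operational units:

```python
def relu(x: float) -> float:
    # The common functional form (sigma) shared by both activation layers.
    return max(0.0, x)

def layer(y_prev: list, w: float, b: float) -> list:
    # y_L = sigma(w_L * y_(L-1) + b_L): the same sigma, but layer-specific
    # trainable parameters (w_L, b_L) and a different input y_(L-1).
    return [relu(w * v + b) for v in y_prev]

x = [1.0, -1.0, 2.0]            # hypothetical network input
y1 = layer(x, w=0.5, b=0.1)     # "ReLU1": operates on the network input
y2 = layer(y1, w=2.0, b=-0.3)   # "ReLU2": operates on ReLU1's output

# Identical functional form, yet distinct units: different inputs,
# different parameters, and therefore different outputs.
```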
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim(s) 1, 3 – 13, 15 – 18, 20 – 21 and 23 – 32 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without significantly more.
In Step 1 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the claims recite a process that, under the broadest reasonable interpretation, falls within one or more statutory categories (processes).
In Step 2A Prong 1 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following limitations recite an abstract idea (a mathematical concept or a mental process, as identified below) but for the recitation of generic computer components:
Regarding claim 1:
generating a first intermediate vector by applying a first activation function to first nodes
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea (mathematical concept): it involves taking inputs from the first nodes, applying a mathematical function (an activation function), and producing a vector output. See MPEP 2106.04).
generating a second intermediate vector by applying a second activation function to the second nodes;
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea (mathematical concept): it involves taking inputs from the second nodes, applying a mathematical function (an activation function), and producing a vector output. See MPEP 2106.04).
Because the claim limitations, under their broadest reasonable interpretation, cover the performance of mathematical calculations but for the recitation of generic computer components, they fall within the mathematical concepts grouping. Accordingly, the claim recites an abstract idea.
In Step 2A Prong 2 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application, as evaluated below:
• The preamble is deemed insufficient to transform the judicial exception into a patentable invention because the preamble generally links the use of the judicial exception to a particular technological environment or field of use, see MPEP 2106.05(h).
in a first intermediate layer adjacent to an input layer among intermediate layers of the neural network;
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
transferring the first intermediate vector to second nodes in a second intermediate layer adjacent to an output layer among the intermediate layers;
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation directed to mere data gathering, which is considered insignificant extra-solution activity. See MPEP 2106.05(g)).
applying the second intermediate vector to an output layer of the neural network
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
wherein the second activation function is determined by a first hyperparameter of which a multiplier of the second activation function is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function to fix a peak value of the second activation function.
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the additional limitation simply links the judicial exception to a field of use and/or technological environment, see MPEP 2106.05(h)).
wherein the peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the additional limitation simply links the judicial exception to a field of use and/or technological environment, see MPEP 2106.05(h)).
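For purposes of illustration only, a function of the kind these limitations describe can be sketched as a piecewise-linear bump; the particular function, its crossover point at x = 4, and the hyperparameter values below are hypothetical and are not taken from the claims, the specification, or the cited art:

```python
def peaked_activation(x: float, a: float, b: float) -> float:
    # Hypothetical illustration: hyperparameter `a` is a multiplier on the
    # ascending slope, `b` a multiplier on the descending slope, and clipping
    # fixes the peak value to 1 (for slopes whose crossing exceeds 1), so the
    # dynamic range is [0, 1].
    return max(0.0, min(a * x, b * (4.0 - x), 1.0))

# The peak is fixed to 1 wherever the two slopes cross above 1.
peak = peaked_activation(2.0, a=1.0, b=1.0)
```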
In Step 2B of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the
claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception:
Regarding limitations (I) and (III): these limitations recite mere application of the abstract idea or mere instructions to implement an abstract idea on a computer, which are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply a generic computer and/or process to the judicial exception, see MPEP 2106.05(f).
Regarding limitation (II): the additional element, considered extra-solution activity as analyzed above, is activity that is well-understood, routine, and conventional; specifically, the courts have recognized the following computer functions as well-understood, routine, and conventional:
Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network). See MPEP 2106.05(d)(II).
Regarding limitations (IV) and (V): the additional elements are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technological environment, see MPEP 2106.05(h).
As analyzed above, the additional elements do not integrate the noted judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
Regarding claim 10,
extracting a first result value by applying the first activation function to intermediate nodes comprised in each of the intermediate layers;
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea (mathematical concept): it involves taking inputs from the intermediate nodes and applying a mathematical function (the first activation function) to extract a first result value. See MPEP 2106.04).
extracting a second result value by applying a second activation function different from the first activation function to additional nodes connected to intermediate nodes in one or more of the intermediate layers; and
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea (mathematical concept): it involves taking inputs from the additional nodes connected to the intermediate nodes and applying a mathematical function (the second activation function) to extract a second result value. See MPEP 2106.04).
Because the claim limitations, under their broadest reasonable interpretation, cover the performance of mathematical calculations but for the recitation of generic computer components, they fall within the mathematical concepts grouping. Accordingly, the claim recites an abstract idea.
In Step 2A Prong 2 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application, as evaluated below:
• The preamble is deemed insufficient to transform the judicial exception into a patentable invention because the preamble generally links the use of the judicial exception to a particular technological environment or field of use, see MPEP 2106.05(h).
A processor-implemented method with a neural network
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
training the neural network based on a difference between the first result value and the second result value.
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
wherein a peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the additional limitation simply links the judicial exception to a field of use and/or technological environment, see MPEP 2106.05(h)).
In Step 2B of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the
claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception:
Regarding limitations (I) and (II): these limitations recite mere application of the abstract idea or mere instructions to implement an abstract idea on a computer, which are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply a generic computer and/or process to the judicial exception, see MPEP 2106.05(f).
Regarding limitation (III): the additional element is deemed insufficient to transform the judicial exception into a patentable invention because it generally links the judicial exception to the technological environment, see MPEP 2106.05(h).
As analyzed above, the additional elements do not integrate the noted judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
Regarding claim 16,
generating a first feature vector
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea (mental process): it involves evaluating input data to generate a first feature vector. See MPEP 2106.04).
generating a second feature vector
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea (mental process): it involves evaluating input data to generate a second feature vector. See MPEP 2106.04).
Because the claim limitations, under their broadest reasonable interpretation, cover performance of the limitations in the human mind but for the recitation of generic computer components, they fall within the mental processes grouping. Accordingly, the claim recites an abstract idea.
In Step 2A Prong 2 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application, as evaluated below:
• The preamble is deemed insufficient to transform the judicial exception into a patentable invention because the preamble generally links the use of the judicial exception to a particular technological environment or field of use, see MPEP 2106.05(h).
A processor-implemented method with a neural network
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
by propagating training data input to an input layer of the neural network to first nodes that are included in a first intermediate layer adjacent to the input layer among intermediate layers of the neural network and that operate according to a first activation function;
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation directed to mere data gathering, which is considered insignificant extra-solution activity. See MPEP 2106.05(g)).
performing primary training on the neural network based on a difference between the first feature vector and a ground truth vector corresponding to the training data;
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
by propagating the first feature vector to second nodes that are included in a second intermediate layer adjacent to an output layer among the intermediate layers of the primary trained neural network;
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation directed to mere data gathering, which is considered insignificant extra-solution activity. See MPEP 2106.05(g)).
performing secondary training on the primary trained neural network based on a difference between an output value output through the output layer from the second feature vector and a ground truth value corresponding to the training data.
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
wherein a peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the additional limitation simply links the judicial exception to a field of use and/or technological environment, see MPEP 2106.05(h)).
In Step 2B of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the
claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception:
Regarding limitations (I), (III) and (V): these limitations recite mere application of the abstract idea or mere instructions to implement an abstract idea on a computer, which are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply a generic computer and/or process to the judicial exception, see MPEP 2106.05(f).
Regarding limitations (II) and (IV): the additional elements, considered extra-solution activity as analyzed above, are activity that is well-understood, routine, and conventional; specifically, the courts have recognized the following computer functions as well-understood, routine, and conventional:
Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network). See MPEP 2106.05(d)(II).
Regarding limitation (VI): the additional element is deemed insufficient to transform the judicial exception into a patentable invention because it generally links the judicial exception to the technological environment, see MPEP 2106.05(h).
As analyzed above, the additional elements do not integrate the noted judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
Regarding claim 21,
extracting one or more first feature vectors from a plurality of intermediate layers of the neural network that detects whether biometric information is spoofed from input data comprising the biometric information of a user
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea (mental process): it involves extracting one or more first feature vectors from a plurality of intermediate layers by evaluating input data and determining whether biometric information is spoofed. See MPEP 2106.04).
detecting a first spoofing detection result of the biometric information by determining a first score based on the one or more first feature vectors;
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea (mental process): it involves detecting a first spoofing result by determining a first score based on one or more feature vectors, which is evaluating numerical representations and applying rules to reach a determination. See MPEP 2106.04).
determining, in response to the first spoofing detection result being detected, a second score
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea (mental process): it involves determining a second score in response to a first spoofing detection result, which is evaluating prior numerical results and applying rules to generate an additional value. See MPEP 2106.04).
detecting a second spoofing detection result of the biometric information by a score in which the first score and the second score are combined
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea (mental process): it involves detecting a second spoofing result by combining the first score and the second score into a single value and making a determination based on that combined score. See MPEP 2106.04).
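For purposes of illustration only, the combining step can be performed as a simple mathematical combination of the two scores; the weighted-average rule, the scores, and the decision threshold below are hypothetical, as the claim recites only that the first and second scores "are combined":

```python
def combined_score(first_score: float, second_score: float, w: float = 0.5) -> float:
    # Hypothetical combination rule: a weighted average of the two scores.
    return w * first_score + (1.0 - w) * second_score

# Hypothetical scores and threshold for the second spoofing detection result.
score = combined_score(0.8, 0.4)
second_result_is_spoof = score >= 0.5
```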
Because the claim limitations, under their broadest reasonable interpretation, cover performance of the limitations in the human mind but for the recitation of generic computer components, they fall within the mental processes grouping. Accordingly, the claim recites an abstract idea.
In Step 2A Prong 2 of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application, as evaluated below:
• The preamble is deemed insufficient to transform the judicial exception into a patentable invention because the preamble generally links the use of the judicial exception to a particular technological environment or field of use, see MPEP 2106.05(h).
A processor-implemented method with a neural network
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
using one or more pre-trained first classifiers
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
by applying, to a pre-trained second classifier, an output vector output from an output layer of the neural network;
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the claim recites a limitation which does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
wherein either one or both of the first classifiers and the second classifier is trained by an activation function that is determined by a first hyperparameter of which a multiplier of the activation function is associated with an ascending slope of the activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the activation function to fix a peak value of the activation function for the neural network.
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the additional limitation simply links the judicial exception to a field of use and/or technological environment, see MPEP 2106.05(h)).
wherein a peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
(i.e., deemed insufficient to transform the judicial exception into a patentable invention because the additional limitation simply links the judicial exception to a field of use and/or technological environment, see MPEP 2106.05(h)).
In Step 2B of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the
claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception:
Regarding limitations (I), (II) and (III): these limitations recite mere application of the abstract idea or mere instructions to implement an abstract idea on a computer, which are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply a generic computer and/or process to the judicial exception, see MPEP 2106.05(f).
Regarding limitations (IV) and (V): the additional elements are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technological environment, see MPEP 2106.05(h).
As analyzed above, the additional elements do not integrate the noted judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
Regarding claim 27,
extract one or more first feature vectors from a plurality of intermediate layers of the neural network configured to detect whether biometric information is spoofed from the input data
(i.e.: the broadest reasonable interpretation, the claim recites abstract idea: mental process. It involves extracting one or more first feature vectors from plurality of intermediate layers by evaluating input data and determining whether biometric information is spoofed. See (MPEP 2106.04)).
detect a first spoofing detection result of the biometric information by determining a first score based on the one or more first feature vectors;
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves detecting a first spoofing result by determining a first score based on one or more feature vectors, which is evaluating numerical representations and applying rules to reach a determination. See MPEP 2106.04).
determine, in response to the first spoofing detection result being detected, a second score
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves determining a second score in response to a first spoofing detection result, which is evaluating prior numerical results and applying rules to generate an additional value. See MPEP 2106.04).
detect a second spoofing detection result of the biometric information by a score in which the first score and the second score are combined;
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves detecting a second spoofing result by combining the first score and the second score into a single value and making a determination based on that combined score. See MPEP 2106.04).
If the claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental processes grouping. Accordingly, the claim recites an abstract idea.
In Step 2A, Prong Two of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application, as evaluated below:
• The preamble is deemed insufficient to transform the judicial exception to a patentable invention because the preamble generally links the use of a judicial exception to a particular technological environment or field of use, see MPEP 2106.05(h).
An electronic device with a neural network, the electronic device comprising: one or more processors configured to
(i.e.: deemed insufficient to transform the judicial exception to a patentable invention because the claim recites a limitation that does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
a sensor configured to capture input data comprising biometric information of a user;
(i.e.: deemed insufficient to transform the judicial exception to a patentable invention because the claim recites a limitation directed to mere data gathering, which is considered insignificant extra-solution activity. See MPEP 2106.05(g)).
using one or more pre-trained first classifiers;
(i.e.: deemed insufficient to transform the judicial exception to a patentable invention because the claim recites a limitation that does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
by applying an output vector output from an output layer of the neural network to a pre-trained second classifier;
(i.e.: deemed insufficient to transform the judicial exception to a patentable invention because the claim recites a limitation that does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
an output device configured to output either one or both of the first spoofing detection result and the second spoofing detection result
(i.e.: deemed insufficient to transform the judicial exception to a patentable invention because the claim recites a limitation directed to mere data outputting, which is considered insignificant extra-solution activity. See MPEP 2106.05(g)).
wherein either one or both of the first classifiers and the second classifier is trained based on an activation function that is determined by a first hyperparameter of which a multiplier of the activation function is associated with an ascending slope of the activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the activation function to fix a peak value of the activation function for the neural network.
(i.e.: deemed insufficient to transform the judicial exception to a patentable invention because the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h)).
wherein the peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
(i.e.: deemed insufficient to transform the judicial exception to a patentable invention because the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h)).
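For illustration only: the limitations above recite an activation function whose ascending slope is set by a first hyperparameter, whose descending slope is set by a second hyperparameter, and whose peak value is fixed to 1 with a dynamic range from 0 to 1. The sketch below shows one hypothetical function with these properties; it is an assumed gamma-like form chosen by analogy, not the claimed equation (which appears in the record only as an image in claim 3).

```python
import numpy as np

# Hypothetical illustration only -- NOT the claimed equation. One function
# whose ascending slope is controlled by hyperparameter a, whose descending
# slope is controlled by hyperparameter b, whose peak value is fixed to 1,
# and whose dynamic range is [0, 1] for x >= 0 is the gamma-like form
#   f(x) = H(x) * (x / (a/b))**a * exp(a - b*x),
# where H(x) is the Heaviside step function (output 0 when x < 0).
def peaked_activation(x, a=2.0, b=1.0):
    x = np.asarray(x, dtype=float)
    x_peak = a / b  # location of the maximum; f(x_peak) = 1 by construction
    pos = np.maximum(x, 0.0)
    return np.where(x > 0.0, (pos / x_peak) ** a * np.exp(a - b * x), 0.0)
```

Regardless of the particular form, any such function rises with slope governed by a, falls with slope governed by b, and attains exactly 1 at its peak, matching the properties the limitation attributes to the second activation function.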
In Step 2B of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the
claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception:
Regarding limitations (I, III, and IV): these recite mere application of the abstract idea, or mere instructions to implement an abstract idea on a computer, and are deemed insufficient to transform the judicial exception to a patentable invention because the limitations generally apply a generic computer and/or process to the judicial exception, see MPEP 2106.05(f).
Regarding limitation (II): this additional element, considered extra-/post-solution activity as analyzed above, is activity that is well-understood, routine, and conventional; specifically, the courts have recognized the following computer functions as well-understood, routine, and conventional:
Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network). See MPEP 2106.05(d)(II).
Regarding limitation (V): this additional element, considered extra-/post-solution activity as analyzed above, is activity that is well-understood, routine, and conventional; specifically, the courts have recognized the following computer functions as well-understood, routine, and conventional:
Data gathering and outputting: see Mayo, 566 U.S. at 79, 101 USPQ2d at 1968; OIP Techs., Inc. v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1092-93 (Fed. Cir. 2015).
Regarding limitations (VI and VII): these additional elements are deemed insufficient to transform the judicial exception to a patentable invention because they generally link the judicial exception to the technological environment, see MPEP 2106.05(h).
As analyzed above, the additional elements do not integrate the noted judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
Regarding claim 29,
performing first spoofing detection by determining a first score based on one or more first feature vectors generated using a first intermediate layer of the neural network based on input data;
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves detecting a first spoofing result by determining a first score based on one or more feature vectors, which is evaluating numerical representations and applying rules to reach a determination. See MPEP 2106.04).
in response to determining to perform the second spoofing detection based on the first score, determining a second score based on an output vector generated by an output layer of the neural network based on the one or more first feature vectors;
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves, after determining to perform the second spoofing detection, determining a second score based on an output vector generated by an output layer, which is evaluating prior numerical results and applying rules to generate an additional value. See MPEP 2106.04).
performing the second spoofing detection based on a score in which the first score and the second score are combined.
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves detecting a second spoofing result by combining the first score and the second score into a single value and making a determination based on that combined score. See MPEP 2106.04).
If the claim limitations, under their broadest reasonable interpretation, cover performance of the limitations as a mental process but for the recitation of generic computer components, then they fall within the mental processes grouping. Accordingly, the claim recites an abstract idea.
In Step 2A, Prong Two of the 101 analysis set forth in MPEP 2106, the examiner has determined that the following additional elements do not integrate this judicial exception into a practical application, as evaluated below:
• The preamble is deemed insufficient to transform the judicial exception to a patentable invention because the preamble generally links the use of a judicial exception to a particular technological environment or field of use, see MPEP 2106.05(h).
A processor-implemented method with a neural network
(i.e.: deemed insufficient to transform the judicial exception to a patentable invention because the claim recites a limitation that does not amount to more than a recitation of the words "apply it" (or an equivalent), such as mere instructions to implement an abstract idea on a computer. See MPEP 2106.05(f)).
In Step 2B of the 101-analysis set forth in the 2019 PEG, the examiner has determined that the
claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception:
Regarding limitation (I): this recites mere application of the abstract idea, or mere instructions to implement an abstract idea on a computer, and is deemed insufficient to transform the judicial exception to a patentable invention because the limitation generally applies a generic computer and/or process to the judicial exception, see MPEP 2106.05(f).
As analyzed above, the additional elements do not integrate the noted judicial exception into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Therefore, the claim is directed to an abstract idea.
Regarding claim 3, which depends upon claim 1: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the second activation function is represented as a(x) and is represented by the following equation:
[Equation for a(x) presented as an image: media_image1.png, 424 × 84, greyscale]
wherein a denotes the first hyperparameter associated with the ascending slope of the second activation function, b denotes the second hyperparameter associated with the descending slope of the second activation function, e denotes Euler's number, x denotes an input of the second nodes, and O(x) denotes a Heaviside step function that allows an output of the second activation function to be 0 when x is less than 0.
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mathematical concept. It involves a mathematical relationship expressed as a formula for an activation function, which defines the transformation of the input x using exponential and polynomial terms, hyperparameters, and a Heaviside step function, and thus falls within the mathematical concepts grouping. See MPEP 2106.04).
Claims 13, 18, 23, and 28 recite subject matter similar to that of claim 3 and are rejected under the same rationale.
Regarding claim 4, which depends upon claim 1: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the first activation function comprises any one or any combination of any two or more of a step function, a sigmoid function, a hyperbolic tangent function, a rectified linear unit (ReLU) function, and a leaky ReLU function.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 5, which depends upon claim 1: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the neural network comprises any one or any combination of any two or more of a convolutional neural network (CNN), a deep neural network (DNN), and a recurrent neural network (RNN).
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 6, which depends upon claim 1: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the neural network is a trained neural network, and the training of the neural network comprises:
Deemed insufficient to transform the judicial exception to a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which amounts to adding the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
extracting a first result value by applying the first activation function to intermediate nodes comprised in each of the intermediate layers;
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mathematical concept. It involves taking inputs from intermediate nodes and applying a mathematical function (the first activation function) to extract a first result value. See MPEP 2106.04).
extracting a second result value by applying the second activation to additional nodes connected to intermediate nodes in one or more of the intermediate layers;
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mathematical concept. It involves taking inputs from additional nodes connected to intermediate nodes and applying a mathematical function (the second activation function) to extract a second result value. See MPEP 2106.04).
training the neural network based on a difference between the first result value and the second result value.
Deemed insufficient to transform the judicial exception to a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which amounts to adding the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 7, which depends upon claim 1: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the first intermediate vector is generated based on training data input, and
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
performing primary training on the neural network based on a difference between the first intermediate vector and a ground truth vector corresponding to the training data;
Deemed insufficient to transform the judicial exception to a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which amounts to adding the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
performing the secondary training on the primary trained neural network based on a difference between an output value output through the output layer from the second intermediate vector and a ground truth value corresponding to the training data.
Deemed insufficient to transform the judicial exception to a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which amounts to adding the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 8, which depends upon claim 1: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
detecting a first spoofing detection result of biometric information by determining a first score based on the first intermediate vector;
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves evaluating biometric information, assigning a first score, and deciding whether spoofing occurred. See MPEP 2106.04).
determining, in response to the first spoofing detection result being detected, a second score based on a result of the applying of the second intermediate vector to the output layer;
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves evaluating data and calculating a second score based on the output derived by applying the second intermediate vector to the output layer. See MPEP 2106.04).
detecting a second spoofing detection result of the biometric information by a score in which the first score and the second score are combined.
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves determining a second spoofing result by combining the first and second scores into a single value and making a determination. See MPEP 2106.04).
Regarding claim 9, which depends upon claim 1: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, configure the one or more processors to perform the method of claim 1.
Deemed insufficient to transform the judicial exception to a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which amounts to adding the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 11, which depends upon claim 10: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the second activation function is determined by a first hyperparameter of which a multiplier of the second activation function is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function to fix a peak value of the second activation function.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 12, which depends upon claim 10: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein a total number of the additional nodes is one less than a total number of the intermediate nodes, and the additional nodes and the intermediate nodes are fully connected.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 15, which depends upon claim 10: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the first activation function comprises any one or any combination of any two or more of a step function, a sigmoid function, a hyperbolic tangent function, a rectified linear unit (ReLU) function, and a leaky ReLU function.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 17, which depends upon claim 16: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the second activation function is determined by a first hyperparameter of which a multiplier of the second activation function is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function to fix the peak value of the second activation function.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 20, which depends upon claim 16: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the first activation function comprises any one or any combination of any two or more of a step function, a sigmoid function, a hyperbolic tangent function, a rectified linear unit (ReLU) function, and a leaky ReLU function.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 24, which depends upon claim 21: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
extracting a feature vector from a first intermediate layer among the intermediate layers
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves observing data from a first intermediate layer to obtain a vector. See MPEP 2106.04).
using a classifier among the first classifiers;
Deemed insufficient to transform the judicial exception to a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which amounts to adding the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
extracting another feature vector from a second intermediate layer following the first intermediate layer
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves observing data from a second intermediate layer following the first intermediate layer to obtain a vector. See MPEP 2106.04).
using another classifier among the first classifiers;
Deemed insufficient to transform the judicial exception to a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which amounts to adding the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
extracting a combined feature vector in which the feature vector and the other feature vector are combined.
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves observing data from feature vectors to obtain a combined feature vector. See MPEP 2106.04).
Regarding claim 25, which depends upon claim 24: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
determining the first score based on a similarity between the combined feature vector and either one or both of a registered feature vector and a spoofed feature vector that is provided in advance;
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves determining a first score by evaluating the similarity between a combined feature vector and one or more previously provided feature vectors (registered or spoofed). See MPEP 2106.04).
classifying the first score into a score determined to be spoofed information or a score determined to be ground truth information
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process. It involves grouping the first score into either a score representing spoofed information or a score representing ground truth information. See MPEP 2106.04).
using the first classifiers.
Deemed insufficient to transform the judicial exception to a patentable invention because the limitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, which amounts to adding the words “apply it” (or an equivalent) to the judicial exception. See MPEP 2106.05(f).
Limitations directed to using the computer as a tool for implementing an abstract idea cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 26, which depends upon claim 21: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the biometric information comprises any one or any combination of any two or more of a fingerprint, an iris, and a face of the user.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 30, which depends upon claim 29: the claim fails to resolve the deficiencies identified above, as it neither integrates the judicial exception into a practical application nor introduces significantly more than the judicial exception. The claim recites:
wherein the one or more first feature vectors are generated by applying input data to a first activation function of the first intermediate layer, and
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
the determining of the second score comprises: generating one or more second feature vectors by applying the one or more first feature vectors to a second activation function of a second intermediate layer,
(i.e.: under the broadest reasonable interpretation, the claim recites an abstract idea, a mathematical concept. It involves generating one or more second feature vectors by applying the first feature vectors to a second activation function of a second intermediate layer. See MPEP 2106.04).
wherein one or more intermediate layers are disposed between the first intermediate layer and the second intermediate layer;
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
generating the output vector based on the one or more second feature vectors, using the output layer.
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea, a mathematical concept: it involves applying mathematical rules to transform one set of numerical values into another. See MPEP 2106.04.)
Regarding claim 31: claim 31 depends upon claim 30 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or by introducing significantly more than the judicial exception. The claim recites:
wherein a dynamic range of an output of the second activation function is less than the first activation function.
The recitation in the additional limitation simply links the judicial exception to a field of use and/or technology environment, see MPEP 2106.05(h).
Limitations directed to field of use cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Regarding claim 32: claim 32 depends upon claim 29 and fails to resolve the deficiencies identified above by integrating the judicial exception into a practical application or by introducing significantly more than the judicial exception. The claim recites:
determining not to perform the second spoofing detection in response to the first score being within a predetermined threshold value range; and
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process: it involves making a determination not to perform a second spoofing detection when the first score falls within a predetermined threshold value range. See MPEP 2106.04.)
determining to perform the second spoofing detection in response to the first score being outside the predetermined threshold value range.
(i.e., under the broadest reasonable interpretation, the claim recites an abstract idea, a mental process: it involves making a determination to perform a second spoofing detection when the first score falls outside the predetermined threshold value range. See MPEP 2106.04.)
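For illustration only (this is a generic sketch with hypothetical function names and threshold values, not the applicant's implementation), the two-step determination recited in claim 32 can be expressed as:

```python
def should_run_second_detection(first_score, low=0.3, high=0.7):
    """Decide whether a second spoofing detection is needed.

    Per the claim language: if the first score falls within the
    predetermined threshold value range [low, high], the second
    detection is not performed; if it falls outside the range,
    the second detection is performed. The bounds 0.3 and 0.7
    are hypothetical placeholders.
    """
    within_range = low <= first_score <= high
    return not within_range

# A first score inside the range skips the second detection;
# a score outside the range triggers it.
```

Such a determination reduces to a simple comparison against a range, which is why the limitation is characterized above as a mental process.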
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1, 4 – 6 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over SHIRAHATA, Pub. No.: US20180330229A1, in view of Narayanan et al., Pub. No.: US11038520B1, Diamant et al., Pub. No.: US10740432B1, and Kang et al., Pub. No.: US20210264247A1.
Regarding claim 1, SHIRAHATA teaches: A processor-implemented method with a neural network, the method comprising:
(SHIRAHATA, “[0004] According to an aspect of the invention, an information processing apparatus includes a memory and a processor [A processor-implemented method with a neural network] coupled to the memory and configured to set a first memory region in the memory as a region to be used for input to a first intermediate layer of a layered neural network and for output from the first intermediate layer, set a second memory region in the memory as a buffer region for the first intermediate layer, execute a recognition process of storing, in the second memory region, characteristic data corresponding to a characteristic of an input neuron data item to the first intermediate layer, and execute a learning process of determining an error of the first intermediate layer using the characteristic data stored in the second memory region.”)
generating a first intermediate vector by applying a first activation function to first nodes in a first intermediate layer adjacent to an input layer among intermediate layers of the neural network;
(SHIRAHATA, “[0062] For example, as indicated by the number “1”, the convolution operation is executed by the first convolutional layer (Conv1) on neuron data items received from the input layer (Input), a parameter is applied to the results of the operation, and the results of the application are output to the first activation function layer (ReLU1). [0063] As indicated by a number “2”, the in-place process is executed by the first activation function layer (ReLU1). Specifically, the input neuron data items are stored in a memory region secured for the first activation function layer (ReLU1), and the activation function is applied to the input neuron data items to calculate output neuron data items [generating a first intermediate vector by applying a first activation function to first nodes in a first intermediate layer adjacent to an input layer among intermediate layers of the neural network] (i.e.: ReLU1 output is the first intermediate vector). The output neuron data items are written over the input neuron data items stored in the memory region and are output to the second convolutional layer (Conv2).”)
transferring the first intermediate vector to second nodes in a second intermediate layer adjacent to an output layer among the intermediate layers;
(SHIRAHATA, “[0064] As indicated by a number “3”, when the neuron data items output from the first activation function layer (ReLU1) are input to the second convolutional layer (Conv2) [transferring the first intermediate vector to second nodes in a second intermediate layer adjacent to an output layer among the intermediate layers] (i.e.: the ReLU1 vector is fed forward to Conv2), the convolution operation is executed on the neuron data items by the second convolutional layer (Conv2), a parameter is applied to the results of the operation, and the results of the application are input to the second activation function layer (ReLU2).”)
generating a second intermediate vector by applying a second activation function to the second nodes; and applying the second intermediate vector to an output layer of the neural network,
(SHIRAHATA, “[0065] As indicated by a number “4”, the in-place process is executed by the second activation function layer (ReLU2). Specifically, the input neuron data items are stored in a memory region secured for the second activation function layer (ReLU2) [by applying a second activation function to the second nodes] (i.e.: the ReLU2 output is the second intermediate vector), the activation function is applied to the input neuron data items to calculate output neuron data items. [generating a second intermediate vector] The output neuron data items are written over the input neuron data item stored in the memory region and are output to the first pooling layer (Pool1) [applying the second intermediate vector to an output layer of the neural network] (i.e.: Pool1 is the next processing stage after the second intermediate vector).”)
wherein the second activation function is determined by a first hyperparameter
(SHIRAHATA, “[0125] As illustrated in FIGS. 8A, 8B and 8C, the whole controller 50 reads the definition information 41 and the parameter information 42 (in S1). The whole controller 50 identifies hyperparameters (learning rate, momentum, batch size, maximum number of iterations, and the like) [wherein the second activation function is determined by a first hyperparameter] based on the definition information 41 and the parameter information 42 (in S2) and acquires the number max_iter of repeated executions of the learning. Then, the whole controller 50 identifies the configuration of the neural network based on the definition information 41 and the parameter.”)
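The feedforward sequence SHIRAHATA describes (input layer → Conv1 → ReLU1 → Conv2 → ReLU2) can be sketched in miniature as follows. This is a generic illustration with hypothetical layer sizes and weights, not code from either the reference or the application; a plain weighted sum stands in for the convolution operation.

```python
def relu(v):
    """Activation function (ReLU) applied element-wise to a layer's nodes."""
    return [max(0.0, x) for x in v]

def dense(v, weights):
    """Stand-in for a convolutional/fully-connected layer: each output
    node is a weighted sum over the input nodes."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

# Input layer -> first intermediate layer (Conv1 + ReLU1):
# applying the first activation function yields the first intermediate vector.
x = [0.5, -1.0, 2.0]
w1 = [[1.0, 0.0, 0.5], [-0.5, 1.0, 0.0]]
first_intermediate = relu(dense(x, w1))

# The first intermediate vector is transferred to the second intermediate
# layer (Conv2 + ReLU2); applying the second activation function yields
# the second intermediate vector, which feeds the next stage (Pool1).
w2 = [[1.0, -1.0], [0.5, 0.5]]
second_intermediate = relu(dense(first_intermediate, w2))
```

This mirrors the mapping above: the ReLU1 output corresponds to the "first intermediate vector" and the ReLU2 output to the "second intermediate vector."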
SHIRAHATA does not teach:
of which a multiplier of the second activation function is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function
to fix a peak value of the second activation function.
wherein the peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Narayanan teaches:
of which a multiplier of the second activation function is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function
(Narayanan, col.14 line [32 – 50], “FIG. 9 depicts a flowchart of an example for determining LSBs by the per-neuron circuits according to one or more embodiments of the present invention. Here, the method 900 includes receiving the input data from the shared circuit 410, at block 902. The input data includes at least the MSBs 464 and the voltage interval 525. In one or more embodiments of the present invention, the input also includes a slope-identifier bit 720 that identifies a slope of the activation function 510 in this interval. In the case of an activation function 510 [of which a multiplier of the second activation function] that is monotonic, the slope-identifier bit 720 is set to a default value, e.g., 0, that is indicative of an ascending slope (or descending slope) [is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function]. In the case of non-monotonic functions, the slope-identifier bit 720 varies between 0 and 1 to indicate the ascending slope and the descending slope respectively. It is understood that in one or more embodiments of the present invention, 0 and 1 can swap roles, relative to their roles in the examples here. The slope-identifier bit 720 is determined by the controller 412 based on the LUT 416. ”)
Narayanan and SHIRAHATA are related to the same field of endeavor (i.e., neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Narayanan with the teachings of SHIRAHATA to enable per-neuron circuits to output activation values with reduced computational overhead, thereby improving memory management and error determination (Narayanan, Abstract).
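For illustration of the claimed limitation (an activation function with one hyperparameter multiplying its ascending slope and a second hyperparameter multiplying its descending slope, with a fixed peak), consider the following sketch. The function shape, parameter names, and values are hypothetical and are offered only to show what such a non-monotonic, two-hyperparameter activation could look like; neither reference recites this exact function.

```python
def peaked_activation(x, a=2.0, b=1.0, x_peak=0.5):
    """Illustrative non-monotonic activation with a fixed peak value of 1.

    Hyperparameter `a` is the multiplier associated with the ascending
    slope, hyperparameter `b` the multiplier associated with the
    descending slope; the output is clipped to the dynamic range [0, 1].
    All names and defaults are hypothetical.
    """
    if x <= x_peak:
        y = 1.0 + a * (x - x_peak)   # ascending toward the fixed peak
    else:
        y = 1.0 - b * (x - x_peak)   # descending away from the peak
    return max(0.0, min(1.0, y))
```

The peak value is fixed at 1 regardless of `a` and `b`, which separately control how steeply the function rises and falls, matching the division of roles between the first and second hyperparameters in the claim.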
SHIRAHATA in view of Narayanan does not teach:
to fix a peak value of the second activation function.
wherein the peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Diamant teaches:
to fix a peak value of the second activation function.
(Diamant, col.17 Line [44 – 67], “As another example, function table 406 can also be programmed to implement non-uniform quantization, where the step size between adjacent input boundary values is different from different input subranges. The distribution of the input boundary values can be determined based on, for example, a degree of linearity as well as a degree of change of the activation function for a particular input subrange. A degree of linearity can reflect whether the slope of the activation function is a constant or is changing within that input subrange. A high degree of linearity means the slope of the activation function remains constant [to fix a peak value of the second activation function], whereas a low degree of linearity means the slope of the activation function changes. Referring to FIG. 4E, to improve the accuracy of extrapolation based on slope and/or Taylor series coefficients, input boundary values can be more sparsely distributed for input subranges where the activation function is relatively linear (e.g., input subrange 474) and where the activation function experiences very small change with respect to input (e.g., input subranges 476 and 478). On the other hand, for input subrange 480, the activation function is relatively non-linear and the input boundary values can be more densely distributed within input subrange 480 to improve the accuracy of extrapolation and the resultant activation processing result.”)
Diamant, SHIRAHATA and Narayanan are related to the same field of endeavor (i.e., neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Diamant with those of SHIRAHATA and Narayanan so that, by selecting base values and parameters and applying arithmetic circuits for interpolation, the system can efficiently compute estimated outputs of neural network functions, further reducing computational cost (Diamant, Abstract).
SHIRAHATA in view of Narayanan and Diamant does not teach:
wherein the peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Kang teaches:
wherein the peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
(Kang, “[0019] A neural network generally contains multiple neurons, and connections between those neurons. A neuron generally is a part of a neural network computer system that determines an output based on one or more inputs (that can be weighted), and the neuron can determine this output based on determining the output of an activation function with the possibly-weighted inputs. Examples of activation functions include a rectifier/rectified linear unit (ReLU) activation function, which produces an output that ranges between 0 and infinity, inclusive; tan h, which produces an output that ranges between −1 and 1 [wherein the peak value of the second activation function is fixed to 1], inclusive; and sigmoid, which produces an output that ranges between 0 and 1 [and a dynamic range of the second activation function is from a value of 0 to a value of 1], inclusive. While several of the non-limiting examples described herein concern a ReLU activation function, it can be appreciated that these techniques can be applied to other activation functions.”)
Kang, SHIRAHATA, Narayanan and Diamant are related to the same field of endeavor (i.e., neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kang with those of SHIRAHATA, Narayanan and Diamant to improve the efficiency of neural network computation by predicting an output condition based on an intermediate calculation and selectively terminating further processing (Kang, Abstract).
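The output ranges Kang recites can be verified directly; the sketch below, offered for illustration only, implements the three activation functions Kang names and shows that the sigmoid's peak (supremum) is fixed at 1 with a dynamic range from 0 to 1, while tanh ranges between -1 and 1 and ReLU between 0 and infinity.

```python
import math

def sigmoid(x):
    """Output ranges between 0 and 1 inclusive of the limits;
    the peak value is fixed at 1 (Kang, paragraph [0019])."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Output ranges between -1 and 1."""
    return math.tanh(x)

def relu(x):
    """Output ranges between 0 and infinity."""
    return max(0.0, x)

# sigmoid saturates toward 1 for large positive inputs and toward 0
# for large negative inputs, giving the claimed dynamic range [0, 1].
```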
Regarding claim 4, SHIRAHATA in view of Narayanan, Diamant and Kang teach the method of claim 1.
SHIRAHATA further teaches: wherein the first activation function comprises any one or any combination of any two or more of a step function, a sigmoid function, a hyperbolic tangent function, a rectified linear unit (ReLU) function, and a leaky ReLU function.
(SHIRAHATA, “[0059] The example illustrated in FIG. 3 indicates data and the order of processes in the case where the learning of the convolutional neural network as the neural network is executed. The neural network has a layered structure in which layers are arranged in order. The neural network includes an input layer (Input), a first convolutional layer (Conv1), a first activation function layer (ReLU1) [wherein the first activation function comprises any one or any combination of any two or more of a step function, a sigmoid function, a hyperbolic tangent function, a rectified linear unit (ReLU) function, and a leaky ReLU function], a second convolutional layer (Conv2), a second activation function layer (ReLU2), a first pooling layer (Pool1), a first fully-connected layer (Fully-conn1), and a third activation function layer (ReLU3) in this order. The neural network further includes a second fully-connected layer (Fully-conn2), a softmax layer (Softmax), and an output layer (Output) in this order. FIG. 3 exemplifies the case where the intermediate layers that execute the in-place process are the activation function layers (ReLU1, ReLU2, and ReLU3).”)
Regarding claim 5, SHIRAHATA in view of Narayanan, Diamant and Kang teach the method of claim 1.
SHIRAHATA further teaches: wherein the neural network comprises any one or any combination of any two or more of a convolutional neural network (CNN), a deep neural network (DNN), and a recurrent neural network (RNN).
(SHIRAHATA, “[0029] FIG. 1 illustrates, as an example of a neural network, an example of a convolutional neural network (CNN) [wherein the neural network comprises any one or any combination of any two or more of a convolutional neural network (CNN), a deep neural network (DNN), and a recurrent neural network (RNN).] to be used to recognize an image. The case where an image is recognized by the convolutional neural network as the neural network is described below as an example.”)
Regarding claim 6, SHIRAHATA in view of Narayanan, Diamant and Kang teach the method of claim 1.
SHIRAHATA further teaches: wherein the neural network is a trained neural network, and the training of the neural network comprises: extracting a first result value by applying the first activation function to intermediate nodes comprised in each of the intermediate layers;
(SHIRAHATA, “[0062] For example, as indicated by the number “1”, the convolution operation is executed by the first convolutional layer (Conv1) on neuron data items received from the input layer (Input), a parameter is applied to the results of the operation, and the results of the application are output to the first activation function layer (ReLU1). [0063] As indicated by a number “2”, the in-place process is executed by the first activation function layer (ReLU1). Specifically, the input neuron data items are stored in a memory region secured for the first activation function layer (ReLU1) [by applying the first activation function to intermediate nodes comprised in each of the intermediate layers], and the activation function is applied to the input neuron data items to calculate output neuron data items [extracting a first result value] (i.e.: ReLU1 output is the first intermediate vector). The output neuron data items are written over the input neuron data items stored in the memory region and are output to the second convolutional layer (Conv2).”)
extracting a second result value by applying the second activation to additional nodes connected to intermediate nodes in one or more of the intermediate layers; and
(SHIRAHATA, “[0065] As indicated by a number “4”, the in-place process is executed by the second activation function layer (ReLU2). Specifically, the input neuron data items are stored in a memory region secured for the second activation function layer (ReLU2) [by applying the second activation to additional nodes connected to intermediate nodes in one or more of the intermediate layers] (i.e.: the ReLU2 output is the second intermediate vector), the activation function is applied to the input neuron data items to calculate output neuron data items [extracting a second result value]. The output neuron data items are written over the input neuron data item stored in the memory region and are output to the first pooling layer (Pool1).”)
training the neural network based on a difference between the first result value and the second result value.
(SHIRAHATA, “[0052] In addition, in backpropagation, a gradient of an error with respect to input is calculated using a partial differential from an error in the output layer (Output) [training the neural network based on a difference between] (i.e.: error gradient is calculated during backpropagation using difference between activation results). For example, in the activation function layers (ReLU1 and ReLU2) [the first result value and the second result value] for executing the operation using the activation function such as ReLU, a gradient of an error with respect to input is calculated from the following Equation (10-1). σ′(x) is obtained by differentiating σ(x) with respect to x and calculated from the following Equation (10-2). A value used upon the recognition is used as x. The error gradient (∂E/∂xL i) is calculated by substituting σ′(x) into Equation (10-1).”)
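The gradient SHIRAHATA computes through a ReLU layer during backpropagation (Equations (10-1) and (10-2)) reduces to masking: σ′(x) is 1 where the forward-pass input x was positive and 0 otherwise, so each upstream error component is either passed through or zeroed. The following is a minimal sketch of that calculation, not code from the reference.

```python
def relu_backward(upstream_grad, forward_input):
    """Backpropagate an error gradient through a ReLU layer.

    sigma'(x) = 1 if x > 0 else 0, so the error gradient with
    respect to the layer input is the upstream gradient with
    components zeroed wherever the forward input was non-positive.
    """
    return [g if x > 0 else 0.0
            for g, x in zip(upstream_grad, forward_input)]
```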
Regarding claim 9, SHIRAHATA in view of Narayanan, Diamant and Kang teach the method of claim 1.
SHIRAHATA further teaches: A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, configure the one or more processors to perform the method of claim 1.
(SHIRAHATA, “[0086] The storage unit 20 [A non-transitory computer-readable storage medium storing instructions that] is a storage device such as a hard disk or a solid state drive (SSD). The motherboard 21 is a board to which components serving as main functions of the information processing apparatus 10 are attached. The accelerator board 22 is a board on which hardware added and to be used to improve processing power of the information processing apparatus 10 [when executed by one or more processors, configure the one or more processors to perform the method of claim 1] is installed. Multiple accelerator boards 22 may be installed. The present embodiment describes, as an example, the case there the single accelerator board 22 is installed.”)
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over SHIRAHATA in view of Narayanan, Diamant, Kang and in further view of Vinayachandran et al., Pub. No.: US20250070877A1.
SHIRAHATA in view of Narayanan, Diamant and Kang teach the method of claim 1.
SHIRAHATA in view of Narayanan, Diamant and Kang does not teach:
wherein the first intermediate vector is generated based on training data input, and further comprising: performing primary training on the neural network based on a difference between the first intermediate vector and a ground truth vector corresponding to the training data; and performing the secondary training on the primary trained neural network based on a difference between an output value output through the output layer from the second intermediate vector and a ground truth value corresponding to the training data.
Vinayachandran teaches:
wherein the first intermediate vector is generated based on training data input, and further comprising: performing primary training on the neural network based on a difference between the first intermediate vector and a ground truth vector corresponding to the training data; and performing the secondary training on the primary trained neural network based on a difference between an output value output through the output layer from the second intermediate vector and a ground truth value corresponding to the training data.
(Vinayachandran, “[0006] The present disclosure provides a calibration apparatus that comprises at least one memory that is configured to store instructions and at least one processor. The processor is configured to execute the instructions to: train a first machine learning-based model [performing primary training on the neural network] with a first training dataset to determine configuration parameters of an intermediate pre-distortion compensator in an optical communication system that includes a transmitter, a receiver, and an optical communication channel, the transmitter including a pre-distortion compensator, the intermediate pre-distortion compensator, and a Mach Zehnder Modulator (MZM) compensator, the first training dataset including multiple pairs of: a first input data [based on a difference between the first intermediate vector] that represents a transmission symbol sequence representing a message to be sent; and a first ground-truth data [and a ground truth vector corresponding to the training data] that represents an inverse signal of distortion that is generated based on an output from an electrical path in the transmitter, the output from the electrical path being acquired by feeding the transmission symbol sequence to the transmitter in which the intermediate pre-distortion compensator and the MZM compensator are enabled while the pre-distortion compensator is disabled; train a second machine learning-based model [performing the secondary training on the primary trained neural network] with a second training dataset to determine configuration parameters of the post-distortion compensator included in the receiver, the second training dataset including multiple pairs of: a second input data that represents an output from an optical front end included in the receiver [based on a difference between an output value output through the output layer from the second intermediate vector]; and a second ground-truth data that represents the transmission 
symbol sequence [and a ground truth value corresponding to the training data], the output from the optical front end being acquired by feeding the transmission symbol sequence to the optical communication system in which the intermediate pre-distortion compensator, the MZM compensator, and the post-distortion compensator are enabled while the pre-distortion compensator is disabled and in which the configuration parameters determined using the first machine learning-based model are applied to the intermediate pre-distortion compensator; and train a third machine learning-based model with a third training dataset to determine configuration parameters of the pre-distortion compensator.”)
Vinayachandran, SHIRAHATA, Narayanan, Diamant and Kang are related to the same field of endeavor (i.e., neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Vinayachandran with those of SHIRAHATA, Narayanan, Diamant and Kang to dynamically adjust configuration parameters to improve the accuracy and robustness of neural network operations under varying transmission conditions (Vinayachandran, Abstract).
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over SHIRAHATA in view of Narayanan, Diamant, Kang and in further view of Joshi et al., Pub. No.: US11341778B1.
SHIRAHATA in view of Narayanan, Diamant and Kang teach the method of claim 1.
SHIRAHATA in view of Narayanan, Diamant and Kang does not teach:
detecting a first spoofing detection result of biometric information by determining a first score based on the first intermediate vector; determining, in response to the first spoofing detection result being detected, a second score based on a result of the applying of the second intermediate vector to the output layer; and detecting a second spoofing detection result of the biometric information by a score in which the first score and the second score are combined.
Joshi teaches:
detecting a first spoofing detection result of biometric information by determining a first score based on the first intermediate vector; determining, in response to the first spoofing detection result being detected, a second score based on a result of the applying of the second intermediate vector to the output layer; and
(Joshi, (col. 9 line [58 – 67] – col.10 line [1 – 17]), “More sophisticated search strategies are also within the scope of this disclosure, e.g., using random configuration selection (e.g., selecting a random positional parameter for each image capture), adaptive grid spacing to fine-tune ranges of a failure space, and/or other methods. In some implementations, the search includes a greedy search in which parameters are adjusted to find locally more effective spoof parameter combinations. In some implementations, the search includes a gradient search in which one or more of the spoof parameters is adjusted slightly, spoof detection [detecting a first spoofing detection result of biometric information] results are obtained (e.g., corresponding numerical scores [by determining a first score based on the first intermediate vector] indicative of spoofing), and the spoof detection results are used to calculate a gradient vector of the spoof parameters; spoof parameters can then be adjusted in the opposite direction of the gradient vector (e.g., in a gradient descent optimization process), with the process iterated repeatedly to converge on spoof parameters that generate a failure condition, e.g., a spoof accept event. In some implementations, the search includes a genetic algorithm approach in which numerical scores are calculated for multiple random initial sets of spoof parameters [determining, in response to the first spoofing detection result being detected, a second score based on a result of the applying of the second intermediate vector to the output layer], offspring sets of spoof parameters are created by applying mutation and/or recombination operators to the initial and following sets of spoof parameters, and the process iterated repeatedly until spoof parameters that generate a failure condition (e.g., a spoof accept event) are found.”)
detecting a second spoofing detection result of the biometric information by a score in which the first score and the second score are combined.
(Joshi, col. 9 line [12 – 21], “In some implementations, the numerical score is fused with one or more other scores, e.g., as a weighted combination, to determine an overall biometric authentication result. For example, a first numerical score may indicate a spoofing likelihood, a second numerical score may indicate a biometric matching likelihood (e.g., likelihood that a facial image matches a reference facial image for a user), and the two scores can be combined in a weighted combination [detecting a second spoofing detection result of the biometric information by a score in which the first score and the second score are combined] that indicates overall biometric authentication success or failure in reference to a threshold value.”)
Joshi, SHIRAHATA, Narayanan, Diamant and Kang are related to the same field of endeavor (i.e., neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Joshi with those of SHIRAHATA, Narayanan, Diamant and Kang to dynamically adjust configuration parameters by generating varied spoof representations, testing them against biometric authentication, and identifying failure conditions, thereby enhancing robustness against adversarial inputs or fraudulent data affecting neural network training and inference (Joshi, Abstract).
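The weighted score fusion Joshi describes (col. 9) can be sketched as follows. Joshi does not fix particular weights or thresholds, so the names and values below are hypothetical placeholders used only to illustrate the mechanism of combining a spoof-likelihood score with a biometric-match score.

```python
def fused_authentication_score(spoof_score, match_score, w=0.5, threshold=0.6):
    """Combine a spoof-likelihood score and a biometric-match score
    as a weighted combination and compare against a threshold, per
    the mechanism Joshi describes. `w` and `threshold` are
    hypothetical; higher spoof_score means more likely a spoof, so
    its complement contributes to the authentication score.
    """
    combined = w * (1.0 - spoof_score) + (1.0 - w) * match_score
    return combined, combined >= threshold

# A low spoof score with a high match score passes; a high spoof
# score drags the combined score below the threshold.
```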
Claim(s) 10, 12 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over SHIRAHATA in view of Persson et al., Pub. No.: US20210350240A1 and Kang.
Regarding claim 10, SHIRAHATA teaches: A processor-implemented method with a neural network, the method comprising:
(SHIRAHATA, “[0004] According to an aspect of the invention, an information processing apparatus includes a memory and a processor [A processor-implemented method with a neural network, the method comprising] coupled to the memory and configured to set a first memory region in the memory as a region to be used for input to a first intermediate layer of a layered neural network and for output from the first intermediate layer, set a second memory region in the memory as a buffer region for the first intermediate layer, execute a recognition process of storing, in the second memory region, characteristic data corresponding to a characteristic of an input neuron data item to the first intermediate layer, and execute a learning process of determining an error of the first intermediate layer using the characteristic data stored in the second memory region.”)
extracting a first result value by applying a first activation function to intermediate nodes comprised in each of intermediate layers of the neural network;
(SHIRAHATA, “[0062] For example, as indicated by the number “1”, the convolution operation is executed by the first convolutional layer (Conv1) on neuron data items received from the input layer (Input), a parameter is applied to the results of the operation, and the results of the application are output to the first activation function layer (ReLU1). [0063] As indicated by a number “2”, the in-place process is executed by the first activation function layer (ReLU1). Specifically, the input neuron data items are stored in a memory region secured for the first activation function layer (ReLU1) [by applying a first activation function to intermediate nodes comprised in each of intermediate layers of the neural network], and the activation function is applied to the input neuron data items to calculate output neuron data items [extracting a first result value] (i.e.: ReLU1 output is the first intermediate vector). The output neuron data items are written over the input neuron data items stored in the memory region and are output to the second convolutional layer (Conv2).”)
extracting a second result value by applying a second activation function different from the first activation function to additional nodes connected to intermediate nodes in one or more of the intermediate layers;
(SHIRAHATA, “[0065] As indicated by a number “4”, the in-place process is executed by the second activation function layer (ReLU2). Specifically, the input neuron data items are stored in a memory region secured for the second activation function layer (ReLU2) [by applying a second activation function different from the first activation function to additional nodes connected to intermediate nodes in one or more of the intermediate layers] (i.e.: the ReLU2 output is the second intermediate vector), the activation function is applied to the input neuron data items to calculate output neuron data items [extracting a second result value]. The output neuron data items are written over the input neuron data item stored in the memory region and are output to the first pooling layer (Pool1).”)
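For illustration only, the in-place process described in SHIRAHATA paragraphs [0063] and [0065] (the activation function is applied to the input neuron data items and the output neuron data items are written over the inputs in the same memory region) can be sketched as follows; the function and variable names are illustrative and do not appear in the reference:

```python
def relu_in_place(layer_buffer):
    """Apply the activation function (here ReLU) to the input neuron
    data items and write the output neuron data items over the inputs
    in the same memory region, mirroring SHIRAHATA's in-place process."""
    for i, v in enumerate(layer_buffer):
        layer_buffer[i] = max(0.0, v)  # output overwrites the input item
    return layer_buffer

# A memory region secured for an activation function layer
# (values are illustrative):
region = [-1.5, 0.0, 2.0, -0.3]
relu_in_place(region)
# The same region now holds the output neuron data items.
```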
SHIRAHATA does not teach:
training the neural network based on a difference between the first result value and the second result value
wherein a peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Persson teaches:
training the neural network based on a difference between the first result value and the second result value.
(Persson, “[0007] According to a second aspect there is provided a method performed by a processing element for adapting a trained neural network, the method comprising: inputting input data to the trained neural network and applying a plurality of filters of the neural network to the input data to generate a plurality of channels of activation data; calculating differences between corresponding activation values in the plurality of channels of activation data; determining an order of the plurality of channels based on the calculated differences; and adapting the neural network so that it will generate channels of activation data in the order determined in the determining step.”)
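One plausible reading of the Persson passage above (calculating differences between corresponding activation values in the plurality of channels and determining an order of the channels from those differences) can be sketched as follows; the specific difference metric and reference channel are assumptions for illustration, not fixed by the reference:

```python
def order_channels_by_difference(channels):
    """Calculate differences between corresponding activation values
    across channels (here, each channel against the first channel) and
    determine a channel order based on the calculated differences."""
    ref = channels[0]
    diffs = [sum(abs(a - b) for a, b in zip(ch, ref)) for ch in channels]
    return sorted(range(len(channels)), key=lambda i: diffs[i])

# Three channels of activation data (illustrative values):
channels = [[1.0, 2.0], [1.1, 2.1], [5.0, 0.0]]
order = order_channels_by_difference(channels)
```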
Persson and SHIRAHATA are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Persson with the teachings of SHIRAHATA to enable compression of activation data, reducing memory and bandwidth usage while maintaining efficient learning and inference (Persson, Abstract).

SHIRAHATA in view of Persson do not teach:
wherein a peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Kang teaches:
wherein a peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
(Kang, “[0019] A neural network generally contains multiple neurons, and connections between those neurons. A neuron generally is a part of a neural network computer system that determines an output based on one or more inputs (that can be weighted), and the neuron can determine this output based on determining the output of an activation function with the possibly-weighted inputs. Examples of activation functions include a rectifier/rectified linear unit (ReLU) activation function, which produces an output that ranges between 0 and infinity, inclusive; tan h, which produces an output that ranges between −1 and 1 [wherein the peak value of the second activation function is fixed to 1], inclusive; and sigmoid, which produces an output that ranges between 0 and 1 [and a dynamic range of the second activation function is from a value of 0 to a value of 1], inclusive. While several of the non-limiting examples described herein concern a ReLU activation function, it can be appreciated that these techniques can be applied to other activation functions.”)
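The activation-function ranges recited in the Kang passage above (ReLU unbounded above; tanh bounded between −1 and 1; sigmoid bounded between 0 and 1, so its peak value is fixed to 1 and its dynamic range is from 0 to 1) can be verified numerically with a short sketch:

```python
import math

def sigmoid(x):
    # Output ranges between 0 and 1: peak fixed to 1, dynamic range [0, 1].
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # Output ranges between 0 and infinity: the peak is unbounded.
    return max(0.0, x)

xs = [-10.0, -1.0, 0.0, 1.0, 10.0]
assert all(0.0 < sigmoid(x) < 1.0 for x in xs)      # sigmoid stays in (0, 1)
assert all(-1.0 < math.tanh(x) < 1.0 for x in xs)   # tanh stays in (-1, 1)
assert all(relu(x) >= 0.0 for x in xs)              # ReLU is non-negative
```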
Kang, SHIRAHATA and Persson are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kang with the teachings of SHIRAHATA and Persson to improve the efficiency of neural network computation by predicting an output condition based on an intermediate calculation and selectively terminating further processing (Kang, Abstract).
Regarding claim 12, SHIRAHATA in view of Persson and Kang teach the method of claim 10.
SHIRAHATA further teaches: wherein a total number of the additional nodes is one less than a total number of the intermediate nodes, and the additional nodes and the intermediate nodes are fully connected.
(SHIRAHATA, “[0030] The neural network is a layered neural network having a layered structure and may include multiple intermediate layers between an input layer and an output layer. The multiple intermediate layers [wherein a total number of the additional nodes is one less than a total number of the intermediate nodes] (i.e.: each layer has nodes (neurons), so intermediate nodes can be interpreted as the neurons within these intermediate layers) include, for example, convolutional layers, activation function layers, pooling layers, a fully-connected layer [and the additional nodes and the intermediate nodes are fully connected], and a softmax layer. The number of layers and the positions of the layers are not limited to those exemplified in FIG. 1 and may be changed based on requested architecture. Specifically, the layered structure of the neural network and the configuration of the layers may be defined by a designer based on a target to be identified.”)
Regarding claim 15, SHIRAHATA in view of Persson and Kang teach the method of claim 10.
SHIRAHATA further teaches: wherein the first activation function comprises any one or any combination of any two or more of a step function, a sigmoid function, a hyperbolic tangent function, a rectified linear unit (ReLU) function, and a leaky ReLU function.
(SHIRAHATA, “[0059] The example illustrated in FIG. 3 indicates data and the order of processes in the case where the learning of the convolutional neural network as the neural network is executed. The neural network has a layered structure in which layers are arranged in order. The neural network includes an input layer (Input), a first convolutional layer (Conv1), a first activation function layer (ReLU1) [wherein the first activation function comprises any one or any combination of any two or more of a step function, a sigmoid function, a hyperbolic tangent function, a rectified linear unit (ReLU) function, and a leaky ReLU function], a second convolutional layer (Conv2), a second activation function layer (ReLU2), a first pooling layer (Pool1), a first fully-connected layer (Fully-conn1), and a third activation function layer (ReLU3) in this order. The neural network further includes a second fully-connected layer (Fully-conn2), a softmax layer (Softmax), and an output layer (Output) in this order. FIG. 3 exemplifies the case where the intermediate layers that execute the in-place process are the activation function layers (ReLU1, ReLU2, and ReLU3).”)
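The layer ordering recited in SHIRAHATA paragraph [0059] and FIG. 3 can be summarized as a simple pipeline sketch (the layer names follow the reference; the list structure itself is illustrative):

```python
# Layer order from SHIRAHATA FIG. 3 / paragraph [0059]:
pipeline = [
    "Input", "Conv1", "ReLU1", "Conv2", "ReLU2", "Pool1",
    "Fully-conn1", "ReLU3", "Fully-conn2", "Softmax", "Output",
]

# Per the reference, the intermediate layers that execute the in-place
# process are the activation function layers:
in_place_layers = [name for name in pipeline if name.startswith("ReLU")]
```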
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over SHIRAHATA in view of Persson and Kang, and further in view of Narayanan and Diamant.
SHIRAHATA in view of Persson and Kang teach the method of claim 10.
SHIRAHATA further teaches: wherein the second activation function is determined by a first hyperparameter
(SHIRAHATA, “[0125] As illustrated in FIGS. 8A, 8B and 8C, the whole controller 50 reads the definition information 41 and the parameter information 42 (in S1). The whole controller 50 identifies hyperparameters (learning rate, momentum, batch size, maximum number of iterations, and the like) [wherein the second activation function is determined by a first hyperparameter] based on the definition information 41 and the parameter information 42 (in S2) and acquires the number max_iter of repeated executions of the learning. Then, the whole controller 50 identifies the configuration of the neural network based on the definition information 41 and the parameter.”)
SHIRAHATA in view of Persson and Kang do not teach:
of which a multiplier of the second activation function is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function to fix a peak value of the second activation function.
Narayanan teaches:
of which a multiplier of the second activation function is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function
(Narayanan, col.14 line [32 – 50], “FIG. 9 depicts a flowchart of an example for determining LSBs by the per-neuron circuits according to one or more embodiments of the present invention. Here, the method 900 includes receiving the input data from the shared circuit 410, at block 902. The input data includes at least the MSBs 464 and the voltage interval 525. In one or more embodiments of the present invention, the input also includes a slope-identifier bit 720 that identifies a slope of the activation function 510 in this interval. In the case of an activation function 510 [of which a multiplier of the second activation function] that is monotonic, the slope-identifier bit 720 is set to a default value, e.g., 0, that is indicative of an ascending slope (or descending slope) [is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function]. In the case of non-monotonic functions, the slope-identifier bit 720 varies between 0 and 1 to indicate the ascending slope and the descending slope respectively. It is understood that in one or more embodiments of the present invention, 0 and 1 can swap roles, relative to their roles in the examples here. The slope-identifier bit 720 is determined by the controller 412 based on the LUT 416. ”)
Narayanan, SHIRAHATA, Persson and Kang are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Narayanan with the teachings of SHIRAHATA, Persson and Kang to enable per-neuron circuits to output activation values with reduced computational overhead and to improve memory management and error determination (Narayanan, Abstract).
SHIRAHATA in view of Persson, Kang and Narayanan do not teach:
to fix a peak value of the second activation function.
Diamant teaches:
to fix a peak value of the second activation function.
(Diamant, col.17 Line [44 – 67], “As another example, function table 406 can also be programmed to implement non-uniform quantization, where the step size between adjacent input boundary values is different from different input subranges. The distribution of the input boundary values can be determined based on, for example, a degree of linearity as well as a degree of change of the activation function for a particular input subrange. A degree of linearity can reflect whether the slope of the activation function is a constant or is changing within that input subrange. A high degree of linearity means the slope of the activation function remains constant [to fix a peak value of the second activation function], whereas a low degree of linearity means the slope of the activation function changes. Referring to FIG. 4E, to improve the accuracy of extrapolation based on slope and/or Taylor series coefficients, input boundary values can be more sparsely distributed for input subranges where the activation function is relatively linear (e.g., input subrange 474) and where the activation function experiences very small change with respect to input (e.g., input subranges 476 and 478). On the other hand, for input subrange 480, the activation function is relatively non-linear and the input boundary values can be more densely distributed within input subrange 480 to improve the accuracy of extrapolation and the resultant activation processing result.”)
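The Diamant passage above describes a function table with non-uniformly distributed input boundary values: boundaries are sparse where the activation function is relatively linear or flat, and dense where it is non-linear, with linear interpolation between adjacent entries. A rough sketch of that lookup-table scheme follows; the specific boundary placement and the sigmoid example are illustrative assumptions, not the reference's exact method:

```python
import math

def piecewise_linear_eval(boundaries, values, x):
    """Evaluate an activation function approximated by a function table
    of input boundary values, linearly interpolating between the two
    adjacent table entries that bracket the input."""
    for i in range(len(boundaries) - 1):
        lo, hi = boundaries[i], boundaries[i + 1]
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            return values[i] + t * (values[i + 1] - values[i])
    raise ValueError("input outside table range")

# Non-uniform quantization: boundaries densely spaced near 0 where the
# sigmoid is non-linear, sparsely spaced in the flat tails.
boundaries = [-8.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0, 8.0]
values = [1.0 / (1.0 + math.exp(-b)) for b in boundaries]
approx = piecewise_linear_eval(boundaries, values, 0.5)
```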
Diamant, SHIRAHATA, Persson, Kang and Narayanan are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Diamant with the teachings of SHIRAHATA, Persson, Kang and Narayanan because, by selecting base values and parameters and applying arithmetic circuits for interpolation, the system can efficiently compute estimated outputs of neural network functions, further reducing computational cost (Diamant, Abstract).
Claim(s) 16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over SHIRAHATA in view of Vinayachandran and Kang.
Regarding claim 16, SHIRAHATA teaches: A processor-implemented method with a neural network, the method comprising:
(SHIRAHATA, “[0004] According to an aspect of the invention, an information processing apparatus includes a memory and a processor [A processor-implemented method with a neural network, the method comprising] coupled to the memory and configured to set a first memory region in the memory as a region to be used for input to a first intermediate layer of a layered neural network and for output from the first intermediate layer, set a second memory region in the memory as a buffer region for the first intermediate layer, execute a recognition process of storing, in the second memory region, characteristic data corresponding to a characteristic of an input neuron data item to the first intermediate layer, and execute a learning process of determining an error of the first intermediate layer using the characteristic data stored in the second memory region.”)
generating a first feature vector by propagating training data input to an input layer of the neural network to first nodes that are included in a first intermediate layer adjacent to the input layer among intermediate layers of the neural network and that operate according to a first activation function;
(SHIRAHATA, “[0062] For example, as indicated by the number “1”, the convolution operation is executed by the first convolutional layer (Conv1) on neuron data items received from the input layer (Input), a parameter is applied to the results of the operation, and the results of the application are output to the first activation function layer (ReLU1). [0063] As indicated by a number “2”, the in-place process is executed by the first activation function layer (ReLU1). Specifically, the input neuron data items are stored in a memory region secured for the first activation function layer (ReLU1), and the activation function is applied to the input neuron data items to calculate output neuron data items [generating a first feature vector by propagating training data input to an input layer of the neural network to first nodes that are included in a first intermediate layer adjacent to the input layer among intermediate layers of the neural network and that operate according to a first activation function] (i.e.: ReLU1 output is the first intermediate vector). The output neuron data items are written over the input neuron data items stored in the memory region and are output to the second convolutional layer (Conv2).”)
generating a second feature vector by propagating the first feature vector to second nodes that are included in a second intermediate layer adjacent to an output layer among the intermediate layers of the primary trained neural network; and
(SHIRAHATA, “[0065] As indicated by a number “4”, the in-place process is executed by the second activation function layer (ReLU2). Specifically, the input neuron data items are stored in a memory region secured for the second activation function layer (ReLU2) [by propagating the first feature vector to second nodes], the activation function is applied to the input neuron data items to calculate output neuron data items. [generating a second feature vector] The output neuron data items are written over the input neuron data item stored in the memory region and are output to the first pooling layer (Pool1) [that are included in a second intermediate layer adjacent to an output layer among the intermediate layers of the primary trained neural network]”)
SHIRAHATA does not teach:
performing primary training on the neural network based on a difference between the first feature vector and a ground truth vector corresponding to the training data;
performing secondary training on the primary trained neural network based on a difference between an output value output through the output layer from the second feature vector and a ground truth value corresponding to the training data
Vinayachandran teaches:
performing primary training on the neural network based on a difference between the first feature vector and a ground truth vector corresponding to the training data;
(Vinayachandran, “[0006] The present disclosure provides a calibration apparatus that comprises at least one memory that is configured to store instructions and at least one processor. The processor is configured to execute the instructions to: train a first machine learning-based model [performing primary training on the neural network] with a first training dataset to determine configuration parameters of an intermediate pre-distortion compensator in an optical communication system that includes a transmitter, a receiver, and an optical communication channel, the transmitter including a pre-distortion compensator, the intermediate pre-distortion compensator, and a Mach Zehnder Modulator (MZM) compensator, the first training dataset including multiple pairs of: a first input data [based on a difference between the first feature vector] that represents a transmission symbol sequence representing a message to be sent; and a first ground-truth data [and a ground truth vector corresponding to the training data] that represents an inverse signal of distortion that is generated based on an output from an electrical path in the transmitter,”)
performing secondary training on the primary trained neural network based on a difference between an output value output through the output layer from the second feature vector and a ground truth value corresponding to the training data.
(Vinayachandran, “[0006] The present disclosure provides a calibration apparatus that comprises at least one memory that is configured to store instructions and at least one processor. The processor is configured to execute the instructions to: train a first machine learning-based model with a first training dataset to determine configuration parameters of an intermediate pre-distortion compensator in an optical communication system that includes a transmitter, a receiver, and an optical communication channel, the transmitter including a pre-distortion compensator, the intermediate pre-distortion compensator, and a Mach Zehnder Modulator (MZM) compensator, the first training dataset including multiple pairs of: a first input data that represents a transmission symbol sequence representing a message to be sent; and a first ground-truth data that represents an inverse signal of distortion that is generated based on an output from an electrical path in the transmitter, the output from the electrical path being acquired by feeding the transmission symbol sequence to the transmitter in which the intermediate pre-distortion compensator and the MZM compensator are enabled while the pre-distortion compensator is disabled; train a second machine learning-based model [performing secondary training on the primary trained neural network] with a second training dataset to determine configuration parameters of the post-distortion compensator included in the receiver, the second training dataset including multiple pairs of: a second input data that represents an output from an optical front end included in the receiver [based on a difference between an output value output through the output layer from the second feature vector]; and a second ground-truth data that represents the transmission symbol sequence [and a ground truth value corresponding to the training data], the output from the optical front end being acquired by feeding the transmission symbol sequence to the optical communication system in which the intermediate pre-distortion compensator, the MZM compensator, and the post-distortion compensator are enabled while the pre-distortion compensator is disabled and in which the configuration parameters determined using the first machine learning-based model are applied to the intermediate pre-distortion compensator; and train a third machine learning-based model with a third training dataset to determine configuration parameters of the pre-distortion compensator.”)
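The staged-training structure in the Vinayachandran passage above (a first model trained against its ground truth, then a second model trained against its own ground truth on top of the first stage's result) can be illustrated very roughly with a toy two-stage gradient-descent sketch; the scalar linear models, loss, and data are illustrative assumptions and have nothing to do with the reference's optical-communication domain:

```python
def train(weight, data, lr=0.05, steps=200):
    """Primary or secondary training stage: minimize the squared
    difference between the model output (weight * x) and the ground
    truth y by gradient descent on a single scalar weight."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # ground truth: y = 2x

# Primary training: fit the first model to its ground-truth data.
w1 = train(0.0, data)

# Secondary training: the second stage is trained after, and on top of,
# the primary stage's fixed result (here, on the residual error).
residual_data = [(x, y - w1 * x) for x, y in data]
w2 = train(0.0, residual_data)
```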
Vinayachandran and SHIRAHATA are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Vinayachandran with the teachings of SHIRAHATA to dynamically adjust configuration parameters to improve accuracy and robustness of neural network operations under varying transmission conditions (Vinayachandran, Abstract).
SHIRAHATA in view of Vinayachandran do not teach:
wherein a peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Kang teaches:
wherein a peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
(Kang, “[0019] A neural network generally contains multiple neurons, and connections between those neurons. A neuron generally is a part of a neural network computer system that determines an output based on one or more inputs (that can be weighted), and the neuron can determine this output based on determining the output of an activation function with the possibly-weighted inputs. Examples of activation functions include a rectifier/rectified linear unit (ReLU) activation function, which produces an output that ranges between 0 and infinity, inclusive; tan h, which produces an output that ranges between −1 and 1 [wherein the peak value of the second activation function is fixed to 1], inclusive; and sigmoid, which produces an output that ranges between 0 and 1 [and a dynamic range of the second activation function is from a value of 0 to a value of 1], inclusive. While several of the non-limiting examples described herein concern a ReLU activation function, it can be appreciated that these techniques can be applied to other activation functions.”)
Kang, SHIRAHATA and Vinayachandran are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Kang with the teachings of SHIRAHATA and Vinayachandran to improve the efficiency of neural network computation by predicting an output condition based on an intermediate calculation and selectively terminating further processing (Kang, Abstract).
Regarding claim 20, SHIRAHATA in view of Vinayachandran and Kang teach the method of claim 16.
SHIRAHATA further teaches: wherein the first activation function comprises any one or any combination of any two or more of a step function, a sigmoid function, a hyperbolic tangent function, a rectified linear unit (ReLU) function, and a leaky ReLU function.
(SHIRAHATA, “[0059] The example illustrated in FIG. 3 indicates data and the order of processes in the case where the learning of the convolutional neural network as the neural network is executed. The neural network has a layered structure in which layers are arranged in order. The neural network includes an input layer (Input), a first convolutional layer (Conv1), a first activation function layer (ReLU1) [wherein the first activation function comprises any one or any combination of any two or more of a step function, a sigmoid function, a hyperbolic tangent function, a rectified linear unit (ReLU) function, and a leaky ReLU function], a second convolutional layer (Conv2), a second activation function layer (ReLU2), a first pooling layer (Pool1), a first fully-connected layer (Fully-conn1), and a third activation function layer (ReLU3) in this order. The neural network further includes a second fully-connected layer (Fully-conn2), a softmax layer (Softmax), and an output layer (Output) in this order. FIG. 3 exemplifies the case where the intermediate layers that execute the in-place process are the activation function layers (ReLU1, ReLU2, and ReLU3).”)
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over SHIRAHATA in view of Vinayachandran and Kang, and further in view of Narayanan and Diamant.
SHIRAHATA in view of Vinayachandran and Kang teach the method of claim 16.
SHIRAHATA further teaches: wherein the second activation function is determined by a first hyperparameter
(SHIRAHATA, “[0125] As illustrated in FIGS. 8A, 8B and 8C, the whole controller 50 reads the definition information 41 and the parameter information 42 (in S1). The whole controller 50 identifies hyperparameters (learning rate, momentum, batch size, maximum number of iterations, and the like) [wherein the second activation function is determined by a first hyperparameter] based on the definition information 41 and the parameter information 42 (in S2) and acquires the number max_iter of repeated executions of the learning. Then, the whole controller 50 identifies the configuration of the neural network based on the definition information 41 and the parameter.”)
SHIRAHATA in view of Vinayachandran and Kang do not teach:
of which a multiplier of the second activation function is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function to fix the peak value of the second activation function.
Narayanan teaches:
of which a multiplier of the second activation function is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function
(Narayanan, col.14 line [32 – 50], “FIG. 9 depicts a flowchart of an example for determining LSBs by the per-neuron circuits according to one or more embodiments of the present invention. Here, the method 900 includes receiving the input data from the shared circuit 410, at block 902. The input data includes at least the MSBs 464 and the voltage interval 525. In one or more embodiments of the present invention, the input also includes a slope-identifier bit 720 that identifies a slope of the activation function 510 in this interval. In the case of an activation function 510 [of which a multiplier of the second activation function] that is monotonic, the slope-identifier bit 720 is set to a default value, e.g., 0, that is indicative of an ascending slope (or descending slope) [is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function]. In the case of non-monotonic functions, the slope-identifier bit 720 varies between 0 and 1 to indicate the ascending slope and the descending slope respectively. It is understood that in one or more embodiments of the present invention, 0 and 1 can swap roles, relative to their roles in the examples here. The slope-identifier bit 720 is determined by the controller 412 based on the LUT 416. ”)
Narayanan, SHIRAHATA, Vinayachandran and Kang are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Narayanan with the teachings of SHIRAHATA, Vinayachandran and Kang to enable per-neuron circuits to output activation values with reduced computational overhead and to improve memory management and error determination (Narayanan, Abstract).
SHIRAHATA in view of Vinayachandran, Kang and Narayanan do not teach:
to fix the peak value of the second activation function
Diamant teaches:
to fix the peak value of the second activation function.
(Diamant, col.17 Line [44 – 67], “As another example, function table 406 can also be programmed to implement non-uniform quantization, where the step size between adjacent input boundary values is different from different input subranges. The distribution of the input boundary values can be determined based on, for example, a degree of linearity as well as a degree of change of the activation function for a particular input subrange. A degree of linearity can reflect whether the slope of the activation function is a constant or is changing within that input subrange. A high degree of linearity means the slope of the activation function remains constant [to fix the peak value of the second activation function], whereas a low degree of linearity means the slope of the activation function changes. Referring to FIG. 4E, to improve the accuracy of extrapolation based on slope and/or Taylor series coefficients, input boundary values can be more sparsely distributed for input subranges where the activation function is relatively linear (e.g., input subrange 474) and where the activation function experiences very small change with respect to input (e.g., input subranges 476 and 478). On the other hand, for input subrange 480, the activation function is relatively non-linear and the input boundary values can be more densely distributed within input subrange 480 to improve the accuracy of extrapolation and the resultant activation processing result.”)
Diamant, SHIRAHATA, Vinayachandran, Kang and Narayanan are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Diamant with the teachings of SHIRAHATA, Vinayachandran, Kang and Narayanan because, by selecting base values and parameters and applying arithmetic circuits for interpolation, the system can efficiently compute estimated outputs of neural network functions, further reducing computational cost (Diamant, Abstract).
Claim(s) 21 and 26 – 27 are rejected under 35 U.S.C. 103 as being unpatentable over SHIRAHATA in view of Joshi, Narayanan, Diamant, Wiltshire, Pub. No.: US20210192293A1, and Kang.
Regarding claim 21, SHIRAHATA teaches: A processor-implemented method with a neural network, the method comprising:
(SHIRAHATA, “[0004] According to an aspect of the invention, an information processing apparatus includes a memory and a processor [A processor-implemented method with a neural network, the method comprising] coupled to the memory and configured to set a first memory region in the memory as a region to be used for input to a first intermediate layer of a layered neural network and for output from the first intermediate layer, set a second memory region in the memory as a buffer region for the first intermediate layer, execute a recognition process of storing, in the second memory region, characteristic data corresponding to a characteristic of an input neuron data item to the first intermediate layer, and execute a learning process of determining an error of the first intermediate layer using the characteristic data stored in the second memory region.”)
extracting one or more first feature vectors from a plurality of intermediate layers of the neural network that
(SHIRAHATA, “[0062] For example, as indicated by the number “1”, the convolution operation is executed by the first convolutional layer (Conv1) on neuron data items received from the input layer (Input), a parameter is applied to the results of the operation, and the results of the application are output to the first activation function layer (ReLU1). [0063] As indicated by a number “2”, the in-place process is executed by the first activation function layer (ReLU1). Specifically, the input neuron data items are stored in a memory region secured for the first activation function layer (ReLU1) [from a plurality of intermediate layers of the neural network that], and the activation function is applied to the input neuron data items to calculate output neuron data items [extracting one or more first feature vectors]. The output neuron data items are written over the input neuron data items stored in the memory region and are output to the second convolutional layer (Conv2).”)
SHIRAHATA does not teach:
that detects whether biometric information is spoofed from input data comprising the biometric information of a user, using one or more pre-trained first classifiers;
detecting a first spoofing detection result of the biometric information by determining a first score based on the one or more first feature vectors;
determining, in response to the first spoofing detection result being detected, a second score by applying, to a pre-trained second classifier, an output vector output from an output layer of the neural network; and
detecting a second spoofing detection result of the biometric information by a score in which the first score and the second score are combined,
wherein either one or both of the first classifiers and the second classifier is trained
by an activation function that is determined by a first hyperparameter of which a multiplier of the activation function is associated with an ascending slope of the activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the activation function to fix a peak value of the activation function for the neural network.
wherein the peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
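For illustration only (this sketch is not part of the claims or of any cited reference, and the function form and the hyperparameter names `a` and `b` are hypothetical), the limitation of an activation function whose ascending and descending slopes are set by two multiplier hyperparameters while the peak value stays fixed at 1, with a dynamic range from 0 to 1, could be read on a bump-shaped function such as:

```python
import math

def bump_activation(x, a=2.0, b=3.0):
    """Hypothetical bump-shaped activation.

    `a` multiplies the input on the ascending side (x < 0) and `b` on
    the descending side (x >= 0); the peak stays fixed at 1 at x = 0
    regardless of `a` and `b`, and all outputs lie in (0, 1].
    """
    if x < 0:
        return math.exp(a * x)   # ascending slope toward the peak
    return math.exp(-b * x)      # descending slope away from the peak

# The peak value is 1 for any positive choice of hyperparameters:
print(bump_activation(0.0))              # 1.0
print(bump_activation(0.0, a=5.0, b=9.0))  # 1.0
```

Changing `a` or `b` reshapes the slopes on either side of the peak without moving the peak value itself, which is the claimed effect of the two hyperparameters under this reading.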
Joshi teaches:
that detects whether biometric information is spoofed from input data comprising the biometric information of a user, using one or more pre-trained first classifiers;
(Joshi, (col. 9 line [58 – 67] – col.10 line [1 – 17]), “More sophisticated search strategies are also within the scope of this disclosure, e.g., using random configuration selection (e.g., selecting a random positional parameter for each image capture), adaptive grid spacing to fine-tune ranges of a failure space, and/or other methods. In some implementations, the search includes a greedy search in which parameters are adjusted to find locally more effective spoof parameter combinations. In some implementations, the search includes a gradient search in which one or more of the spoof parameters is adjusted slightly, spoof detection [detects whether biometric information is spoofed from input data comprising the biometric information of a user, using one or more pre-trained first classifiers] results are obtained (e.g., corresponding numerical scores indicative of spoofing), and the spoof detection results are used to calculate a gradient vector of the spoof parameters;”)
detecting a first spoofing detection result of the biometric information by determining a first score based on the one or more first feature vectors;
(Joshi, (col. 9 line [58 – 67] – col.10 line [1 – 17]), “More sophisticated search strategies are also within the scope of this disclosure, e.g., using random configuration selection (e.g., selecting a random positional parameter for each image capture), adaptive grid spacing to fine-tune ranges of a failure space, and/or other methods. In some implementations, the search includes a greedy search in which parameters are adjusted to find locally more effective spoof parameter combinations. In some implementations, the search includes a gradient search in which one or more of the spoof parameters is adjusted slightly, spoof detection results are obtained (e.g., corresponding numerical scores indicative of spoofing), and the spoof detection results are used to calculate a gradient vector of the spoof parameters; spoof parameters can then be adjusted in the opposite direction of the gradient vector (e.g., in a gradient descent optimization process), with the process iterated repeatedly to converge on spoof parameters that generate a failure condition, e.g., a spoof accept event. In some implementations, the search includes a genetic algorithm approach in which numerical scores are calculated for multiple random initial sets of spoof parameters [detecting a first spoofing detection result of the biometric information by determining a first score based on the one or more first feature vectors;], offspring sets of spoof parameters are created by applying mutation and/or recombination operators to the initial and following sets of spoof parameters, and the process iterated repeatedly until spoof parameters that generate a failure condition (e.g., a spoof accept event) are found.”)
determining, in response to the first spoofing detection result being detected, a second score by applying, to a pre-trained second classifier, an output vector output from an output layer of the neural network; and
(Joshi, (col. 9 line [58 – 67] – col.10 line [1 – 17]), “More sophisticated search strategies are also within the scope of this disclosure, e.g., using random configuration selection (e.g., selecting a random positional parameter for each image capture), adaptive grid spacing to fine-tune ranges of a failure space, and/or other methods. In some implementations, the search includes a greedy search in which parameters are adjusted to find locally more effective spoof parameter combinations. In some implementations, the search includes a gradient search in which one or more of the spoof parameters is adjusted slightly, spoof detection results are obtained (e.g., corresponding numerical scores indicative of spoofing), and the spoof detection results are used to calculate a gradient vector of the spoof parameters; spoof parameters can then be adjusted in the opposite direction of the gradient vector (e.g., in a gradient descent optimization process), with the process iterated repeatedly to converge on spoof parameters that generate a failure condition, e.g., a spoof accept event. In some implementations, the search includes a genetic algorithm approach in which numerical scores are calculated for multiple random initial sets of spoof parameters [determining, in response to the first spoofing detection result being detected, a second score by applying, to a pre-trained second classifier, an output vector output from an output layer of the neural network], offspring sets of spoof parameters are created by applying mutation and/or recombination operators to the initial and following sets of spoof parameters, and the process iterated repeatedly until spoof parameters that generate a failure condition (e.g., a spoof accept event) are found.”)
detecting a second spoofing detection result of the biometric information by a score in which the first score and the second score are combined,
(Joshi, col. 9 line [12 – 21], “In some implementations, the numerical score is fused with one or more other scores, e.g., as a weighted combination, to determine an overall biometric authentication result. For example, a first numerical score may indicate a spoofing likelihood, a second numerical score may indicate a biometric matching likelihood (e.g., likelihood that a facial image matches a reference facial image for a user), and the two scores can be combined in a weighted combination [detecting a second spoofing detection result of the biometric information by a score in which the first score and the second score are combined] that indicates overall biometric authentication success or failure in reference to a threshold value.”)
Joshi and SHIRAHATA are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Joshi with the teachings of SHIRAHATA to dynamically adjust configuration parameters by generating varied spoof representations, testing them against biometric authentication, and identifying failure conditions, thereby enhancing robustness against adversarial inputs or fraudulent data affecting neural network training and inference (Joshi, Abstract).
SHIRAHATA in view of Joshi does not teach:
wherein either one or both of the first classifiers and the second classifier is trained
by an activation function that is determined by a first hyperparameter of which a multiplier of the activation function is associated with an ascending slope of the activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the activation function to fix a peak value of the second activation function
wherein the peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Wiltshire teaches:
wherein either one or both of the first classifiers and the second classifier is trained
(Wiltshire, “[0010] According to further aspects, disclosed methods further include clustering the first classifier and the second classifier into a group of classifiers, wherein the first classifier [wherein either one or both of the first classifiers] and the second classifier are trained [and the second classifier is trained] to identify objects belonging to an object category.”)
Wiltshire, SHIRAHATA and Joshi are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Wiltshire with the teachings of SHIRAHATA and Joshi to select appropriate models based on input data characteristics, enabling more accurate and efficient processing across different types of data (Wiltshire, Abstract).
SHIRAHATA in view of Joshi and Wiltshire does not teach:
by an activation function that is determined by a first hyperparameter of which a multiplier of the activation function is associated with an ascending slope of the activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the activation function to fix a peak value of the second activation function.
wherein the peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Narayanan teaches:
by an activation function that is determined by a first hyperparameter of which a multiplier of the activation function is associated with an ascending slope of the activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the activation function
(Narayanan, col.14 line [32 – 50], “FIG. 9 depicts a flowchart of an example for determining LSBs by the per-neuron circuits according to one or more embodiments of the present invention. Here, the method 900 includes receiving the input data from the shared circuit 410, at block 902. The input data includes at least the MSBs 464 and the voltage interval 525. In one or more embodiments of the present invention, the input also includes a slope-identifier bit 720 that identifies a slope of the activation function 510 in this interval. In the case of an activation function 510 [by an activation function that is determined by a first hyperparameter of which a multiplier of the activation function] that is monotonic, the slope-identifier bit 720 is set to a default value, e.g., 0, that is indicative of an ascending slope (or descending slope) [is associated with an ascending slope of the activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the activation function]. In the case of non-monotonic functions, the slope-identifier bit 720 varies between 0 and 1 to indicate the ascending slope and the descending slope respectively. It is understood that in one or more embodiments of the present invention, 0 and 1 can swap roles, relative to their roles in the examples here. The slope-identifier bit 720 is determined by the controller 412 based on the LUT 416. ”)
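As an illustrative sketch only (the endpoint comparison below is a hypothetical stand-in; in Narayanan the slope-identifier bit is determined by the controller based on the LUT per voltage interval), the behavior of a slope-identifier bit for a non-monotonic activation function might look like:

```python
import math

def slope_identifier_bit(f, lo, hi):
    """Return 0 if f is ascending on [lo, hi], 1 if descending.

    Crude stand-in for the slope-identifier bit described in Narayanan:
    per interval, a single bit distinguishes the ascending slope (0)
    from the descending slope (1) of a non-monotonic function.
    """
    return 0 if f(hi) >= f(lo) else 1

gaussian = lambda x: math.exp(-x * x)  # non-monotonic example function

print(slope_identifier_bit(gaussian, -2.0, -1.0))  # 0: ascending side
print(slope_identifier_bit(gaussian, 1.0, 2.0))    # 1: descending side
```

For a monotonic function the bit would take the same default value over every interval, matching the reference's description.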
Narayanan, SHIRAHATA, Joshi and Wiltshire are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Narayanan with the teachings of SHIRAHATA, Joshi and Wiltshire to enable per-neuron circuits to output activation values with reduced computational overhead, improving memory management and error determination (Narayanan, Abstract).
SHIRAHATA in view of Joshi, Wiltshire and Narayanan does not teach:
to fix a peak value of the second activation function.
wherein the peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Diamant teaches:
to fix a peak value of the second activation function
(Diamant, col.17 Line [44 – 67], “As another example, function table 406 can also be programmed to implement non-uniform quantization, where the step size between adjacent input boundary values is different from different input subranges. The distribution of the input boundary values can be determined based on, for example, a degree of linearity as well as a degree of change of the activation function for a particular input subrange. A degree of linearity can reflect whether the slope of the activation function is a constant or is changing within that input subrange. A high degree of linearity means the slope of the activation function remains constant [to fix a peak value of the second activation function], whereas a low degree of linearity means the slope of the activation function changes. Referring to FIG. 4E, to improve the accuracy of extrapolation based on slope and/or Taylor series coefficients, input boundary values can be more sparsely distributed for input subranges where the activation function is relatively linear (e.g., input subrange 474) and where the activation function experiences very small change with respect to input (e.g., input subranges 476 and 478). On the other hand, for input subrange 480, the activation function is relatively non-linear and the input boundary values can be more densely distributed within input subrange 480 to improve the accuracy of extrapolation and the resultant activation processing result.”)
Diamant, SHIRAHATA, Joshi, Wiltshire and Narayanan are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Diamant with the teachings of SHIRAHATA, Joshi, Wiltshire and Narayanan so that, by selecting base values and parameters and applying arithmetic circuits for interpolation, the system can efficiently compute estimated outputs of neural network functions, further reducing computational cost (Diamant, Abstract).
SHIRAHATA in view of Joshi, Wiltshire, Narayanan and Diamant does not teach:
wherein a peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Kang teaches:
wherein a peak value of the second activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
(Kang, “[0019] A neural network generally contains multiple neurons, and connections between those neurons. A neuron generally is a part of a neural network computer system that determines an output based on one or more inputs (that can be weighted), and the neuron can determine this output based on determining the output of an activation function with the possibly-weighted inputs. Examples of activation functions include a rectifier/rectified linear unit (ReLU) activation function, which produces an output that ranges between 0 and infinity, inclusive; tan h, which produces an output that ranges between −1 and 1 [wherein the peak value of the second activation function is fixed to 1], inclusive; and sigmoid, which produces an output that ranges between 0 and 1 [and a dynamic range of the second activation function is from a value of 0 to a value of 1], inclusive. While several of the non-limiting examples described herein concern a ReLU activation function, it can be appreciated that these techniques can be applied to other activation functions.”)
Kang, SHIRAHATA, Joshi, Wiltshire, Narayanan and Diamant are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Kang with the teachings of SHIRAHATA, Joshi, Wiltshire, Narayanan and Diamant to improve the efficiency of neural network computation by predicting an output condition based on an intermediate calculation and selectively terminating further processing (Kang, Abstract).
Regarding claim 26, SHIRAHATA in view of Joshi, Wiltshire, Narayanan, Diamant and Kang teach the method of claim 21.
Joshi further teaches: wherein the biometric information comprises any one or any combination of any two or more of a fingerprint, an iris, and a face of the user.
(Joshi, col.6 line [26 – 44], “Biometric authentication systems [wherein the biometric information] can authenticate a user of a secure system based on recognizing the user's face, eye-print, iris, etc. [comprises any one or any combination of any two or more of a fingerprint, an iris, and a face of the user] Such biometric authentication systems involve capturing one or more images of the user and executing corresponding recognition processes on the captured image. Malicious attempts to breach the security of such biometric authentication systems can include presenting an alternative representation of a live person to gain access to an account or other privileges associated with the identity of the corresponding live person. The alternative representation may take the form of an image presented on a monitor, a printed image, a three-dimensional representation (e.g., a facial model or mask), or another object. Such attacks are generally known as spoof attacks, and the reliability/security of a biometric authentication system can be determined by the ability of the system to differentiate between a live person and corresponding alternative representations (also referred to as spoofs or spoof representations).”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Joshi with the teachings of SHIRAHATA, Wiltshire, Narayanan, Diamant and Kang for the same reasons disclosed for claim 21.
Regarding claim 27, SHIRAHATA teaches: An electronic device with a neural network, the electronic device comprising: one or more processors configured to:
(SHIRAHATA, “[0004] According to an aspect of the invention, an information processing apparatus includes a memory and a processor [one or more processors configured to] coupled to the memory and configured to set a first memory region in the memory as a region to be used for input to a first intermediate layer of a layered neural network and for output from the first intermediate layer, set a second memory region in the memory as a buffer region for the first intermediate layer, execute a recognition process of storing, in the second memory region, characteristic data corresponding to a characteristic of an input neuron data item to the first intermediate layer, and execute a learning process of determining an error of the first intermediate layer using the characteristic data stored in the second memory region.”)
extract one or more first feature vectors from a plurality of intermediate layers of the neural network configured
(SHIRAHATA, “[0062] For example, as indicated by the number “1”, the convolution operation is executed by the first convolutional layer (Conv1) on neuron data items received from the input layer (Input), a parameter is applied to the results of the operation, and the results of the application are output to the first activation function layer (ReLU1). [0063] As indicated by a number “2”, the in-place process is executed by the first activation function layer (ReLU1). Specifically, the input neuron data items are stored in a memory region secured for the first activation function layer (ReLU1) [from a plurality of intermediate layers of the neural network configured], and the activation function is applied to the input neuron data items to calculate output neuron data items [extract one or more first feature vectors] (i.e.: ReLU1 output is the first intermediate vector). The output neuron data items are written over the input neuron data items stored in the memory region and are output to the second convolutional layer (Conv2).”)
SHIRAHATA does not teach:
a sensor configured to capture input data comprising biometric information of a user;
to detect whether biometric information is spoofed from the input data,
using one or more pre-trained first classifiers;
detect a first spoofing detection result of the biometric information by determining a first score based on the one or more first feature vectors;
determine, in response to the first spoofing detection result being detected, a second score by applying an output vector output from an output layer of the neural network to a pre-trained second classifier; and
detect a second spoofing detection result of the biometric information by a score in which the first score and the second score are combined; and
an output device configured to output either one or both of the first spoofing detection result and the second spoofing detection result,
wherein either one or both of the first classifiers and the second classifier is trained based on an activation function
of which a multiplier of the second activation function is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function to fix a peak value of the second activation function.
wherein the peak value of the activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Joshi teaches:
a sensor configured to capture input data comprising biometric information of a user;
(Joshi, (col.3 Line [1 – 19]), “Another aspect of the present disclosure describes a non-transitory, computer-readable storage medium storing one or more instructions that, when executed by a computer system, cause the computer system to perform operations. The operations include obtaining multiple images of a spoof representation of a human user. The multiple images are captured automatically by an image capture device, and each of the multiple images is captured at a different relative position between the image capture device and the spoof representation. The operations also include: executing a biometric authentication process separately on each of at least a subset of the multiple images; determining that the biometric authentication process authenticates the human user based on at least a first image from the subset; [a sensor configured to capture input data comprising biometric information of a user] and identifying a relative position between the image capture device and the spoof representation corresponding to the first image as a failure condition associated with the biometric authentication process”)
to detect whether biometric information is spoofed from the input data,
(Joshi, (col.15 line [55 – 67] – col.16 line [1 – 4]), “The non-zero capture angle 1002 may arise due to inexact and/or variable positioning of the spoof capture device 1000 and/or display monitor 1004. Alternatively, or in addition, the non-zero capture angle 1002 may be intentional in order to avoid screen reflections that can be captured by the spoof capture device 1000 when the spoof capture device 1000 views the display monitor 1004 head-on (e.g., at a capture angle at or near 0 degrees, such as less than 5 degrees, less than 10 degrees, less than 20 degrees, or another angle). Some spoof detection systems are designed to detect these reflections and flag them as indicative of spoofing. As such, larger capture angles can be beneficial for preventing spoof representations from being identified by spoof detection systems as spoof representations [to detect whether biometric information is spoofed from the input data]. The non-zero capture angle 1002, in various implementations, may be greater than 10 degrees, greater than 20 degrees, greater than 30 degrees, greater than 40 degrees, or another angle.”)
detect a first spoofing detection result of the biometric information by determining a first score based on the one or more first feature vectors;
determine, in response to the first spoofing detection result being detected, a second score by applying an output vector output from an output layer of the neural network to a pre-trained second classifier; and
(Joshi, (col. 9 line [58 – 67] – col.10 line [1 – 17]), “More sophisticated search strategies are also within the scope of this disclosure, e.g., using random configuration selection (e.g., selecting a random positional parameter for each image capture), adaptive grid spacing to fine-tune ranges of a failure space, and/or other methods. In some implementations, the search includes a greedy search in which parameters are adjusted to find locally more effective spoof parameter combinations. In some implementations, the search includes a gradient search in which one or more of the spoof parameters is adjusted slightly, spoof detection [detect a first spoofing detection result of the biometric information] results are obtained (e.g., corresponding numerical scores [by determining a first score based on the one or more first feature vectors] indicative of spoofing), and the spoof detection results are used to calculate a gradient vector of the spoof parameters; spoof parameters can then be adjusted in the opposite direction of the gradient vector (e.g., in a gradient descent optimization process), with the process iterated repeatedly to converge on spoof parameters that generate a failure condition, e.g., a spoof accept event. In some implementations, the search includes a genetic algorithm approach in which numerical scores are calculated for multiple random initial sets of spoof parameters [determine, in response to the first spoofing detection result being detected, a second score by applying an output vector output from an output layer of the neural network to a pre-trained second classifier], offspring sets of spoof parameters are created by applying mutation and/or recombination operators to the initial and following sets of spoof parameters, and the process iterated repeatedly until spoof parameters that generate a failure condition (e.g., a spoof accept event) are found.”)
detect a second spoofing detection result of the biometric information by a score in which the first score and the second score are combined; and
(Joshi, col.9 line [12 – 21], “In some implementations, the numerical score is fused with one or more other scores, e.g., as a weighted combination, to determine an overall biometric authentication result. For example, a first numerical score may indicate a spoofing likelihood, a second numerical score may indicate a biometric matching likelihood (e.g., likelihood that a facial image matches a reference facial image for a user), and the two scores can be combined in a weighted combination [detect a second spoofing detection result of the biometric information by a score in which the first score and the second score are combined] that indicates overall biometric authentication success or failure in reference to a threshold value.”)
an output device configured to output either one or both of the first spoofing detection result and the second spoofing detection result,
(Joshi, col.3 line [20 – 39], “Another aspect of the present disclosure describes another non-transitory, computer-readable storage medium storing one or more instructions that, when executed by a computer system, cause the computer system to perform operations. The operations include: causing a first display device to display a spoof representation of a human user [an output device configured to output either one or both of the first spoofing detection result and the second spoofing detection result]; and obtaining multiple images of the spoof representation. The multiple images are captured automatically by an image capture device, and each of the multiple images captures a different corresponding configuration of the spoof representation displayed on the first display device. The operations also include: executing a biometric authentication process separately on each of at least a subset of the multiple images; determining that the biometric authentication process authenticates the human user based on at least a first image from the subset; and identifying a configuration of the spoof representation corresponding to the first image as a failure condition associated with the biometric authentication process.”)
Joshi and SHIRAHATA are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Joshi with the teachings of SHIRAHATA to dynamically adjust configuration parameters by generating varied spoof representations, testing them against biometric authentication, and identifying failure conditions, thereby enhancing robustness against adversarial inputs or fraudulent data affecting neural network training and inference (Joshi, Abstract).
SHIRAHATA in view of Joshi does not teach:
using one or more pre-trained first classifiers;
wherein either one or both of the first classifiers and the second classifier is trained based on an activation function
of which a multiplier of the second activation function is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function to fix a peak value of the second activation function.
wherein the peak value of the activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Wiltshire teaches:
using one or more pre-trained first classifiers;
(Wiltshire, “[0037] The server 106 can determine whether a pre-existing classifier is trained to identify at least one object in the image 102 based on a set of training metadata associated with the pre-existing classifier. In some embodiments, the training metadata can define a set of characteristics for which the pre-existing classifier is trained to identify the object [using one or more pre-trained first classifiers]. If the server 106 determines that the pre-existing classifier is trained to identify the object in the image 102, the server 106 identifies an object in the image 102 using a classifier, the server 106 can provide results to a laptop 108. In some embodiments, if the pre-existing classifier is not trained to identify the object in the image 102, the server 106 can send a “no object identified” error to the laptop 108.”)
wherein either one or both of the first classifiers and the second classifier is trained based on an activation function
(Wiltshire, “[0010] According to further aspects, disclosed methods further include clustering the first classifier and the second classifier into a group of classifiers [wherein either one or both of the first classifiers and the second classifier is trained based on an activation function], wherein the first classifier and the second classifier are trained to identify objects belonging to an object category.”)
Wiltshire, SHIRAHATA and Joshi are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Wiltshire with the teachings of SHIRAHATA and Joshi to select appropriate models based on input data characteristics, enabling more accurate and efficient processing across different types of data. (Wiltshire, Abstract).
SHIRAHATA in view of Joshi and Wiltshire do not teach:
of which a multiplier of the second activation function is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function to fix a peak value of the second activation function.
wherein the peak value of the activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Narayanan teaches:
of which a multiplier of the second activation function is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function
(Narayanan, col.14 line [32 – 50], “FIG. 9 depicts a flowchart of an example for determining LSBs by the per-neuron circuits according to one or more embodiments of the present invention. Here, the method 900 includes receiving the input data from the shared circuit 410, at block 902. The input data includes at least the MSBs 464 and the voltage interval 525. In one or more embodiments of the present invention, the input also includes a slope-identifier bit 720 that identifies a slope of the activation function 510 in this interval. In the case of an activation function 510 [of which a multiplier of the second activation function] that is monotonic, the slope-identifier bit 720 is set to a default value, e.g., 0, that is indicative of an ascending slope (or descending slope) [is associated with an ascending slope of the second activation function and a second hyperparameter of which the multiplier is associated with a descending slope of the second activation function]. In the case of non-monotonic functions, the slope-identifier bit 720 varies between 0 and 1 to indicate the ascending slope and the descending slope respectively. It is understood that in one or more embodiments of the present invention, 0 and 1 can swap roles, relative to their roles in the examples here. The slope-identifier bit 720 is determined by the controller 412 based on the LUT 416. ”)
Narayanan, SHIRAHATA, Joshi and Wiltshire are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Narayanan with the teachings of SHIRAHATA, Joshi and Wiltshire to enable per-neuron circuits to output activation values with reduced computational overhead, thereby improving memory management and error determination (Narayanan, Abstract).
SHIRAHATA in view of Joshi, Wiltshire and Narayanan do not teach:
to fix a peak value of the second activation function.
wherein the peak value of the activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Diamant teaches:
to fix a peak value of the second activation function
(Diamant, col.17 Line [44 – 67], “As another example, function table 406 can also be programmed to implement non-uniform quantization, where the step size between adjacent input boundary values is different from different input subranges. The distribution of the input boundary values can be determined based on, for example, a degree of linearity as well as a degree of change of the activation function for a particular input subrange. A degree of linearity can reflect whether the slope of the activation function is a constant or is changing within that input subrange. A high degree of linearity means the slope of the activation function remains constant [to fix a peak value of the second activation function], whereas a low degree of linearity means the slope of the activation function changes. Referring to FIG. 4E, to improve the accuracy of extrapolation based on slope and/or Taylor series coefficients, input boundary values can be more sparsely distributed for input subranges where the activation function is relatively linear (e.g., input subrange 474) and where the activation function experiences very small change with respect to input (e.g., input subranges 476 and 478). On the other hand, for input subrange 480, the activation function is relatively non-linear and the input boundary values can be more densely distributed within input subrange 480 to improve the accuracy of extrapolation and the resultant activation processing result.”)
Diamant, SHIRAHATA, Joshi, Wiltshire and Narayanan are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Diamant with the teachings of SHIRAHATA, Joshi, Wiltshire and Narayanan so that, by selecting base values and parameters and applying arithmetic circuits for interpolation, the system can efficiently compute estimated outputs of neural network functions, further reducing computational cost. (Diamant, Abstract).
SHIRAHATA in view of Joshi, Wiltshire, Narayanan and Diamant do not teach:
wherein the peak value of the activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
Kang teaches:
wherein the peak value of the activation function is fixed to 1 and a dynamic range of the second activation function is from a value of 0 to a value of 1.
(Kang, “[0019] A neural network generally contains multiple neurons, and connections between those neurons. A neuron generally is a part of a neural network computer system that determines an output based on one or more inputs (that can be weighted), and the neuron can determine this output based on determining the output of an activation function with the possibly-weighted inputs. Examples of activation functions include a rectifier/rectified linear unit (ReLU) activation function, which produces an output that ranges between 0 and infinity, inclusive; tan h, which produces an output that ranges between −1 and 1 [wherein the peak value of the second activation function is fixed to 1], inclusive; and sigmoid, which produces an output that ranges between 0 and 1 [and a dynamic range of the second activation function is from a value of 0 to a value of 1], inclusive. While several of the non-limiting examples described herein concern a ReLU activation function, it can be appreciated that these techniques can be applied to other activation functions.”)
Kang, SHIRAHATA, Joshi, Wiltshire, Narayanan and Diamant are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Kang with teachings of SHIRAHATA, Joshi, Wiltshire, Narayanan and Diamant for improving the efficiency of neural network computation by predicting an output condition based on an intermediate calculation and selectively terminating further processing (Kang, Abstract).
Claim 24 is rejected under 35 U.S.C. 103 as being unpatentable over SHIRAHATA in view of Joshi, Narayanan, Diamant, Wiltshire, Kang and in further view of Liu et al., Pub. No.: US20190138922A1.
SHIRAHATA in view of Joshi, Narayanan, Diamant, Wiltshire and Kang teach the method of claim 21.
SHIRAHATA further teaches: wherein the extracting of the one or more first feature vectors comprises: extracting a feature vector from a first intermediate layer among the intermediate layers
(SHIRAHATA, “[0062] For example, as indicated by the number “1”, the convolution operation is executed by the first convolutional layer (Conv1) on neuron data items received from the input layer (Input), a parameter is applied to the results of the operation, and the results of the application are output to the first activation function layer (ReLU1). [0063] As indicated by a number “2”, the in-place process is executed by the first activation function layer (ReLU1). Specifically, the input neuron data items are stored in a memory region secured for the first activation function layer (ReLU1) [from a first intermediate layer among the intermediate layers], and the activation function is applied to the input neuron data items to calculate output neuron data items [extracting a feature vector]. The output neuron data items are written over the input neuron data items stored in the memory region and are output to the second convolutional layer (Conv2).”)
extracting another feature vector from a second intermediate layer following the first intermediate layer
(SHIRAHATA, “[0065] As indicated by a number “4”, the in-place process is executed by the second activation function layer (ReLU2). Specifically, the input neuron data items are stored in a memory region secured for the second activation function layer (ReLU2) [from a second intermediate layer following the first intermediate layer] (i.e.: the ReLU2 output is the second intermediate vector), the activation function is applied to the input neuron data items to calculate output neuron data items [extracting another feature vector]. The output neuron data items are written over the input neuron data item stored in the memory region and are output to the first pooling layer (Pool1).”)
Wiltshire further teaches: using a classifier among the first classifiers;
(Wiltshire, “[0010] According to further aspects, disclosed methods further include clustering the first classifier [using a classifier among the first classifiers] and the second classifier into a group of classifiers, wherein the first classifier and the second classifier are trained to identify objects belonging to an object category.”)
using another classifier among the first classifiers; and
(Wiltshire, “[0010] According to further aspects, disclosed methods further include clustering the first classifier and the second classifier [using another classifier among the first classifiers;] into a group of classifiers, wherein the first classifier and the second classifier are trained to identify objects belonging to an object category.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present application to combine the teachings of Wiltshire with the teachings of SHIRAHATA, Joshi, Narayanan and Diamant for the same reasons disclosed for claim 21.
SHIRAHATA in view of Joshi, Narayanan, Diamant, Wiltshire and Kang do not teach:
extracting a combined feature vector in which the feature vector and the other feature vector are combined.
Liu teaches:
extracting a combined feature vector in which the feature vector and the other feature vector are combined.
(Liu, “[0083] The intermediate result vector may be further transmitted to the master computation module 112. Multiple intermediate result vectors generated based on the segments of the input vector may be further combined by the master computation module 112 to generate a merged intermediate vector [extracting a combined feature vector in which the feature vector and the other feature vector are combined]. For example, the master computation module 112 may be configured to perform a vector addition on the received intermediate result vectors to generate the merged intermediate vector. The master computation module 112 may be configured to perform a bias operation by adding a bias value to the merged intermediate vector and to apply an activation function to the biased merged intermediate vector to generate the output vector.”)
Liu, SHIRAHATA, Joshi, Narayanan, Diamant, Wiltshire and Kang are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teaching of Liu with teachings of SHIRAHATA, Joshi, Narayanan, Diamant, Wiltshire and Kang to allow different modules to process data according to its type to improve flexibility and efficiency. (Liu, Abstract).
Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over SHIRAHATA in view of Joshi, Narayanan, Diamant, Wiltshire, Kang, Liu and in further view of Lakhdhar et al., Pub. No.: US20200322377A1.
SHIRAHATA in view of Joshi, Narayanan, Diamant, Wiltshire, Kang and Liu teach the method of claim 24.
SHIRAHATA in view of Joshi, Narayanan, Diamant, Wiltshire, Kang and Liu do not teach:
wherein the detecting of the first spoofing detection result of the biometric information comprises: determining the first score based on a similarity between the combined feature vector and either one or both of a registered feature vector and a spoofed feature vector that is provided in advance; and classifying the first score into a score determined to be spoofed information or a score determined to be ground truth information, using the first classifiers.
Lakhdhar teaches:
wherein the detecting of the first spoofing detection result of the biometric information comprises: determining the first score based on a similarity between the combined feature vector and either one or both of a registered feature vector and a spoofed feature vector that is provided in advance; and classifying the first score into a score determined to be spoofed information or a score determined to be ground truth information, using the first classifiers
(Lakhdhar, “[0052] FIG. 2A shows a block diagram of the first spoof detection subsystem 200 that includes a per-channel energy normalization (PCEN) frontend 202, a cony-net feature extraction block 204, and a classifier 206. In operation, the first spoof detection subsystem 200 receives one or more audio signals as an input, which may be in any number of machine-readable data formats (e.g., .wav, .mp3). The PCEN frontend 202 processes an audio signal input and generates a per-channel energy transformed audio signal, as illustrated in FIG. 2B, for example. The per-channel energy transformed audio signal is processed by the cony-net feature extraction block 204, which determines a prediction score based on a set of parameters of a convolution neural network (CNN) [determining the first score based on a similarity between the combined feature vector and either one or both of a registered feature vector and a spoofed feature vector that is provided in advance] executed by the cony-net feature extraction block 204. The classifier 206 can compare the prediction score to a threshold value to determine whether the audio signal is a spoofed audio signal or a genuine audio signal. [classifying the first score into a score determined to be spoofed information or a score determined to be ground truth information, using the first classifiers] The first spoof detection subsystem 200 is an end-to-end system, in that the first subsystem 200 determines the values of various parameters of the PCEN frontend 202 and the coy-net feature extraction block 204 based on input data. Such input data can include a plurality of audio signals that can be used to train and optimize the first subsystem 200.”)
Lakhdhar, SHIRAHATA, Joshi, Narayanan, Diamant, Wiltshire, Kang and Liu are related to the same field of endeavor (i.e.: neural network training). It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to combine the teachings of Lakhdhar with the teachings of SHIRAHATA, Joshi, Narayanan, Diamant, Wiltshire, Kang and Liu to add a technique to distinguish between genuine and manipulated input data. (Lakhdhar, Abstract).
Allowable Subject Matter
Claim(s) 3, 13, 18, 23 and 28 are objected to as being dependent upon a rejected base claim and would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims and amended to overcome the rejection under 35 U.S.C. 101 set forth in this Office action. The prior art made of record does not teach, render obvious, or suggest the claim limitations as recited in applicant's claims.
Claim 3 and analogous claim(s) 13, 18, 23 and 28 recite:
wherein the second activation function is represented as a(x) and is represented by the following equation:
[Equation reproduced as image in the claim: media_image1.png (greyscale)]
wherein a denotes the first hyperparameter associated with the ascending slope of the second activation function, b denotes the second hyperparameter associated with the descending slope of the second activation function, e denotes Euler's number, x denotes an input of the second nodes, and O(x) denotes a Heaviside step function that allows an output of the second activation function to be 0 when x is less than 0.
Closest prior art:
Diamant et al., Pat. No.: US10740432B1.
Diamant describes methods and systems for carrying out hardware calculations of mathematical functions. It involves a system that includes a mapping table, which connects various base values to parameters linked to a mathematical function. A selection module is used to choose a first base value and its parameters based on an input value from the mapping table. Arithmetic circuits then receive this first base value and parameters and compute an estimated output value of the mathematical function for the given input value, based on the relationship between the input and the first base value. However, Diamant does not teach an activation function that is a parametric function defined by hyperparameters a and b, which control its rising and falling slopes; the function grows with a power term, decays exponentially, and outputs zero for negative inputs.
Zhou, Pub. No.: US20220116281A1.
Zhou describes a method and system for predicting network bandwidth. It involves collecting active network states at different times in the past and filling in any gaps with preset values. A sequence of these network states is created based on their order in time. Missing states are predicted using an auto-encoder, and finally, the network bandwidth is predicted based on the new sequence of states. However, Zhou does not teach an activation function that is a parametric function defined by hyperparameters a and b, which control its rising and falling slopes; the function grows with a power term, decays exponentially, and outputs zero for negative inputs.
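For illustration only, an activation function with the properties noted above (a power-term rise governed by hyperparameter a, an exponential decay governed by b, zero output for negative inputs, and a peak value normalized to 1 so the dynamic range runs from 0 to 1) can be sketched as follows. The specific functional form and normalization are assumptions chosen for illustration and are not the claimed equation, which is reproduced as an image in the claim.

```python
import math

def parametric_activation(x, a=2.0, b=1.0):
    """Illustrative sketch only (not the claimed equation): rises with a
    power term (hyperparameter a, ascending slope), decays exponentially
    (hyperparameter b, descending slope), outputs 0 for negative inputs
    (Heaviside gate), and is normalized so the peak value is fixed to 1
    at x = a / b, giving a dynamic range from 0 to 1."""
    if x <= 0:
        return 0.0  # Heaviside step: zero output for non-positive inputs
    peak_x = a / b  # location of the (normalized) peak
    return (x / peak_x) ** a * math.exp(a - b * x)
```

With the default hyperparameters, the function rises from 0, reaches exactly 1 at x = 2, and decays back toward 0, so all outputs stay within [0, 1].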
Claim(s) 29 – 32 would be allowable if rewritten or amended to overcome the rejection under 35 U.S.C. 101 set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.
Claim 29 recites:
A processor-implemented method with a neural network, the method comprising: performing first spoofing detection by determining a first score based on one or more first feature vectors generated using a first intermediate layer of the neural network based on input data; determining whether to perform second spoofing detection, based on the first score; and in response to determining to perform the second spoofing detection, determining a second score based on an output vector generated by an output layer of the neural network based on the one or more first feature vectors; and performing the second spoofing detection based on a score in which the first score and the second score are combined.
Closest prior art:
Diamant et al., Pat. No.: US10740432B1.
Diamant describes methods and systems for carrying out hardware calculations of mathematical functions. It involves a system that includes a mapping table, which connects various base values to parameters linked to a mathematical function. A selection module is used to choose a first base value and its parameters based on an input value from the mapping table. Arithmetic circuits then receive this first base value and parameters and compute an estimated output value of the mathematical function for the given input value, based on the relationship between the input and the first base value. However, Diamant does not teach a process for detecting spoofing using a neural network in which a first score is calculated from initial feature vectors produced by a first intermediate layer of the neural network based on input data; whether to carry out a second level of spoofing detection is then assessed based on the first score; if so, a second score is determined from the output layer of the neural network using the initial feature vectors; and the second spoofing detection is conducted by combining the first and second scores.
Zhou, Pub. No.: US20220116281A1.
Zhou describes a method and system for predicting network bandwidth. It involves collecting active network states at different times in the past and filling in any gaps with preset values. A sequence of these network states is created based on their order in time. Missing states are predicted using an auto-encoder, and finally, the network bandwidth is predicted based on the new sequence of states. However, Zhou does not teach a process for detecting spoofing using a neural network in which a first score is calculated from initial feature vectors produced by a first intermediate layer of the neural network based on input data; whether to carry out a second level of spoofing detection is then assessed based on the first score; if so, a second score is determined from the output layer of the neural network using the initial feature vectors; and the second spoofing detection is conducted by combining the first and second scores.
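For illustration only, the two-stage flow recited in claim 29 can be sketched as follows. The stand-in feature extractor, scoring functions, threshold, and score-combination rule are all hypothetical assumptions and do not reflect any cited reference or the claimed implementation.

```python
def intermediate_features(x):
    # Stand-in for the first intermediate layer: a toy feature map
    return [v * 0.5 for v in x]

def first_score(feats):
    # Stand-in first-stage scoring: mean of features, clipped to [0, 1]
    s = sum(feats) / len(feats)
    return min(max(s, 0.0), 1.0)

def output_vector(feats):
    # Stand-in output layer applied to the first feature vectors
    return [v * v for v in feats]

def second_score(out):
    s = sum(out) / len(out)
    return min(max(s, 0.0), 1.0)

def two_stage_spoof_detection(x, threshold=0.3, w=0.5):
    feats = intermediate_features(x)   # first feature vectors from input data
    s1 = first_score(feats)            # first spoofing detection
    if s1 < threshold:                 # decide whether to run the second stage
        return s1
    out = output_vector(feats)         # output vector based on the feature vectors
    s2 = second_score(out)             # second score from the output layer
    return w * s1 + (1 - w) * s2       # combined score for the second detection
```

Under this sketch, an input that yields a low first score skips the second stage entirely, while higher-scoring inputs trigger the second detection on the combined score.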
In summary, the references made of record fail to disclose the required claimed technical
features recited by claim 29 as a whole. The dependent claim(s) 30 – 32 would be allowable if rewritten or amended to overcome the rejection under 35 U.S.C. 101 because of their dependency on claim 29.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Troy et al., Pat. No.: US10438076B2.
Troy describes systems that can gather 3D biometric image data and identify fake biometrics using anti-spoofing methods. Fake biometrics can include fake fingerprints or faces made from materials such as silicone or plastic. The systems are designed to use various techniques to detect these fake biometrics.
Henry et al., Pat. No.: US10474628B2.
Henry describes a processor with units that retrieve and decode instructions from an architectural instruction set. It includes a register that holds a value set by executing instructions, and an execution unit with two types of memory: one for data and another for program instructions. Multiple processing units in the execution unit carry out the program instructions at a specific rate; this execution rate matches the first rate when the register holds one value and is slower when it holds a different value.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner
should be directed to MATIYAS T MARU whose telephone number is (571)270-0902 or via email: matiyas.maru@uspto.gov. The examiner can normally be reached Monday - Friday (8:00am - 4:00pm) EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a
USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to
use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor,
Michelle Bechtold can be reached on (571)431-0762. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from
Patent Center. Unpublished application information in Patent Center is available to registered users.
To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit
https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and
https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional
questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like
assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA)
or 571-272-1000.
/M.T.M./ Examiner, Art Unit 2148
/MICHELLE T BECHTOLD/ Supervisory Patent Examiner, Art Unit 2148