DETAILED ACTION
This Non-Final Office Action is in response to the Request for Continued Examination filed on 01/28/2026. Claims 1-2, 10, 23-25, and 29-31 have been amended. Claims 2-3 and 11-12 were previously cancelled. Claims 1-31 filed on 01/05/2026 remain pending in the application.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Drawings
The drawings filed on 04/08/2022 are accepted.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 01/28/2026 have been considered. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, an initialed and dated copy of Applicant's IDS form 1449 filed 01/28/2026 is attached to the instant Office action.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 01/28/2026 has been entered.
Response to Arguments filed on 01/05/2026
With respect to the 35 U.S.C. 112 rejection, and in response to the applicant’s remarks on pages 13-16, the examiner respectfully disagrees. The examiner submits that the claim invokes a 112(f) interpretation; however, the specification does not definitively recite that the claimed functions of the algorithm are performed by a structure. The specification recites in publication paragraph [0133] that “The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to a circuit, an application specific integrated circuit (ASIC), or processor.” (emphasis added). Therefore, there may be a software, i.e., non-structural, component performing the claimed function. The examiner submits that the specification as drafted allows for the means to be performed by software. Therefore, the 35 U.S.C. 112 rejection in the Office action mailed on 10/31/2025 is maintained.
With respect to the rejection under 35 U.S.C. 103, the applicant’s remarks on pages 16-19 are considered moot in view of the newly cited prior art. Please see the detailed rejection below.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “means for…receiving…extracting…combining…determining…taking” in claim 29.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Structure for the receiving limitation is disclosed in the following figures and paragraphs of the specification: publication paragraph [0041] and Figure 4 illustrate the structure and method of receiving biometric data.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claim 29 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant regards as the invention.
Claim limitation “means for extracting… combining…determining…taking” invokes 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The specification is devoid of adequate structure to perform the claimed function. In particular, the specification merely states the claimed function in, e.g., [0039-0057, 0103]; there is no disclosure of any particular structure, either explicitly or inherently, to perform the data arrangement. The term “means for” is not an adequate recitation of structure for performing the function because the claimed function can be performed in a number of ways (in hardware, in a software program, or in a combination thereof); hence, “means for” does not describe a particular structure for the function and does not provide enough description for one of ordinary skill in the art to understand which structure or structures perform the claimed function. Furthermore, the recitation in the instant application at [0133], “The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to a circuit, an application specific integrated circuit (ASIC), or processor. Generally, where there are operations illustrated in figures, those operations may have corresponding counterpart means-plus-function components with similar numbering…”, does not definitively recite a structure. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph. Dependent claims are also rejected based on dependency.
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claim 29 is rejected under 35 U.S.C. 112(a) or pre-AIA 35 U.S.C. 112, first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention. As described above, the disclosure does not provide adequate structure to perform the claimed functions. The specification does not demonstrate that applicant has made an invention that achieves the claimed function because the invention is not described with sufficient detail such that one of ordinary skill in the art can reasonably conclude that the inventor had possession of the claimed invention.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-5, 10, 15, 21-26, and 29-30 are rejected under 35 U.S.C. 103 as being unpatentable over Razumenic et al. (US 20200394289 A1) in view of Cheng et al. (US 20170116459 A1), hereinafter Cheng, and Khosla (US 10997421 B2), hereinafter Khosla.
Regarding claim 1, Razumenic teaches a method of biometric authentication (Razumenic [0054-0057] and Figure 2), comprising:
receiving an image of a biometric data source for a user (Razumenic Figure 2A verification image 204A);
extracting, through a first artificial neural network, features for at least the received image, the first artificial neural network being trained to extract (Razumenic Figure 2A verification C-R layers 216A in [0035, 0050-0051, 0074], [0050] “…the C-R layers may also be implemented using a CNN, with certain constraints. The network that is formed by the C-R layers and the matching CNN may be trained “end-to-end…the C-R layers and the matching CNN may be trained using the backward propagation of errors (backpropagation). The C-R layers and the matching CNN may be trained so that the matching CNN outputs a metric that indicates the probability of a match between the enrollment image and the verification image. In some implementations, the C-R layers and the matching CNN may be trained using a binary cross-entropy loss function.”, and further discloses in [0055] extracting features such that the verification images correspond to a live human and not a spoof attempt, i.e., anti-spoofing, indicating anti-spoofing image features, “[0055] The techniques disclosed herein may also be used to perform liveness detection. In this context, the term “liveness detection” may refer to any technique for attempting to prevent imposters from gaining access to something (e.g., a device, a building or a space within a building). An imposter may, for example, attempt to trick an iris verification system by presenting an image of another person's eye to the camera, or playing a video of another person in front of the camera. An RNN-based framework that enables a comparison to be performed involving a plurality of verification images may be trained to provide an additional output that indicates the likelihood that the plurality of verification images correspond to a live human being and is not a spoof attempt.”, where the C-R layers correspond to a CNN; Razumenic discloses that the RNN-based framework, the associated CNN layers receiving verification images and enrollment images, and the subsequent matching determination are all utilized for detecting imposters, i.e., anti-spoofing, and where the CNN receiving the verification images in, e.g., Figure 2A 216A corresponds to the first artificial neural network);
combining the extracted features for the at least the received image and a combined feature representation of a plurality of enrollment biometric data source images to generate a composite image [[using a second artificial neural network]] (Razumenic [0067-0068], where both are combined/concatenated as input to the matching CNN 222 in Figure 2, “[0068] The matching section includes a matching CNN 222. The set of enrollment image features 218 and the set of verification image features 220 may be concatenated and provided as input to the matching CNN 222.”; this is further illustrated in Figure 4 with a plurality of enrollment images and associated features as disclosed in [0088-0091] “[0091] The matching section includes a matching CNN 422 that may be similar to the matching CNNs 122, 222 discussed previously, except that the matching CNN 422 in the system 400 shown in FIG. 4 may be trained to accommodate a plurality of enrollment observations.”, where the intended use of a composite image corresponds to the combined/concatenated features of the images in Figure 4),
the combined feature representation of the plurality of enrollment biometric data source images comprising features extracted by a third artificial neural network trained to extract user-related [and sensor-related] features from enrollment biometric data source images (Razumenic [0027] “…obtain a plurality of sets of enrollment image features corresponding to a plurality of enrollment images,”, [0051] “…enables a plurality of enrollment images to be compared to the verification image. The matching CNN may be trained to process a plurality of sets of features extracted from a plurality of enrollment images along with the set of features from the verification image.”, Figure 4 and [0088-0091] further disclose the enrollment features 418a-n extracted by 414a-n, i.e., a third artificial neural network, from the enrollment images 402a-n, where 414a-n are trained to extract user-related features, e.g., a user iris, where the enrollment features 418a-n are concatenated/combined along with the verification features as disclosed in [0068, 0091] “The matching section includes a matching CNN 222. The set of enrollment image features 218 and the set of verification image features 220 may be concatenated and provided as input to the matching CNN 222.”, where Figure 4 illustrates the same process as 222, as disclosed in [0091], except for a plurality of the enrollment biometric data source images and their corresponding features, where 422, similar to 222 in Figure 2, receives concatenated features from a plurality of images);
determining, using the composite image as input into a fourth artificial neural network, whether the received image of the biometric data source for the user is from a real biometric data source or a copy of the real biometric data source (Razumenic Figure 2 (222) determines, using the concatenated/combined extracted features for the received image 220 and the features of the enrollment biometric as input into CNN 222, whether the received image is from a living person, i.e., real, as opposed to an image/copy of a person, as disclosed in [0054-0055, 0068]; this is further illustrated in Figure 4 CNN 422 with the concatenated/combined extracted features of the plurality of enrollment images and the verification image as disclosed in [0088-0091], where CNN 222/CNN 422 corresponds to the fourth artificial neural network); and
taking one or more actions to allow or deny the user access to a protected resource based on the determination (Razumenic taking action to use a device 1100 based on the determination as disclosed in [0142-0145]).
Razumenic does not explicitly disclose extract user-related and sensor-related features from enrollment biometric data source images, emphasis in bold-italic.
Cheng discloses extracting user-related and sensor-related features from enrollment biometric data source images (Cheng [0003, 0035, 0046, 0049-0050] discloses extracting features of different sensors utilized in biometric recognition for facilitating biometric recognition, where the process is performed over time to account for eye movement, as disclosed in [0048-0049]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic to incorporate the teaching of Cheng to utilize the above feature, with the motivation of improving accuracy, tracking eye movement, and estimating point of regard, as recognized by Cheng ([0035]).
Razumenic in view of Cheng discloses combining biometric data source images. Razumenic in view of Cheng does not disclose that the features of the received image and enrollment images are combined using a second artificial neural network. Emphasis in italic below.
Khosla discloses wherein the extracted features for the at least the received image and a combined feature representation of a plurality of enrollment biometric data source images are combined using a second artificial neural network (Khosla Col. 9 line 56-57 “…a Recurrent Neural Network (RNN 310) that extracts temporal sequence features based on the outputs from CNN 308…the RNN 310 concatenates features from multiple frames (i.e., a temporal sequence).”, where 310, interpreted as the second NN, is used to concatenate/combine a plurality of features from a plurality of frames in a video to track object(s) in each frame of the video, where the current frame is interpreted as the received image/frame, and the previous frames in which the object has already been identified and tracked are interpreted as the enrollment images, as disclosed in e.g. Col. 7 line 60-67, Col. 8 line 1-12 and Col. 10 line 10-25).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng to incorporate the teaching of Khosla to utilize the above feature, with the motivation of identifying and tracking objects of interest, as recognized by Khosla (Abstract and Col. 9 line 12-20).
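For illustration only, the following is a minimal sketch of the general technique discussed above for claim 1: concatenating a verification feature vector with several enrollment feature vectors before they are passed to a matching network. The Python/NumPy code is hypothetical and is not taken from the instant application or from any cited reference; all names and dimensions are assumptions introduced for clarity.

```python
# Hypothetical sketch: concatenating verification and enrollment features
# into a single input for a matching network.  Not code from any cited
# reference; names and dimensions are illustrative assumptions.
import numpy as np

def combine_features(verification_feat, enrollment_feats):
    """Concatenate one verification feature vector with N enrollment
    feature vectors into one matching-network input."""
    return np.concatenate([verification_feat, *enrollment_feats], axis=-1)

# Example: a 128-dim verification feature and three 128-dim enrollment
# features yield a single 512-dim input for the matching stage.
v = np.random.rand(128)
e = [np.random.rand(128) for _ in range(3)]
matching_input = combine_features(v, e)
assert matching_input.shape == (512,)
```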
Claims 23 and 29 recite similar limitations to claim 1 and are therefore rejected with the same rationale applied to claim 1.
Regarding claim 2, Razumenic in view of Cheng and Khosla teaches the method of Claim 1, further comprising aggregating features extracted by the third neural network from information derived from a plurality of enrollment biometric data source images into the combined feature representation of the plurality of enrollment biometric data source images (Razumenic [0143] “One or more sets of enrollment image features 1118 may be stored on the computing device 1100. The set(s) of enrollment image features 1118 may correspond to one or more enrollment images 1102. In some embodiments, the enrollment images 1102 may be stored on the computing device 1100 as well.”, where the plurality of enrollment biometric features extracted by the CNN are collected/aggregated from a plurality of images as further disclosed in [0050, 0066-0068, 0074]; this is further illustrated in Figure 4 with a plurality of enrollment images and associated features as disclosed in [0088-0091]. Khosla further discloses combining/aggregating/concatenating the features of the previous frames of the video corresponding to the enrollment images/frames where the tracked object of interest has already been identified. See rationale and motivation in claim 1).
Claim 24 recites similar limitations to claim 2 and is therefore rejected with the same rationale applied to claim 2.
Regarding claim 3, Razumenic in view of Cheng and Khosla teaches the method of Claim 2, wherein the features extracted from the information derived from the plurality of enrollment biometric data source images are extracted during user biometric authentication enrollment (Razumenic Figures 4-5 and [0088-0093] disclose extracting features of the plurality of biometric enrollment images 504 as a next step after receiving the plurality of enrollment images 502).
Regarding claim 4, Razumenic in view of Cheng and Khosla teaches the method of Claim 2, wherein the features extracted from the information derived from the plurality of enrollment biometric data source images comprise features extracted from a representation derived from each of the plurality of enrollment biometric data source images (Razumenic Figures 4 and [0088-0093]).
Regarding claim 5, Razumenic in view of Cheng and Khosla teaches the method of Claim 2, wherein aggregating features extracted from the information derived from the plurality of enrollment biometric data source images into the combined feature representation comprises concatenating features extracted from each of the plurality of enrollment biometric data source images into a single set of features (Razumenic discloses in [0068] Figure 2 the enrollment biometric features extracted from the enrollment biometric data source image is concatenated with the verification image features 220, Figure 4 illustrates the same process, as disclosed in [0091], except for a plurality of the enrollment biometric data source image and their corresponding features, where 422 receives concatenated features from a plurality of images).
Regarding claim 10, Razumenic in view of Cheng and Khosla teaches the method of Claim 1, wherein extracting features for the at least the received image comprises: combining the received image and the plurality of enrollment biometric data source images into a stack of images; and extracting the features for the received image and features for each of the plurality of enrollment biometric data source images by processing the stack of images through the third artificial neural network (Razumenic Figure 4 illustrates C-R layers 414a-n and 416, construed as sub-blocks of an overall block interpreted as a first artificial neural network, where the sub-blocks extract features utilizing similar algorithms as disclosed in [0050, 0066], and where the received image and the plurality of enrollment images are arranged into ordered images, i.e., the nth image is associated with the 414nth algorithm, indicating stacking/assembling/arrangement of images. Khosla further discloses combining/aggregating/concatenating the features of the previous frames of the video corresponding to the enrollment images/frames where the tracked object of interest has already been identified. See rationale and motivation in claim 1).
Claim 25 recites similar limitations to claim 10 and is therefore rejected with the same rationale applied to claim 10.
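For illustration only, the following is a minimal sketch of the "stack of images" concept discussed for claims 10 and 25: a received image is stacked with the enrollment images and a feature vector is extracted per image. The Python/NumPy code is hypothetical; the placeholder extractor stands in for any trained network and is not taken from any cited reference.

```python
# Hypothetical sketch: stacking a received image with enrollment images and
# extracting one feature vector per image.  The extractor below is a
# stand-in for a trained network, used only to keep the sketch runnable.
import numpy as np

def extract_features(img):
    # Placeholder "feature extractor": flatten and keep the first 16 values.
    return img.reshape(-1)[:16]

def features_from_stack(received, enrollment_images):
    stack = np.stack([received, *enrollment_images], axis=0)     # (N+1, H, W)
    return np.stack([extract_features(im) for im in stack], axis=0)

received = np.random.rand(32, 32)
enrollment_images = [np.random.rand(32, 32) for _ in range(3)]
feats = features_from_stack(received, enrollment_images)         # shape (4, 16)
```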
Regarding claim 15, Razumenic in view of Cheng and Khosla teaches the method of Claim 1, wherein determining whether the received image of the biometric data source for the user is from a real biometric data source or a copy of the real biometric data source comprises calculating a distance metric comparing the received image and the plurality of enrollment biometric data source images (Razumenic Figure 2 (222) determines, using the extracted feature for the received image 220 and the feature of the enrollment biometric as input into the CNN 222, whether the received image is from a living person, i.e. real, as opposed to an image/copy of a person, as disclosed in [0054-0055, 0068], this is further illustrated in Figure 4 with plurality of enrollment images and associated features as disclosed in [0088-0091], further illustrated in Figure 8 [0120-0121], where a distance metric is calculated as disclosed in [0048]).
Claim 26 recites similar limitations to claim 15 and is therefore rejected with the same rationale applied to claim 15.
Regarding claim 21, Razumenic in view of Cheng and Khosla teaches the method of Claim 1, wherein the received image of the biometric data source for the user comprises an image of a fingerprint of the user (Razumenic [0009] “Other biometric verification techniques may compare enrollment and verification images of other distinguishing biological traits, such as retina patterns, fingerprints”).
Regarding claim 22, Razumenic in view of Cheng and Khosla teaches the method of Claim 1, wherein the received image of the biometric data source for the user comprises an image of a face of the user (Razumenic [0012] “the enrollment image and the verification image may both include a human iris. In some embodiments, the enrollment image and the verification image may both include a human face.”).
Regarding claim 30 (Currently Amended), Razumenic teaches a non-transitory computer-readable medium having instructions stored thereon which, when executed by a processor, cause[[s]] the processor to perform an operation (Razumenic [0054-0057] and Figures 2 and 4) comprising:
receiving an image of a biometric data source for a user (Razumenic Figure 2A verification image 204A);
extracting, through a first artificial neural network, features for at least the received image, the first artificial neural network being trained to extract (Razumenic Figure 2A verification C-R layers 216A in [0035, 0050-0051, 0074], [0050] “…the C-R layers may also be implemented using a CNN, with certain constraints. The network that is formed by the C-R layers and the matching CNN may be trained “end-to-end…the C-R layers and the matching CNN may be trained using the backward propagation of errors (backpropagation). The C-R layers and the matching CNN may be trained so that the matching CNN outputs a metric that indicates the probability of a match between the enrollment image and the verification image. In some implementations, the C-R layers and the matching CNN may be trained using a binary cross-entropy loss function.”, and further discloses in [0055] extracting features such that the verification images correspond to a live human and not a spoof attempt, i.e., anti-spoofing, indicating anti-spoofing image features, “[0055] The techniques disclosed herein may also be used to perform liveness detection. In this context, the term “liveness detection” may refer to any technique for attempting to prevent imposters from gaining access to something (e.g., a device, a building or a space within a building). An imposter may, for example, attempt to trick an iris verification system by presenting an image of another person's eye to the camera, or playing a video of another person in front of the camera. An RNN-based framework that enables a comparison to be performed involving a plurality of verification images may be trained to provide an additional output that indicates the likelihood that the plurality of verification images correspond to a live human being and is not a spoof attempt.”, where the C-R layers correspond to a CNN; Razumenic discloses that the RNN-based framework, the associated CNN layers receiving verification images and enrollment images, and the subsequent matching determination are all utilized for detecting imposters, i.e., anti-spoofing, and where the CNN receiving the verification images in, e.g., Figure 2A 216A corresponds to the first artificial neural network);
combining the extracted features for the at least the received image and a combined feature representation of a plurality of enrollment biometric data source images (Razumenic [0067-0068], where both are combined/concatenated as input to the matching CNN 222 in Figure 2; this is further illustrated in Figure 4 with a plurality of enrollment images and associated features as disclosed in [0088-0091]),
the combined feature representation of the plurality of enrollment biometric data source images comprising features extracted by a second artificial neural network trained to extract user-related [and sensor-related] features from enrollment biometric data source images, wherein the extracted features for the at least the received image and the combined feature representation of the plurality of enrollment biometric data source images are combined through a process (Razumenic [0027] “…obtain a plurality of sets of enrollment image features corresponding to a plurality of enrollment images,”, [0051] “…enables a plurality of enrollment images to be compared to the verification image. The matching CNN may be trained to process a plurality of sets of features extracted from a plurality of enrollment images along with the set of features from the verification image.”, Figure 4 and [0088-0091] further disclose the enrollment features 418a-n extracted by 414a-n, i.e., a second artificial neural network, from the enrollment images 402a-n, where 414a-n are trained to extract user-related features, e.g., a user iris, where the enrollment features 418a-n are concatenated/combined along with the verification features as disclosed in [0068, 0091] “The matching section includes a matching CNN 222. The set of enrollment image features 218 and the set of verification image features 220 may be concatenated and provided as input to the matching CNN 222.”, where Figure 4 illustrates the same process as 222, as disclosed in [0091], except for a plurality of the enrollment biometric data source images and their corresponding features, where 422, similar to 222 in Figure 2, receives concatenated features from a plurality of images) comprising:
embedding the extracted features for the received image into a query vector using a first multi-layer perceptron; embedding the features extracted from the plurality of enrollment biometric data source images into a key vector using a second multi-layer perceptron; [and determining an aggregated feature based on the calculating a softmax based on the query vector and key vector] (Razumenic Figure 2A 214A-216A illustrates multilayers, and further Figure 4 414-416, where the extracted features of the received image along with the enrollment image features are fed into a multi-layer process, and where this process combines feature elements to determine a metric as illustrated in, e.g., Figure 8 824.
Examiner notes that the above limitation allows for broader interpretations, for example:
1) The above overall features are embedded/fed into the multilayer CNN 822 and undergo a process of comparison and matching, using the overall features/elements to produce metrics, e.g., 824; here the examiner notes that the process in CNN 822 receives the verification image features, interpreted as query vectors, and the enrollment image vectors, interpreted as key vectors (where the query vector and key vector represent elements/features that are used for calculating the similarity between a query vector and a key vector in a sequence, which is performed by the prior art to determine a match between the received image and the enrollment image), where a matching CNN, e.g., CNN 822, utilizes multilayers as illustrated in Figure 2 222A, interpreted as the first and second multi-layer perceptrons, since there is no distinguishing feature in the limitation that distinguishes the first and second perceptrons,
2) The above overall features, including the received image features, i.e., the query vector, are embedded/fed into the multilayer CNN 822 and undergo a process of comparison and matching, using the overall features/elements to produce metrics, e.g., 824; here the examiner notes that the process in CNN 822 receives the verification image features and uses CNN 822 to produce metric 824a, and the examiner interprets this process as using a first multi-layer perceptron. Then, the process in CNN 822 is repeated with the overall features, including the enrollment image features, i.e., the key vector, to produce 824a-b and eventually 834, and the examiner interprets this process as using a second multi-layer perceptron, as disclosed in, e.g., [0114-0115]);
determining, using the combined extracted features for the at least the received image and the combined feature representation of the plurality of enrollment biometric data source images as input into a third artificial neural network, whether the received image of the biometric data source for the user is from a real biometric data source or a copy of the real biometric data source based on using a function to generate a probability distribution based on the query vector and the key vector (Razumenic Figure 2 (222) determines, using the extracted feature for the received image 220 and the feature of the enrollment biometric as input into the CNN 222, whether the received image is from a living person, i.e. real, as opposed to an image/copy of a person, as disclosed in [0054-0055, 0068], this is further illustrated in Figure 4 with plurality of enrollment images and associated features as disclosed in [0088-0091], and further disclosed in [0114-0115] and illustrated in Figure 8 where the process in CNN 822 is repeated, last sequence, which can be interpreted as an artificial neural network to arrive at the liveness metric 834, where a metric 834 that indicates the probability that the verification images 804a-c represent a live human being, where the probability is based on metrics calculated based on repeated process of CNN 822 and based on received/query features/elements and enrollment/key features/elements); and
taking one or more actions to allow or deny the user access to a protected resource based on the determination (Razumenic taking action to use a device 1100 based on the determination as disclosed in [0142-0145]).
Razumenic does not explicitly disclose extract user-related and sensor-related features from enrollment biometric data source images, emphasis in italic.
Cheng discloses extracting user-related and sensor-related features from enrollment biometric data source images (Cheng [0003, 0035, 0046, 0049-0050] discloses extracting features of different sensors utilized in biometric recognition for facilitating biometric recognition, where the process is performed over time to account for eye movement, as disclosed in [0048-0049]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic to incorporate the teaching of Cheng to utilize the above feature, with the motivation of improving accuracy, tracking eye movement, and estimating point of regard, as recognized by Cheng ([0035]).
While Razumenic discloses the aforementioned limitations, and further discloses different neural network components, which would have made it obvious to conceive of a first, a second, and a third neural network, Razumenic in view of Cheng does not explicitly disclose that the features of the received image and enrollment images are combined and determining an aggregated feature based on calculating a softmax based on the query vector and key vector. Emphasis in italic below.
Khosla discloses the features of the received image and enrollment images are combined and determining an aggregated feature based on calculating a softmax based on the query vector and key vector (Khosla Col. 9 line 56-57 “…a Recurrent Neural Network (RNN 310) that extracts temporal sequence features based on the outputs from CNN 308…the RNN 310 concatenates features from multiple frames (i.e., a temporal sequence).”, where 310, interpreted as the second NN, is used to concatenate/combine a plurality of features from a plurality of frames in a video to track object(s) in each frame of the video, where the current frame is interpreted as the received image/frame, and the previous frames in which the object has already been identified and tracked are interpreted as the enrollment images, as disclosed in e.g. Col. 7 line 60-67, Col. 8 line 1-12 and Col. 10 line 10-25; Khosla further discloses in Col. 10 line 53-55 “… softmax refers to normalizing the node values so they sum to 1, and the highest value then becomes the declared activity…”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng to incorporate the teaching of Khosla to utilize the above feature, with the motivation of identifying and tracking objects of interest, as recognized by Khosla (Abstract and Col. 9 line 12-20).
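For illustration only, the following is a minimal sketch of the query/key/softmax language discussed for claim 30: a verification feature is embedded into a query vector by one small multi-layer perceptron, the enrollment features are embedded into key vectors by a second perceptron, and a softmax over query-key similarities yields an aggregated feature. The Python/NumPy code is a generic attention-style aggregation offered only to clarify the claim terminology; it is not asserted to be the implementation of Razumenic, Cheng, or Khosla, and every name, dimension, and weight is an assumption.

```python
# Hypothetical sketch: query/key embedding with two small MLPs and a
# softmax-weighted aggregation over enrollment features.  Generic
# attention-style illustration only; not code from any cited reference.
import numpy as np

def mlp(x, w1, w2):
    return np.tanh(x @ w1) @ w2               # a two-layer perceptron

rng = np.random.default_rng(0)
d = 64
q_w1, q_w2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
k_w1, k_w2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))

verification_feat = rng.standard_normal(d)      # features of the received image
enrollment_feats = rng.standard_normal((5, d))  # features of 5 enrollment images

query = mlp(verification_feat, q_w1, q_w2)      # query vector, shape (d,)
keys = mlp(enrollment_feats, k_w1, k_w2)        # key vectors, shape (5, d)

scores = keys @ query / np.sqrt(d)              # similarity per enrollment image
weights = np.exp(scores - scores.max())
weights /= weights.sum()                        # softmax probability distribution
aggregated = weights @ enrollment_feats         # softmax-weighted aggregated feature
```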
Claims 6 and 31 are rejected under 35 U.S.C. 103 as being unpatentable over Razumenic et al. (US 20200394289 A1) in view of Cheng, Khosla, and Carlson et al. (US 9436876 B1), hereinafter Carlson.
Regarding claim 6, Razumenic in view of Cheng and Khosla teaches the method of Claim 2, wherein aggregating features extracted from the information derived from the plurality of enrollment biometric data source images into the combined feature representation comprises generating a feature output [based on an autoregressive model] and features extracted from each of the plurality of enrollment biometric data source images (Razumenic Figure 4 illustrates 424 collecting the extracted features from the plurality of enrollment biometric data source images, [0068] “The set of enrollment image features 218 and the set of verification image features 220 may be concatenated and provided as input to the matching CNN 222”).
Razumenic discloses utilizing previous measurements for future measurements as illustrated in [0114]. However, Razumenic in view of Cheng and Khosla does not explicitly disclose autoregressive model.
Carlson discloses generating a feature output based on an autoregressive model (Carlson Col. 3 line 63-67 and Col. 4 line 1-28 “Systems and approaches in accordance with various embodiments enable more precise segmentation of videos, particularly with respect to gradual transitions or soft cuts. FIG. 2 illustrates a flow diagram 200 for an example approach for video segmentation that can be used in accordance with an embodiment. It should be understood that, for any system discussed herein, there can be additional, fewer, or alternative components performing similar functionality or functionality in alternative orders, or in parallel, within the scope of the various embodiments unless otherwise stated. In this example, an input video 202 is provided to a feature extraction module 204 (or software application, service, or other element located within at least one working memory device of a system having a processor) for capturing features of the input video 202. Features characterize the content of the video, and can be extracted from the visual content, audio, text (e.g., speech-to-text translation, closed captioning, subtitles, screenplay or script, etc.), metadata, or other data corresponding to the video. Visual features utilized for video segmentation include luminance (e.g., average grayscale luminance or the luminance channel in a color model such as hue-saturation-luminance (HSL)); color histograms; image edges; texture-based features (e.g., Tamura features, simultaneous autoregressive models, orientation features, co-occurrence matrices); features of objects in the video (e.g., faces or color, texture, and/or size of detected objects); transform coefficients (e.g., Discrete Fourier Transform, Discrete Cosine Transform, wavelet); and motion; among others. The size of the region from which features are extracted can also vary. For example, features can be extracted on a pixel-by-pixel basis, at a rectangular block level, according to various shaped regions, or by a whole frame, among other approaches.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Carlson to utilize the above feature, to automate segmentation of digital video, where the autoregressive model is one of the finite learning models that would have been obvious to try.
Regarding claim 31 (New), Razumenic in view of Cheng and Khosla teaches the method of Claim 1, wherein the combined feature representation of the plurality of enrollment biometric data source images is generated based on providing the plurality of enrollment biometric data source images as input to [[an autoregressive]] model (Razumenic Figure 4 illustrates 424 collecting the extracted features from the plurality of enrollment biometric data source images, [0068] “The set of enrollment image features 218 and the set of verification image features 220 may be concatenated and provided as input to the matching CNN 222”; Khosla further discloses aggregating features, see rationale and motivation in claim 1).
Razumenic in view of Cheng discloses enrollment biometric data source images and a received biometric image. Razumenic in view of Cheng does not disclose the below limitation. Emphasis in italic.
Khosla discloses wherein the model comprises calculating an activation for a given layer based on a product of: an activation for a preceding layer; and an enrollment feature (Khosla Col. 9 line 56-57 “…a Recurrent Neural Network (RNN 310) that extracts temporal sequence features based on the outputs from CNN 308…the RNN 310 concatenates features from multiple frames (i.e., a temporal sequence).”, where 310, interpreted as the second NN, is used to concatenate/combine a plurality of features from a plurality of frames in a video to track object(s) in each frame of the video, where the current frame is interpreted as the received image/frame, and the previous frames in which the object has already been identified and tracked are interpreted as the enrollment images, as disclosed in e.g. Col. 7 line 60-67, Col. 8 line 1-12 and Col. 10 line 10-25, where the Feature Extractor 314, which includes CNN 308, and the Recurrent Neural Network (RNN 310) are a sequence of layers as disclosed in Col. 9 line 20-5, where the calculation of a given layer is based on the preceding layers and previously extracted features from previously tracked objects in previous frames, and where the previous frames are construed as enrollment frames/images).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng to incorporate the teaching of Khosla to utilize the above feature, with the motivation of identifying and tracking objects of interest, as recognized by Khosla (Abstract and Col. 9 line 12-20).
Razumenic in view of Cheng and Khosla discloses utilizing previous measurements for future measurements as illustrated in e.g. Razumenic [0114]. Similarly with Khosla disclosing sequential/temporal measurements of streaming frames to identify and track objects. However, Razumenic in view of Cheng and Khosla does not explicitly disclose autoregressive model.
Carlson discloses generating a feature output based on an autoregressive model (Carlson Col. 3 line 63-67 and Col. 4 line 1-28 “Systems and approaches in accordance with various embodiments enable more precise segmentation of videos, particularly with respect to gradual transitions or soft cuts. FIG. 2 illustrates a flow diagram 200 for an example approach for video segmentation that can be used in accordance with an embodiment. It should be understood that, for any system discussed herein, there can be additional, fewer, or alternative components performing similar functionality or functionality in alternative orders, or in parallel, within the scope of the various embodiments unless otherwise stated. In this example, an input video 202 is provided to a feature extraction module 204 (or software application, service, or other element located within at least one working memory device of a system having a processor) for capturing features of the input video 202. Features characterize the content of the video, and can be extracted from the visual content, audio, text (e.g., speech-to-text translation, closed captioning, subtitles, screenplay or script, etc.), metadata, or other data corresponding to the video. Visual features utilized for video segmentation include luminance (e.g., average grayscale luminance or the luminance channel in a color model such as hue-saturation-luminance (HSL)); color histograms; image edges; texture-based features (e.g., Tamura features, simultaneous autoregressive models, orientation features, co-occurrence matrices); features of objects in the video (e.g., faces or color, texture, and/or size of detected objects); transform coefficients (e.g., Discrete Fourier Transform, Discrete Cosine Transform, wavelet); and motion; among others. The size of the region from which features are extracted can also vary. For example, features can be extracted on a pixel-by-pixel basis, at a rectangular block level, according to various shaped regions, or by a whole frame, among other approaches.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Carlson to utilize the above feature, to automate segmentation of digital video, where the autoregressive model is one of the finite learning models that would have been obvious to try.
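For illustration only, the following is a minimal sketch of an autoregressive-style aggregation of the kind discussed for claims 6 and 31, in which each step's activation is computed from the preceding activation and the next enrollment feature (here by an elementwise product). The Python/NumPy code, including the specific update rule, is a hypothetical illustration and is not taken from Razumenic, Cheng, Khosla, or Carlson.

```python
# Hypothetical sketch: sequentially folding enrollment features into one
# combined representation, where each new activation depends on the
# preceding activation and the current enrollment feature (a product here).
import numpy as np

def autoregressive_aggregate(enrollment_feats):
    """enrollment_feats: (N, d) array of per-image features."""
    activation = np.ones(enrollment_feats.shape[1])
    for feat in enrollment_feats:
        # next activation = function of (preceding activation * enrollment feature)
        activation = np.tanh(activation * feat)
    return activation

feats = np.random.rand(4, 32)                  # 4 enrollment images, 32-dim features
combined = autoregressive_aggregate(feats)     # single combined representation
```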
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Razumenic et al. (US 20200394289 A1) in view of Cheng, Khosla, and Aoki (US 20190065819 A1), hereinafter Aoki.
Regarding claim 7, Razumenic in view of Cheng and Khosla teaches the method of Claim 2, wherein aggregating features extracted from the information derived from the plurality of enrollment biometric data source images into the combined feature representation comprises generating, from the features extracted from the plurality of enrollment biometric data source images, an average [and a standard deviation] associated with the features extracted from the plurality of enrollment biometric data source images (Razumenic [0113], where an average of the metrics 824a-c is determined, and where the metrics result from processing the enrollment biometric features).
While Razumenic discloses a statistical value, i.e., an average, which would make it obvious to conceive of other statistical values in addition to the average, Razumenic in view of Cheng and Khosla does not explicitly disclose a standard deviation.
Aoki discloses, from the features extracted from the plurality of biometric data source images, a standard deviation associated with the extracted features (Aoki [0037-0038]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Aoki to utilize the above feature, with the motivation of normalizing the feature values, as recognized by Aoki ([0037]).
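For illustration only, the following is a minimal sketch of the two statistics discussed for claim 7, an average and a standard deviation computed over the enrollment features. The Python/NumPy code and its dimensions are hypothetical and are not taken from Razumenic or Aoki.

```python
# Hypothetical sketch: per-dimension average and standard deviation over
# the features extracted from several enrollment images.
import numpy as np

enrollment_feats = np.random.rand(5, 64)          # 5 enrollment images, 64-dim features
feat_mean = enrollment_feats.mean(axis=0)         # average per feature dimension
feat_std = enrollment_feats.std(axis=0)           # standard deviation per dimension
combined_representation = np.concatenate([feat_mean, feat_std])   # 128-dim summary
```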
Claims 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Razumenic et al. (US 20200394289 A1) in view of Cheng, Khosla, and Noda (US 20210209452 A1), hereinafter Noda.
Regarding claim 8, Razumenic in view of Cheng and Khosla teaches the method of Claim 2, wherein: the first artificial neural network and the third artificial neural network comprise convolutional neural networks, [and the first artificial neural network shares at least a subset of weights associated with the third artificial neural network] (Razumenic Figure 2 [0048, 0050, 0066]).
Razumenic in view of Cheng and Khosla does not disclose sharing weights.
Noda discloses the first artificial neural network shares at least a subset of weights associated with the third artificial neural network (Noda [0089] and Figure 2, [0089] “Next, a variation of the second embodiment will be described. In the description of the variation, description similar to the description in the second embodiment will be omitted, and points different from the second embodiment will be described. At least two or more neural networks of first neural networks 101a and 101b, a second neural network 102, and a third neural network 103 share at least part of weights.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Noda to utilize the above feature, with the motivation of improving generalization performance, as recognized by (Noda [0022] ).
Regarding claim 9, Razumenic in view of Cheng and Khosla teaches the method of Claim 2, further comprising extracting additional features from the received image and the plurality of enrollment biometric data source images using [a weight-shared] convolutional neural network, the extracted features for the received image, and the features extracted from the plurality of enrollment biometric data source images (Razumenic [0048] discloses using CNN, Figure 4, where a plurality of features are extracted for each of the enrollment biometric data source images, and plurality of features for the received image, where the plurality of features include, features and additional features).
Razumenic in view of Cheng and Khosla does not disclose sharing weights.
Noda discloses weight-shared (Noda [0089] and Figure 2 [0089] “Next, a variation of the second embodiment will be described. In the description of the variation, description similar to the description in the second embodiment will be omitted, and points different from the second embodiment will be described. At least two or more neural networks of first neural networks 101a and 101b, a second neural network 102, and a third neural network 103 share at least part of weights.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Noda to utilize the above feature, with the motivation of improving generalization performance, as recognized by (Noda [0022] ).
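For illustration only, the following is a minimal sketch of the weight-sharing concept discussed for claims 8 and 9: two feature extractors reuse the same parameter array for part of their computation while keeping separate extractor-specific weights. The Python/NumPy code is a hypothetical illustration and is not taken from Razumenic, Khosla, or Noda.

```python
# Hypothetical sketch: two extractors that share a subset of weights
# (the shared_w array) and differ only in their final layers.
import numpy as np

rng = np.random.default_rng(1)
shared_w = rng.standard_normal((256, 64))   # weights shared by both extractors
head_a = rng.standard_normal((64, 32))      # extractor A specific weights
head_b = rng.standard_normal((64, 32))      # extractor B specific weights

def extractor_a(x):
    return np.tanh(x @ shared_w) @ head_a   # uses the shared weights

def extractor_b(x):
    return np.tanh(x @ shared_w) @ head_b   # same shared weights, different head

img_vec = rng.standard_normal(256)
feat_received = extractor_a(img_vec)        # features of a received image
feat_enrollment = extractor_b(img_vec)      # features of an enrollment image
```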
Claims 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Razumenic et al. (US 20200394289 A1) in view of Cheng, Khosla, and Liu et al. (US 20200160547 A1), hereinafter Liu.
Regarding claim 11, Razumenic in view of Cheng and Khosla teaches the method of Claim 10, wherein combining the received image and the plurality of enrollment biometric data source images into the stack of images comprises:
Razumenic discloses the biometric image data received and matched with enrolled image data. Razumenic in view of Cheng and Khosla does not disclose the below limitation. Emphasis in italic.
Liu discloses identifying, relative to at least one image of the plurality of enrollment biometric data source images, a transformation to apply to the received image such that the received image is aligned with at least a portion of the at least one image of the plurality of enrollment biometric data source images; modifying the received image based on the identified transformation; and generating a stack including the modified received image and the at least the one image of the plurality of enrollment biometric data source images (Liu Figure 6 illustrates e.g. image 602 modified to extract features 605, where the disparity and matching of the first and second maps of the images are calculated, as disclosed in e.g. [0009, 0100-0102], [0071] “The output apparatus removes noise from an overlay result using a matching information accumulating module (refer to reference numeral 615 of FIG. 6).”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Liu to utilize the above feature, with the motivation of prediction of objects in advance and for feature such as driving assistance, as recognized by (Liu Abstract ).
Regarding claim 12, Razumenic in view of Cheng, Khosla and Liu teaches the method of Claim 11.
Razumenic discloses the biometric image data received and matched with enrolled image data. Razumenic in view of Cheng and Khosla does not disclose the below limitation. Emphasis in italic.
Liu discloses wherein generating the stack including the modified received image and the plurality of enrollment biometric data source images comprises one or more of stacking the modified received image and the at least the one image of the plurality of enrollment biometric data source images on a channel dimension, subtracting the modified received image from the at least the one image of the plurality of enrollment biometric data source images, overlaying the received image on the at least the one image of the plurality of enrollment biometric data source images, outputting an intersection of the modified received image and the at least the one image of the plurality of enrollment biometric data source images, or transforming the modified received image based on a stitched version of the plurality of enrollment biometric data source images (Liu Figure 6 illustrates e.g. image 602 modified to extract features 605, where the disparity and matching of the first and second maps of the images are calculated, as disclosed in e.g. [0009, 0100-0102], [0071] “The output apparatus removes noise from an overlay result using a matching information accumulating module (refer to reference numeral 615 of FIG. 6).”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Liu to utilize the above feature, with the motivation of prediction of objects in advance and for feature such as driving assistance, as recognized by (Liu Abstract ).
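For illustration only, and not as a characterization of Liu or of the pending claims, the following minimal numpy sketch shows the kinds of combining operations recited above (stacking on a channel dimension, subtracting, or overlaying), with a simple translation standing in for the identified alignment transformation. All names, offsets, and values are hypothetical.

    import numpy as np

    def apply_translation(image, dy, dx):
        # A simple circular shift stands in for the identified alignment transformation.
        return np.roll(np.roll(image, dy, axis=0), dx, axis=1)

    def combine(received, enrollment_img, dy, dx, mode="stack"):
        aligned = apply_translation(received, dy, dx)            # modified received image
        if mode == "stack":
            return np.stack([aligned, enrollment_img], axis=-1)  # stack on a channel dimension
        if mode == "subtract":
            return aligned - enrollment_img                      # per-pixel difference
        if mode == "overlay":
            return 0.5 * aligned + 0.5 * enrollment_img          # simple averaging overlay
        raise ValueError(f"unknown mode: {mode}")

    rng = np.random.default_rng(0)
    received = rng.standard_normal((8, 8))
    enrollment_img = rng.standard_normal((8, 8))
    stacked = combine(received, enrollment_img, dy=1, dx=-1, mode="stack")  # shape (8, 8, 2)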
Regarding claim 13, Razumenic in view of Cheng and Khosla teaches the method of Claim 10.
Razumenic discloses the biometric image data received and matched with enrolled image data. Razumenic in view of Cheng and Khosla does not disclose the below limitation. Emphasis in italic.
Liu discloses wherein combining the received image and the plurality of enrollment biometric data source images into the stack of images comprises: identifying, relative to the received image, a transformation to apply at least one image of the plurality of enrollment biometric data source images such that the received image is aligned with at least a portion of the at least one image of the plurality of enrollment biometric data source images; modifying the at least the one image of the plurality of enrollment biometric data source images based on the identified transformation; and generating a stack including the received image and the modified at least the one image of the plurality of enrollment biometric data source images (Liu Figure 6 illustrates e.g. images 601 and 602 modified to extract features 605, where the disparity and matching of the first and second maps of the images are calculated, as disclosed in e.g. [0009, 0100-0102], [0071] “The output apparatus removes noise from an overlay result using a matching information accumulating module (refer to reference numeral 615 of FIG. 6).”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Liu to utilize the above feature, with the motivation of prediction of objects in advance and for feature such as driving assistance, as recognized by (Liu Abstract ).
Regarding claim 14, Razumenic in view of Cheng, Khosla and Liu teaches the method of Claim 13.
Razumenic discloses the biometric image data received and matched with enrolled image data. Razumenic does not disclose the below limitation. Emphasis in italic.
Liu discloses wherein generating the stack including the received image and the modified at least the one image of the plurality of enrollment biometric data source images comprises: stacking the received image and the modified at least the one image of the plurality of enrollment biometric data source images on a channel dimension, subtracting the received image from the modified at least the one image of the plurality of enrollment biometric data source images, overlaying the received image on the modified at least the one image of the plurality of enrollment biometric data source images, or outputting an intersection of the received image and the modified at least the one image of the plurality of enrollment biometric data source images (Liu Figure 6 illustrates e.g. images 601 and 602 modified to extract features 605, where the disparity and matching of the first and second maps of the images are calculated, as disclosed in e.g. [0009, 0100-0102], [0071] “The output apparatus removes noise from an overlay result using a matching information accumulating module (refer to reference numeral 615 of FIG. 6).”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Liu to utilize the above feature, with the motivation of prediction of objects in advance and for feature such as driving assistance, as recognized by (Liu Abstract ).
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Razumenic et al. (US 20200394289 A1) in view of Cheng, Khosla and Aoki (US 20190065819 A1), hereinafter Aoki.
Regarding claim 16, Razumenic in view of Cheng and Khosla teaches the method of Claim 1, wherein determining whether the received image of the biometric data source for the user is from a real biometric data source or a copy of the real biometric data source comprises calculating a log likelihood of the received image being a real biometric data source, given a mean [and a standard deviation] associated with the features extracted from the plurality of enrollment biometric data source images (Razumenic Figure 2 (222) determines, using the extracted feature for the received image 220 and the feature of the enrollment biometric as input into the CNN 222, whether the received image is from a living person, i.e. real, as opposed to an image/copy of a person, as disclosed in [0054-0055, 0068], this is further illustrated in Figure 4 with plurality of enrollment images and associated features as disclosed in [0088-0091], further illustrated in Figure 8 [0120-0121], where a distance metric is calculated as disclosed in [0048], [0121] “[0121] The method 900 also includes determining 914 an additional metric 834 (which may be referred to as a liveness metric 834) that indicates a likelihood that the plurality of verification images 804a-c correspond to a live human being. As indicated above, this liveness metric 834 may be updated as additional verification images 804a-c are processed.”).
While Razumenic discloses a statistical value, i.e., an average, which would make it obvious to conceive of other statistical values beyond the average, Razumenic in view of Cheng and Khosla does not explicitly disclose a standard deviation.
Aoki discloses standard deviation associated with the features extracted (Aoki [0037-0038]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Aoki to utilize the above feature, with the motivation of normalizing the feature values, as recognized by (Aoki [0037] ).
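For illustration only, and not as a characterization of Razumenic, Aoki, or the pending claims, the following minimal numpy sketch computes a log likelihood of the received image's feature vector under a diagonal Gaussian whose per-dimension mean and standard deviation are estimated from the enrollment feature vectors. All names and the epsilon value are hypothetical.

    import numpy as np

    def gaussian_log_likelihood(received_features, enrollment_features, eps=1e-6):
        # Per-dimension mean and standard deviation over the enrollment features;
        # eps keeps the standard deviation strictly positive.
        enrollment_features = np.asarray(enrollment_features)
        mu = enrollment_features.mean(axis=0)
        sigma = enrollment_features.std(axis=0) + eps
        z = (np.asarray(received_features) - mu) / sigma
        # Diagonal-Gaussian log likelihood of the received features.
        return float(np.sum(-0.5 * z**2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)))

A higher value indicates that the probe features are statistically consistent with the enrollment features; thresholding such a score is one conventional way of turning it into a real-versus-copy decision.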
Claims 17 and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Razumenic et al. (US 20200394289 A1) in view of Cheng, Khosla and Chang (US 20200320408 A1), hereinafter Chang2.
Regarding claim 17, Razumenic in view of Cheng and Khosla teaches the method of Claim 1, wherein determining whether the received image of the biometric data source for the user is from a real biometric data source or a copy of the real biometric data source comprises weighting the extracted features for the received image and the features extracted from the plurality of enrollment biometric data source images [using a key-query-value attention layer] (Razumenic Figure 2 (222) determines, using the extracted feature for the received image 220 and the feature of the enrollment biometric as input into the CNN 222, whether the received image is from a living person, i.e. real, as opposed to an image/copy of a person, as disclosed in [0054-0055, 0068], this is further illustrated in Figure 4 with plurality of enrollment images and associated features as disclosed in [0088-0091], further illustrated in Figure 8 [0120-0121], where a distance metric is calculated as disclosed in [0048], [0121] “[0121] The method 900 also includes determining 914 an additional metric 834 (which may be referred to as a liveness metric 834) that indicates a likelihood that the plurality of verification images 804a-c correspond to a live human being. As indicated above, this liveness metric 834 may be updated as additional verification images 804a-c are processed.”).
Razumenic in view of Cheng and Khosla does not disclose using a key-query-value attention layer.
Chang2 discloses using a key-query-value attention layer (Chang2 [0053-0054, 0060]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Chang2 to utilize the above feature, with the motivation of decreasing complexity, as recognized by (Chang2 [0053-0054]).
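For illustration only, and not as a characterization of Chang2 or the pending claims, the following minimal numpy sketch shows a key-query-value attention layer in its usual scaled dot-product form: enrollment feature vectors (keys and values) are weighted by their similarity to the probe's query vector. All shapes and names are hypothetical.

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def kqv_attention(query, keys, values):
        # query: (d,) from the received image; keys, values: (n, d), one row per
        # enrollment feature vector.
        d = query.shape[-1]
        scores = keys @ query / np.sqrt(d)  # similarity of the probe to each enrollment image
        weights = softmax(scores)           # attention weights, sum to 1
        return weights @ values, weights    # weighted combination of the enrollment values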
Claim 27 recites limitations similar to claim 17 and is therefore rejected with the same rationale and motivation applied to claim 17.
Claims 18 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Razumenic et al. (US 20200394289 A1) in view of Cheng, Khosla and Barnes (US 20210158495 A1), hereinafter Barnes.
Regarding claim 18, Razumenic in view of Cheng and Khosla teaches the method of Claim 1, wherein determining whether the received image of the biometric data source for the user is from a real biometric data source or a copy of the real biometric data source comprises:
embedding the extracted features for the received image into a query vector using a first multi-layer perceptron (Razumenic Figure 2A 216A illustrating multilayers, Figure 4 416 and further see rationale in claim 30);
embedding the features extracted from the plurality of enrollment biometric data source images into a key vector using a second multi-layer perceptron (Razumenic Figure 2A 214A illustrating multilayers, Figure 4 414a-n and further see rationale in claim 30);
embedding the features extracted from the plurality of enrollment biometric data source images into a value vector using a third multi-layer perceptron (Razumenic Figure 2A CN222A illustrating multilayers, Figure 4 422 and further see rationale in claim 30); and generating a value corresponding to a likelihood that the received image is from a real biometric data source based [on an inner product between] the query vector and the key vector, conditioned on features embedded into the query vector (Razumenic [0050-0054], Figure 2 (222) determines, using the extracted feature for the received image 220 and the feature of the enrollment biometric as input into the CNN 222, whether the received image is from a living person, i.e. real, as opposed to an image/copy of a person, as disclosed in [0054-0055, 0068], this is further illustrated in Figure 4 with plurality of enrollment images and associated features as disclosed in [0088-0091], further illustrated in Figure 8 [0120-0121], where a distance metric is calculated as disclosed in [0048], [0121] “[0121] The method 900 also includes determining 914 an additional metric 834 (which may be referred to as a liveness metric 834) that indicates a likelihood that the plurality of verification images 804a-c correspond to a live human being. As indicated above, this liveness metric 834 may be updated as additional verification images 804a-c are processed.” and further see rationale in claim 30).
Razumenic in view of Cheng and Khosla does not explicitly disclose that 834 in Figure 8 is based on an inner product.
Barnes discloses a likelihood that the received image is from… based on an inner product between the query vector and the key vector (Barnes [0044] “At block 306, the process 300 involves comparing a query of the target image 102 to the set of keys of the reference image 114 to generate matching costs. For instance, the output weighting engine 110 can take a dot product or an L2 distance of the feature vector of the query with the feature vectors of the keys. More generally, the comparison of the query to the set of keys may be a sum of (i) a bilinear form between the query and each key of the set of keys and (ii) a second bilinear form of each key with itself. In such a sum, bilinear form weights may be hand-specified or manually learned. That is, an inner product or a distance metric between the query and each key of the set of keys is determined, and the inner product or distance metric includes an equal weight for all input components, a hand-specified input weighting, or an input weighting that is learned by a trainable module. The dot product and the L2 distance described above are special cases of the bilinear forms. In other examples, the comparisons may be performed using a sum of squared differences between the query and the set of keys. The dot products or the sums of squared differences may provide indications of differences between the query and each of the keys. The differences may be referred to as matching costs.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Barnes to utilize the above feature, with the motivation of improvements in matching images, as recognized by (Barnes [0020]).
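For illustration only, and not as a characterization of Barnes' embodiment or the pending claims, the following minimal numpy sketch shows the two comparisons Barnes names in [0044], a dot product (inner product) and an L2 distance, between a query vector and a set of key vectors; the logistic squashing into (0, 1) is added here purely for illustration. All names are hypothetical.

    import numpy as np

    def matching_costs(query, keys, metric="dot"):
        # Compare one query vector against a set of key vectors.
        keys = np.asarray(keys)
        if metric == "dot":
            return keys @ query                          # inner product: larger = closer match
        if metric == "l2":
            return np.linalg.norm(keys - query, axis=1)  # distance: smaller = closer match
        raise ValueError(f"unknown metric: {metric}")

    def illustrative_liveness(scores):
        # Squash the best inner-product score into (0, 1) as an illustrative likelihood value.
        return 1.0 / (1.0 + np.exp(-np.max(scores)))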
Claim 28 recites limitations similar to claim 18 and is therefore rejected with the same rationale and motivation applied to claim 18.
Claims 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Razumenic et al. (US 20200394289 A1) in view of Cheng, Khosla and Kassner (US 20220198789 A1), hereinafter Kassner.
Regarding claim 19, Razumenic in view of Cheng and Khosla teaches the method of Claim 1, wherein determining whether the received image of the biometric data source for the user is from a real biometric data source or a copy of the real biometric data source comprises [gating] one or more of the extracted features for the received image based on features extracted from the plurality of enrollment biometric data source images (Razumenic [0050-0054], Figure 2 (222) determines, using the extracted feature for the received image 220 and the feature of the enrollment biometric as input into the CNN 222, whether the received image is from a living person, i.e. real, as opposed to an image/copy of a person, as disclosed in [0054-0055, 0068], this is further illustrated in Figure 4 with plurality of enrollment images and associated features as disclosed in [0088-0091], further illustrated in Figure 8 [0120-0121], where a distance metric is calculated as disclosed in [0048], [0121] “[0121] The method 900 also includes determining 914 an additional metric 834 (which may be referred to as a liveness metric 834) that indicates a likelihood that the plurality of verification images 804a-c correspond to a live human being. As indicated above, this liveness metric 834 may be updated as additional verification images 804a-c are processed.”).
Razumenic in view of Cheng and Khosla does not explicitly disclose the gating. Emphasis in italic.
Kassner discloses gating one or more of the extracted features for the received image (Kassner [0271] “In a preferred embodiment, the neural network additionally uses one or more so-called “squeeze—and—excitation” (SE) blocks (layers). Such blocks perform feature recalibration. Input data or features U (W×H×C corresponding to image width×image height×number of channels) are first passed through a squeeze operation, which aggregates the feature maps across spatial dimensions W×H to produce a channel descriptor (1×1×C). This descriptor embeds the global distribution of channel-wise feature responses, enabling information from the global receptive field of the network to be leveraged by its lower layers. This is followed by an excitation operation, in which sample-specific activations, learned for each channel by a self-gating mechanism based on channel dependence, govern the excitation of each channel. The feature maps U are then reweighted channel-wise by these additionally learned parameters to generate the output of the SE block which can then be fed directly into subsequent layers.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Kassner to utilize the above feature, with the motivation of reweighting channel-wise by the additionally learned parameters, as recognized by (Kassner [0271] ).
Regarding claim 20, Razumenic in view of Cheng and Khosla teaches the method of Claim 1, wherein: determining whether the received image of the biometric data source for the user is from a real biometric data source or a copy of the real biometric data source comprises [gating] the extracted features for the received image [in a squeeze-excite network] based on the features extracted from the plurality of enrollment biometric data source images; the extracted features are represented [by a height dimension, a width dimension, and a channel dimension]; and the [gating] is performed on the channel dimension (Razumenic [0050-0054], Figure 2 (222) determines, using the extracted feature for the received image 220 and the feature of the enrollment biometric as input into the CNN 222, whether the received image is from a living person, i.e. real, as opposed to an image/copy of a person, as disclosed in [0054-0055, 0068], this is further illustrated in Figure 4 with plurality of enrollment images and associated features as disclosed in [0088-0091], further illustrated in Figure 8 [0120-0121], where a distance metric is calculated as disclosed in [0048], [0121] “[0121] The method 900 also includes determining 914 an additional metric 834 (which may be referred to as a liveness metric 834) that indicates a likelihood that the plurality of verification images 804a-c correspond to a live human being. As indicated above, this liveness metric 834 may be updated as additional verification images 804a-c are processed.”).
Razumenic in view of Cheng and Khosla does not explicitly disclose the below limitations. Emphasis in italic.
Kassner discloses gating the extracted features for the received image in a squeeze-excite network, features are represented by a height dimension, a width dimension, and a channel dimension; and the gating is performed on the channel dimension (Kassner [0271] “In a preferred embodiment, the neural network additionally uses one or more so-called “squeeze—and—excitation” (SE) blocks (layers). Such blocks perform feature recalibration. Input data or features U (W×H×C corresponding to image width×image height×number of channels) are first passed through a squeeze operation, which aggregates the feature maps across spatial dimensions W×H to produce a channel descriptor (1×1×C). This descriptor embeds the global distribution of channel-wise feature responses, enabling information from the global receptive field of the network to be leveraged by its lower layers. This is followed by an excitation operation, in which sample-specific activations, learned for each channel by a self-gating mechanism based on channel dependence, govern the excitation of each channel. The feature maps U are then reweighted channel-wise by these additionally learned parameters to generate the output of the SE block which can then be fed directly into subsequent layers.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Razumenic in view of Cheng and Khosla to incorporate the teaching of Kassner to utilize the above feature, with the motivation of reweighting channel-wise by the additionally learned parameters, as recognized by (Kassner [0271] ).
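For illustration only, and not as a characterization of Kassner or the pending claims, the following minimal numpy sketch shows squeeze-and-excitation channel gating on an H×W×C feature map: spatial averaging produces a per-channel descriptor, a small bottleneck produces per-channel gates in (0, 1), and the map is reweighted on the channel dimension. All weights, shapes, and the reduction ratio are hypothetical.

    import numpy as np

    def squeeze_excite(features, W1, W2):
        # features: (H, W, C).  Squeeze: global average pool over the spatial dimensions.
        z = features.mean(axis=(0, 1))                             # (C,) channel descriptor
        # Excite: bottleneck MLP followed by a per-channel sigmoid gate.
        s = 1.0 / (1.0 + np.exp(-(np.maximum(z @ W1, 0.0) @ W2)))  # (C,) gates in (0, 1)
        # Gate: reweight every channel of the feature map.
        return features * s[None, None, :]

    rng = np.random.default_rng(0)
    H, W, C, r = 8, 8, 16, 4
    feats = rng.standard_normal((H, W, C))
    W1 = rng.standard_normal((C, C // r))   # hypothetical squeeze (reduction) weights
    W2 = rng.standard_normal((C // r, C))   # hypothetical excitation weights
    gated = squeeze_excite(feats, W1, W2)   # same shape as feats, channels reweighted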
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Yang (US 20210097290 A1) discloses video retrieval in the feature descriptor domain in an artificial intelligence semiconductor solution.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BASSAM A NOAMAN whose telephone number is (571) 272-2705. The examiner can normally be reached Monday-Friday, 8:30 AM-5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Eleni A. Shiferaw can be reached at (571) 272-3867. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BASSAM A NOAMAN/Primary Examiner, Art Unit 2497