DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments, see remarks, filed 11/25/2025, with respect to the rejection of the claims under 35 U.S.C. 112(b) have been fully considered and are persuasive. The rejection of the claims under 35 U.S.C. 112(b) has been withdrawn.
Applicant's arguments, filed 11/25/2025, with respect to the rejection of claim 1 under 35 U.S.C. 101 have been fully considered but they are not persuasive. On page 7 of the remarks, Applicant argues “
[argument reproduced as image media_image1.png]”.
Examiner disagrees. A pair of identically weighted networks is generic and lacks technical structure sufficient to distinguish it from a generic computer; it is therefore equivalent to two human beings with the same level of vision (the same weighting) performing feature extraction of images using their respective visions. No specific machine learning, neural network, or artificial intelligence algorithm is indicated in the claim, and because no specific architecture indicating a particular machine is recited, Examiner may conduct the 35 U.S.C. 101 analysis by treating the generic computer machine as analogous to human vision in the field of image processing. There is no mathematical sense in the claim as argued by Applicant, nor any discussion of comparing input vectors in the claim; human vision easily executes feature extraction of images with no issues. Further, the claims do not delineate what “equally weighting” means in a technical sense. Applicant cites McRO, Inc. v. Bandai Namco Games America Inc. for the proposition that structures must be differentiated from a human process, and argues that identically weighted networks that output features extracted from images are somehow differentiated from human vision; however, no technical details in the claim make this differentiation. Examiner asserts that the same arguments apply to the “match head.”
Applicant further argues, on page 8 of the remarks, that “[argument reproduced as image media_image2.png]”.
Examiner disagrees. First, claim 1 only recites the “identically weighted networks” performing feature extraction on images and does not connect the weighted networks to any other structure of the claim, such as the memory, scanning element, photocopier, or match head (the connection between the match head and the networks is made in dependent claim 2, which was not rejected under 35 U.S.C. 101). It is therefore not, as Applicant argues, the networks that detect a match with the security pattern; rather, it is the match head that does this in the claim. Applicant fails to explain how control of the operation of a photocopier’s scanning element, which is easily done by a human, improves the technology of the photocopier itself, when comparing an image with another image from a registered security pattern is a process humans perform. A human already executes feature extraction from images, is able to compare two images to see if they are similar, and then controls a photocopier to stop photocopying if the two images match, indicating a high-security document that should not be photocopied. There is no improvement in the claim as written, and the claim uses generic machines (networks and a match head) to do processes humans already accomplish. Lastly, Applicant states that the claim modifies the physical operation of the photocopier’s scanning hardware; however, turning off a photocopier because the scanned document is a security risk based on a match score of two images is not modifying the photocopier. Turning the photocopier on or off is not an actual technical improvement to the photocopier, but strictly a control mechanism; the functioning of the photocopier itself or its scanning element is not directly improved.
Finally, Applicant argues, on page 9 of the remarks, that “
[argument reproduced as image media_image3.png]”.
Examiner disagrees. The identically weighted networks are not connected to the memory or the match head in the claim at all; therefore, the specific arrangement Applicant argues is not recited in the claim, since the networks are not arranged to have any connection to the other components. The match head executes the main process of the claim, comparing images, and controls the scanning element based on its output; there is thus no specific arrangement of components except for the match head, memory, and scanning element. This does not amount to anything beyond comparing documents, since the match head is a generic computer component operating with the generic components of the memory and the scanning element. The match head is the only component with functionality in relation to the scanning element, and the feature extraction of the networks is unconnected to the remainder of the claim; therefore, a specific-arrangement argument cannot be made.
Therefore, the rejection of claim 1 under 35 U.S.C. 101 is maintained.
Applicant's arguments, filed 11/25/2025, with respect to the rejection of claims 1-2 under 35 U.S.C. 103 have been fully considered but they are not persuasive. Applicant argues, on page 10 of the remarks, that there is no motivation to combine primary reference Ledinauskas and secondary reference Romero because Ledinauskas does post-processing of visa page stamps and Romero does security scanning using OCR. Examiner disagrees. Ledinauskas teaches verifying, using a match head, the validity of a stamp in a scanned passport for security purposes; Romero teaches validating scanned documents that have some indication of security level access and preventing the printing or copying of the document using the scanner. The fields of endeavor are similar in that both deal with the security of documents being scanned. Further, Applicant argues that the combined references do not teach the claimed scanning control section of claim 1. However, as explained in the previous Office action, one of ordinary skill in the art, when combining Ledinauskas and Romero, obtains the ability to use the Siamese ResNet machine learning model to verify a scanned passport, taught by Ledinauskas, and to then control the scanner the passport was inserted into, throwing an alert to prevent the scanner (if the ML model confirms that a matching stamp of the document has been found) from making copies or printing out the passport to a user of the scanner without further verification, taught by Romero. Whether Romero uses OCR for the image comparison does not matter for the claim to be taught. Applicant argues that Examiner is using impermissible hindsight reconstruction, when it is clear from Examiner's explanation above that, by combining the references, the twin neural networks verify a scanned passport and the same scanner is then controlled if there is a security alert based on matching the image of the passport to another image indicating a security breach.
Improper hindsight relies on knowledge gleaned solely from Applicant's disclosure, and there is nothing in Examiner's combination that assumes knowledge of Applicant's disclosure.
Therefore, the rejection of claims 1-2 under 35 U.S.C. 103 is maintained.
Applicant’s arguments, see remarks, filed 11/25/2025, with respect to the rejection of claims 3-20 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, a new ground of rejection is made in view of U.S. Patent No. 12,254,668 (Stoian et al.) (hereinafter Stoian) and non-patent literature "Generalized contrastive optimization of Siamese networks for place recognition," arXiv preprint arXiv:2103.06638 (2021) (Vallina et al.) (hereinafter Vallina).
Claim Interpretation
Applicant’s present set of claims uses the language “logic for …” in system claims, which would invoke a 112(f) claim interpretation if Applicant’s specification did not provide sufficient structure for the “logic”. Upon review of Applicant’s present specification, para. [0238] recites: “Various functional operations described herein may be implemented in logic that is referred to using a noun or noun phrase reflecting said operation or function. For example, an association operation may be carried out by an "associator" or "correlator". Likewise, switching may be carried out by a "switch", selection by a "selector", and so on. "Logic" refers to machine memory circuits and non-transitory machine-readable media comprising machine-executable instructions (software and firmware), and/or circuitry (hardware) which by way of its material and/or material-energy configuration comprises control and/or procedural signals, and/or settings and values (such as resistance, impedance, capacitance, inductance, current/voltage ratings, etc.), that may be applied to influence the operation of a device. Magnetic media, electronic circuits, electrical and optical memory (both volatile and nonvolatile), and firmware are examples of logic. Logic specifically excludes pure signals or software per se (however does not exclude machine memories comprising software and thereby forming configurations of matter).” Therefore, 112(f) claim interpretation is not invoked.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without integration into a practical application or recitation of significantly more.
Claim 1 is directed to one of the four statutory categories of eligible subject matter (an apparatus); thus, the claim passes Step 1 of the Subject Matter Eligibility Test (see flowchart in MPEP 2106).
Step 2A, prong 1 analysis:
The independent claims are directed to: performing feature extraction on images of an image pair, the image pair comprising a first image generated by a scanning element of the photocopier and a second image obtained from registered security patterns; outputting a match score for the image pair; and controlling operation of the scanning element.
Each of the above steps can be performed mentally. In particular, a human security evaluator takes another human's passport to verify its authenticity and scans it in a photocopier; the image is then evaluated: features are extracted using human vision and compared to a database of stamps/watermarks that are supposed to appear on the passport. The human evaluator compares the reference stamp/watermark to the stamp/watermark on the passport using their own vision and decides how well they match; if there is not a match, the human security evaluator stops the scanning of the passport and prevents any printing, copying, or transmitting of the scanned passport because it might be inauthentic/fake. Therefore, this process can all be done mentally.
As such, the description in independent claim 1 is an abstract idea – namely, a mental process. Accordingly, the analysis under prong one of step 2A of the Subject Matter Eligibility Test does not result in a conclusion of eligibility (See flowchart in MPEP 2106).
Additional elements:
The additional elements recited in independent claims 1, 13, and 19 are a pair of identically weighted networks, a memory, and a match head.
Step 2A, prong 2 analysis:
The above-identified additional elements do not integrate the judicial exception into a practical application.
Each of the additional elements (a pair of identically weighted networks, a memory, a match head) amounts to merely using a device as a tool to perform the claimed mental process. Implementing an abstract idea on a computer, or using known generic devices, does not integrate a judicial exception into a practical application (see MPEP 2106.05(f)).
Moreover, the additional elements of the claims do not recite an improvement in the functioning of a computer or other technology or technical field, the claimed steps are not performed using a particular machine, the claimed steps do not effect a transformation, and the claims do not apply the judicial exception in any meaningful way beyond generically linking the use of the judicial exception to a particular technological environment (See MPEP 2106.04(d)). Therefore, the analysis under prong two of step 2A of the Subject Matter Eligibility Test does not result in a conclusion of eligibility (See flowchart in MPEP 2106).
Step 2B:
Finally, the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
Each of the additional elements (a pair of identically weighted networks, a memory, a match head) is a generic computer feature performing generic computer functions that are well-understood, routine, and conventional, and does not amount to more than implementing the abstract idea with a computerized system. Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea).
Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation, and mere implementation on a generic computer does not add significantly more to the claims. Accordingly, the analysis under step 2B of the Subject Matter Eligibility Test does not result in a conclusion of eligibility (See flowchart in MPEP 2106).
For all of the foregoing reasons, independent claim 1 does not recite eligible subject matter under 35 USC 101.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2 are rejected under 35 U.S.C. 103 as being unpatentable over non-patent literature "Automatic travel pattern extraction from visa page stamps using CNN models"; arXiv preprint arXiv:2112.00348 (2021) (Ledinauskas et al.) (hereinafter Ledinauskas), in view of U.S. Patent Application Publication No.: 2021/0297546 (Romero et al.) (hereinafter Romero).
Regarding claim 1, Ledinauskas teaches a pair of identically weighted networks configured to perform feature extraction on images (Ledinauskas, page 4, right-side column; Section 2.4: Similarity metric learning for stamp recognition: “For general stamp country and entry/exit recognition we propose to use Siamese networks. Siamese networks able to work with multiple input images simultaneously and select relevant features for various machine learning tasks, including the estimation of similarity. Similarity estimation can be used for classification where instead of classifying inputs into predefined classes, Siamese networks find the most similar example in the database of class examples. The model consists of two parallel embedding networks that share weights between themselves.”);
a memory storing registered security patterns; and a match head configured to receive an image pair, the image pair comprising a first image generated by a scanning element of the photocopier, and a second image obtained from the registered security patterns, the match head configured to output a match score for the image pair (Ledinauskas, page 3, FIG. 1; Ledinauskas, page 4, right-side column; Section 2.4: Similarity metric learning for stamp recognition; page 5, left-side column, para. 1-4: “Fig. 1: The proposed page processing pipeline. The image of visa page could be obtained by a passport scanner or from a video feed by the passport facing camera that is mounted on the table. After that the stamps are detected with a detector model. The detected stamps are later compared with a set of known stamp embeddings by using the similarity metric learning model which provides the country and travel direction of the most similar stamp in the database of stamp embeddings. If the stamp belongs to a Schengen country, then additionally the Schengen template and several additional models are being used to extract date, travel direction and a country.”
[FIG. 1 of Ledinauskas reproduced as image media_image4.png]”;
“A similarity score can then be assigned to pairs of images by measuring the Euclidean distance (or some other distance metric) between their corresponding embedding vectors produced by the networks. While this approach requires more bookkeeping because of the database (in comparison to the classification model), the architecture is able to perform one-shot learning”; “The goal of learning a similarity metric is to train a model that would work in such a way that the distance between embeddings of two images of same stamp class would be as small as possible and the distance between embeddings of two different stamp classes would be as large as possible (note that same class here and below refers to stamps being from the same country and having the same entry or exit direction). We achieved this by employing a Siamese network that was trained using triplet loss … The architecture of our embedding network is a modified ResNet18 … ResNet18 is used to construct feature maps which are converted to embedding vectors by the head (the last layers) of the network. Our network head consists of average pooling layer, followed by fully connected layer and finally a layer that normalizes the output to unit vectors.”).
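The Siamese comparison described in the cited passages, two embedding branches sharing one set of weights, with the Euclidean distance between the resulting embedding vectors serving as the similarity score, can be sketched in a few lines. This is an illustrative toy only: the linear embedding, weight values, and example vectors below are invented for demonstration and are not taken from Ledinauskas or from the claims.

```python
import math

# One weight matrix used by BOTH branches -- the "identically weighted" /
# shared-weight property of a Siamese network (illustrative values).
SHARED_WEIGHTS = [[0.5, -0.25, 0.1],
                  [0.2, 0.4, -0.3]]

def embed(image_vec):
    """Embed a flattened 'image' vector using the shared weight matrix."""
    return [sum(w * x for w, x in zip(row, image_vec)) for row in SHARED_WEIGHTS]

def match_score(img_a, img_b):
    """Euclidean distance between the two embeddings: smaller = more similar."""
    ea, eb = embed(img_a), embed(img_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ea, eb)))

# Identical inputs embed identically under shared weights, so distance is zero.
print(match_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```

Because both branches apply the same weights, the distance depends only on the difference between the two inputs' features, which is what allows the database lookup (one-shot comparison) described in the reference.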
Ledinauskas fails to teach
a scanning control for a photocopier, the scanning control comprising an output of the match head coupled to control operation of the scanning element.
Romero teaches
a scanning control for a photocopier, the scanning control comprising: control operation of the scanning element in response to satisfying a verification (Romero, abstract; para. [0058]: “a system and method of operating a data security scanner including: a control unit configured to: examine a document to detect an instance of confidential data, generate an alert based on finding the instance of confidential data, initiate a verification process based on generating the alert, wherein the verification process fulfills the print request by comparing a print parameter of the print request with a release condition to determine whether the release condition is satisfied, determine whether to send the document for printing based on whether the release condition is satisfied; and a communication unit, coupled to the control unit, configured to: receive the print request, and send the document for printing on a printer if the release condition is satisfied.”; “The scan module 306 can enable examination of the document 310 in a variety of ways. For example, in one embodiment, the scan module 306 can implement a parsing functionality 328 to examine metadata of the document 310 to detect whether the document 310 contains an instance of confidential data. In one embodiment, the scan module 306 can further implement a pattern recognition functionality 324 to examine the document 310 for a pattern or sequence of text, numbers, or a combination thereof indicating an instance of confidential data.”).
It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to modify the networks, memory, and match head, as taught by Ledinauskas, to include a scanning control for a photocopier including a control operation of the scanning element in response to satisfying a verification, as taught by Romero.
The suggestion/motivation for doing so would have been that “the system 100 significantly improves data security for organizations because it prevents documents containing instances of confidential data from being printed to unauthorized users or unauthorized devices by implementing a rule-based process which can determine what documents should or should not be printed after a document has been identified as containing an instance of confidential data; it has been further discovered that the system 100 described above significantly improves data security because it allows print requests that are unauthorized to be terminated so as to prevent unauthorized printing and distribution of documents containing instances of confidential data; it has been further discovered that the system 100 significantly improves the ability to control the level of data security measures taken by an organization because it allows the customization of thresholds needed to print documents containing instances of confidential data so as to change requirements based on organizational needs and requirements.” (Romero, para. [0070]).
Ledinauskas, in view of Romero, teaches an output of the match head coupled to control operation of the scanning element (Ledinauskas, page 3, FIG. 1; Ledinauskas, page 4, right-side column; Section 2.4: Similarity metric learning for stamp recognition; page 5, left-side column, para. 1-4; Romero, abstract; para. [0058]; Ledinauskas teaches verifying, using the match head, the validity of the stamp in a scanned passport for security purposes; Romero teaches validating scanned documents that have some indication of security level access and preventing the printing or copying of the document using the scanner; one of ordinary skill in the art, when combining Ledinauskas and Romero, obtains the ability to use the Siamese ResNet machine learning model to verify a scanned passport, taught by Ledinauskas, to then control the scanner the passport was inserted into, throwing an alert to prevent the scanner (if it is confirmed that a matching stamp of the document has been found by the ML model) from making copies or printing out the passport to a user of the scanner without further verification, taught by Romero).
Therefore, it would have been obvious to combine Ledinauskas, with Romero, to obtain the invention as specified in claim 1.
Regarding claim 2, Ledinauskas, in view of Romero, teaches the scanning control of claim 1, further comprising: a contrastive loss function element disposed along a feedback path between the match head and the pair of identically weighted networks (Ledinauskas, page 5, left-side column, para. 5; page 5, right-side column, para. 1-2: “Triplet margin loss incentivizes the model to cluster visa pass stamps by their class. The loss function takes a triplet which consists of three embeddings: one arbitrary embedding called an anchor, a positive embedding which is of the same class as the anchor, and a negative embedding from a different class than the anchor. The minimum desired distance between anchor-positive and anchor-negative embedding pairs is parameterized by a hyperparameter α called a margin. We use the standard Euclidean distance to measure distance. Therefore, the triplet margin loss is given by the equation
L(a, p, n) = max{ d(a, p) − d(a, n) + α, 0 }
where anchor, positive, and negative inputs are represented as a, p, and n respectively;
d(x, y) = ||x − y||²
is the square of the Euclidean distance. As the model learns, anchors and positives get closer together while anchors and negatives become more distant. In each batch, we made use of the hard triplet mining method: we first computed embeddings and then took each item as an anchor and generated triplets by matching it with the hardest positive (the one furthest away from the anchor) and the hardest negative (the one closest to the anchor) from the current batch.”; the loss is used as feedback to train the Siamese ResNet equally weighted neural networks).
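The quoted triplet margin loss can be written out numerically as a short sketch. The vectors and the margin value used below are illustrative examples, not data from Ledinauskas; only the loss formula itself follows the quoted passage.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """max{d(a, p) - d(a, n) + margin, 0}: the loss is zero once the
    negative is at least `margin` farther from the anchor than the positive."""
    return max(sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin, 0.0)

# A well-separated triplet incurs no loss; a "hard" negative close to the
# anchor (as targeted by hard triplet mining) incurs a positive loss.
print(triplet_margin_loss([0.0, 0.0], [1.0, 0.0], [3.0, 0.0]))  # 0.0
print(triplet_margin_loss([0.0, 0.0], [1.0, 0.0], [0.5, 0.0]))  # 1.75
```

This illustrates why the reference mines the hardest positive and hardest negative in each batch: easy triplets contribute zero loss and therefore no training signal.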
Claims 3-4, 6-8, 10, 12-13, 15-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ledinauskas, in view of U.S. Patent No. 12,254,668 (Stoian et al.) (hereinafter Stoian), and in view of non-patent literature "Generalized contrastive optimization of Siamese networks for place recognition," arXiv preprint arXiv:2103.06638 (2021) (Vallina et al.) (hereinafter Vallina).
Regarding claim 3, Ledinauskas teaches a system comprising: a pair of identically weighted networks configured to perform feature extraction on images generated by the scanner (Ledinauskas, page 4, right-side column; Section 2.4: Similarity metric learning for stamp recognition; abstract: “For general stamp country and entry/exit recognition we propose to use Siamese networks. Siamese networks able to work with multiple input images simultaneously and select relevant features for various machine learning tasks, including the estimation of similarity. Similarity estimation can be used for classification where instead of classifying inputs into predefined classes, Siamese networks find the most similar example in the database of class examples. The model consists of two parallel embedding networks that share weights between themselves.”; “We propose an automated document analysis system that processes scanned visa pages and automatically extracts the travel pattern from detected stamps.”);
a scanner; a match head comprising: at least one fully connected layer configured to output a match score for the images (Ledinauskas, page 3, FIG. 1; Ledinauskas, page 4, right-side column; Section 2.4: Similarity metric learning for stamp recognition; page 5, left-side column, para. 1-4: “Fig. 1: The proposed page processing pipeline. The image of visa page could be obtained by a passport scanner or from a video feed by the passport facing camera that is mounted on the table. After that the stamps are detected with a detector model. The detected stamps are later compared with a set of known stamp embeddings by using the similarity metric learning model which provides the country and travel direction of the most similar stamp in the database of stamp embeddings. If the stamp belongs to a Schengen country, then additionally the Schengen template and several additional models are being used to extract date, travel direction and a country.”
[FIG. 1 of Ledinauskas reproduced as image media_image4.png]”;
“A similarity score can then be assigned to pairs of images by measuring the Euclidean distance (or some other distance metric) between their corresponding embedding vectors produced by the networks. While this approach requires more bookkeeping because of the database (in comparison to the classification model), the architecture is able to perform one-shot learning”; “The goal of learning a similarity metric is to train a model that would work in such a way that the distance between embeddings of two images of same stamp class would be as small as possible and the distance between embeddings of two different stamp classes would be as large as possible (note that same class here and below refers to stamps being from the same country and having the same entry or exit direction). We achieved this by employing a Siamese network that was trained using triplet loss … The architecture of our embedding network is a modified ResNet18 … ResNet18 is used to construct feature maps which are converted to embedding vectors by the head (the last layers) of the network. Our network head consists of average pooling layer, followed by fully connected layer and finally a layer that normalizes the output to unit vectors.”); and
a contrastive loss function element disposed along a feedback path from the fully connected layer to the pair of identically weighted networks (Ledinauskas, page 5, left-side column, para. 5; page 5, right-side column, para. 1-2: “Triplet margin loss incentivizes the model to cluster visa pass stamps by their class. The loss function takes a triplet which consists of three embeddings: one arbitrary embedding called an anchor, a positive embedding which is of the same class as the anchor, and a negative embedding from a different class than the anchor. The minimum desired distance between anchor-positive and anchor-negative embedding pairs is parameterized by a hyperparameter α called a margin. We use the standard Euclidean distance to measure distance. Therefore, the triplet margin loss is given by the equation
L(a, p, n) = max{ d(a, p) − d(a, n) + α, 0 }
where anchor, positive, and negative inputs are represented as a, p, and n respectively;
d(x, y) = ||x − y||²
is the square of the Euclidean distance. As the model learns, anchors and positives get closer together while anchors and negatives become more distant. In each batch, we made use of the hard triplet mining method: we first computed embeddings and then took each item as an anchor and generated triplets by matching it with the hardest positive (the one furthest away from the anchor) and the hardest negative (the one closest to the anchor) from the current batch.”; the loss is used as feedback to train the Siamese ResNet equally weighted neural network).
Ledinauskas fails to teach
a match head comprising a concatenation layer.
Stoian teaches
a match head comprising a concatenation layer (Stoian, col. 4, lines 20-67; col. 5, lines 1-36; FIG. 1: “The first step is to obtain a pair of feature maps 110a, 110b describing the most relevant information from both input images 106a, 106b (e.g., IA and IB). Additionally, the goal is to build the pair of feature maps 110a, 110b in a co-dependent manner. Feature maps 110a, 110b comprise an array of features, with each feature being a numerical representation of the image data within an area of the input image … Coder 108 is a specialized image encoder network trained to generate feature maps 110a, 110b from a pair of input images 106a, 106b …
[FIG. 1 of Stoian reproduced as image media_image7.png]”;
As shown in annotated FIG. 1 below, the two input images 106a and 106b are input into a Siamese neural network architecture outputting feature maps; the feature maps are then concatenated 112 before being input into respective segmentation heads 114a, 114b, before continuing the matching process to check whether the images have matching objects. The concatenation step 112 and the remainder of the steps from 114a-124 are part of a “match head”; Examiner is interpreting the match head to include the concatenation step 112, similar to Applicant’s FIG. 15 in the present specification, where the concatenation step is the first step within the match head;
[annotated FIG. 1 of Stoian reproduced as image media_image8.png]
).
It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to modify the match head, as taught by Ledinauskas, to include a concatenation layer, as taught by Stoian.
The suggestion/motivation for doing so would have been that concatenating feature maps from different layers or branches of a neural network allows the model to combine information at various levels of abstraction; this leads to a more comprehensive and robust feature representation, improving the model's ability to understand complex visual patterns.
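The effect of a concatenation layer on two feature maps can be illustrated with a minimal sketch (the shapes and values are hypothetical and are not drawn from Stoian):

```python
import numpy as np

# Hypothetical feature maps from the two Siamese branches: (channels, H, W).
feat_a = np.ones((64, 7, 7))
feat_b = np.zeros((64, 7, 7))

# A concatenation layer stacks the maps along the channel axis, so the
# downstream match head sees information from both images jointly.
concat = np.concatenate([feat_a, feat_b], axis=0)
print(concat.shape)  # (128, 7, 7)
```

The concatenated tensor preserves both branches' features side by side, which is what allows later layers to combine information from the two inputs.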
Ledinauskas, in view of Stoian, fails to teach
logic to apply gradient descent to optimize the contrastive loss function element.
Vallina teaches
logic to apply gradient descent to optimize the contrastive loss function element (Vallina, page 5, Section 3.2.3 Gradient of the Generalized Contrastive Loss: “
[Image: media_image9.png (gradient of the generalized contrastive loss)]
”).
It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to modify the system, as taught by Ledinauskas, in view of Stoian, to include logic to apply gradient descent to optimize the contrastive loss function element, as taught by Vallina.
The suggestion/motivation for doing so would have been that gradient descent is widely used to optimize contrastive loss in Siamese networks due to its simplicity, computational efficiency, and scalability for high-dimensional data, making it a practical choice for large-scale image processing tasks.
Therefore, it would have been obvious to combine Ledinauskas, with Stoian and Vallina, to obtain the invention as specified in claim 3.
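Gradient descent applied to a contrastive-style loss can be sketched as follows (an illustrative toy example only; the learning rate, embeddings, and update rule are hypothetical and are not drawn from Vallina):

```python
import numpy as np

def contrastive_loss(x1, x2, label, margin=1.0):
    """label=1 for a similar pair (pull together), 0 for dissimilar (push apart)."""
    d = np.linalg.norm(x1 - x2)
    return label * d ** 2 + (1 - label) * max(margin - d, 0.0) ** 2

# Gradient descent on x1 for a similar pair, using the analytic gradient
# of the 'pull' term: dL/dx1 = 2 * (x1 - x2).
x1 = np.array([1.0, 1.0])
x2 = np.array([0.0, 0.0])
lr = 0.1
for _ in range(50):
    x1 = x1 - lr * 2 * (x1 - x2)  # each step pulls x1 toward x2

print(contrastive_loss(x1, x2, label=1))  # loss shrinks toward 0
```

Each iteration moves the embedding a small step down the loss gradient, which is the simple, scalable optimization behavior the motivation statement refers to.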
Regarding claim 4, Ledinauskas, in view of Stoian, and in view of Vallina, teaches the system of claim 3, wherein the pair of identically weighted networks comprises a pair of Resnet networks (Ledinauskas, page 4, right-side column; Section 2.4: Similarity metric learning for stamp recognition; page 5, left-side column, para. 1-4; see rejection of claim 3 above; Ledinauskas teaches Siamese (twin) neural networks that are each ResNet18 (residual networks) for embedding).
Regarding claim 6, Ledinauskas, in view of Stoian, and in view of Vallina, teaches the system of claim 3, the match head trained to generate predictions of whether or not an input image pair comprises a matching stamp pattern (Ledinauskas, page 4, right-side column; Section 2.4: Similarity metric learning for stamp recognition; page 5, left-side column, para. 1-4; see rejection of claim 3 above; a Siamese twin ResNet network is used to compare the stamp in a passport to a stamp database to see if there is a matching stamp pattern; FIG. 11; FIG. 12 show examples of predictions of matching stamps in passports to database of stamps:
[Image: media_image10.png (Ledinauskas FIG. 11)]
;
[Image: media_image11.png (Ledinauskas FIG. 12)]
).
Regarding claim 7, Ledinauskas, in view of Stoian, and in view of Vallina, teaches the system of claim 3, the contrastive loss function element configured to determine a distance-based loss metric (Ledinauskas, page 5, left-side column, para. 5; page 5, right-side column, para. 1-2; see rejection of claim 3 above; Euclidean distance is used to calculate the contrastive loss).
Regarding claim 8, Ledinauskas, in view of Stoian, and in view of Vallina, teaches the system of claim 3, the contrastive loss function element modeling the operation of a mechanical spring (Ledinauskas, page 5, left-side column, para. 5; page 5, right-side column, para. 1-2; see rejection of claim 3 above: “As the model learns, anchors and positives get closer together while anchors and negatives become more distant. In each batch, we made use of the hard triplet mining method: we first computed embeddings and then took each item as an anchor and generated triplets by matching it with the hardest positive (the one furthest away from the anchor) and the hardest negative (the one closest to the anchor) from the current batch”; Euclidean distance is used in the contrastive loss function to see how close the weighted networks’ outputs are to each other to determine stamp matching for passports; contrastive loss, inspired by a mechanical spring, aims to pull similar data points (positive pairs) closer together while pushing dissimilar data points (negative pairs) further apart in a learned embedding space; this behavior is analogous to how a spring exerts force to return to its equilibrium position when stretched or compressed; in contrastive learning, the "equilibrium position" is determined by the similarity of the data points, with positive pairs aiming to be close and negative pairs far apart; this is what is taught in the learning model from Ledinauskas using Euclidean distance between the feature maps output by the ResNet Siamese neural network architecture to determine similarity).
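The spring analogy can be made concrete with a small numeric check (a sketch with hypothetical distance values, not the claimed implementation):

```python
def spring_energy(d, similar, margin=1.0):
    """Spring-like loss energy at pair distance d: similar pairs are
    penalized for being far apart (a stretched spring); dissimilar pairs
    are penalized for being closer than the margin (a compressed spring)."""
    return d ** 2 if similar else max(margin - d, 0.0) ** 2

# Similar pair: energy grows as the 'spring' is stretched.
print(spring_energy(0.1, similar=True) < spring_energy(0.9, similar=True))   # True
# Dissimilar pair: energy grows as the pair is pushed inside the margin.
print(spring_energy(0.9, similar=False) < spring_energy(0.1, similar=False))  # True
```

As with a physical spring, the energy is minimized at the equilibrium configuration: zero distance for similar pairs, and any distance at or beyond the margin for dissimilar pairs.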
Regarding claim 10, Ledinauskas, in view of Stoian, and in view of Vallina, teaches the system of claim 3, the contrastive loss function element operable to learn embeddings for the pair of identically weighted networks in which two similar points have a low Euclidean distance when compared to a lower first threshold and two dissimilar points have a large Euclidean distance when compared to an upper second threshold, wherein the upper second threshold is greater than the lower first threshold (Ledinauskas, page 5, left-side column, para. 5; page 5, right-side column, para. 1-2; see rejection of claim 3 above: “As the model learns, anchors and positives get closer together while anchors and negatives become more distant. In each batch, we made use of the hard triplet mining method: we first computed embeddings and then took each item as an anchor and generated triplets by matching it with the hardest positive (the one furthest away from the anchor) and the hardest negative (the one closest to the anchor) from the current batch”; Euclidean distance is used in the contrastive loss function to see how close the weighted networks’ outputs are to each other to determine stamp matching for passports).
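The two-threshold comparison recited in claim 10 can be sketched as follows (the threshold values and embeddings are hypothetical, chosen only for illustration):

```python
import numpy as np

LOWER_FIRST_THRESHOLD = 0.5   # similar points should fall below this
UPPER_SECOND_THRESHOLD = 2.0  # dissimilar points should exceed this

def classify_pair(e1, e2):
    """Compare the Euclidean distance between two embeddings against the
    lower first threshold and the (greater) upper second threshold."""
    d = np.linalg.norm(e1 - e2)
    if d < LOWER_FIRST_THRESHOLD:
        return "similar"
    if d > UPPER_SECOND_THRESHOLD:
        return "dissimilar"
    return "indeterminate"

print(classify_pair(np.array([0.0, 0.0]), np.array([0.1, 0.0])))  # similar
print(classify_pair(np.array([0.0, 0.0]), np.array([3.0, 0.0])))  # dissimilar
```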
Regarding claim 12, Ledinauskas teaches a method comprising: operating a pair of identically weighted networks to perform feature extraction on an image pair, (Ledinauskas, page 4, right-side column; Section 2.4: Similarity metric learning for stamp recognition; abstract: “For general stamp country and entry/exit recognition we propose to use Siamese networks. Siamese networks able to work with multiple input images simultaneously and select relevant features for various machine learning tasks, including the estimation of similarity. Similarity estimation can be used for classification where instead of classifying inputs into predefined classes, Siamese networks find the most similar example in the database of class examples. The model consists of two parallel embedding networks that share weights between themselves.”; We propose an automated document analysis system that processes scanned visa pages and automatically extracts the travel pattern from detected stamps.)
wherein one image of the image pair is generated by a scanner and the other image of the image pair is a reference image; applying feature maps resulting from the feature extraction to at least one fully connected layer configured to output a match score for the image pair (Ledinauskas, page 3, FIG. 1; Ledinauskas, page 4, right-side column; Section 2.4: Similarity metric learning for stamp recognition; page 5, left-side column, para. 1-4: “Fig. 1: The proposed page processing pipeline. The image of visa page could be obtained by a passport scanner or from a video feed by the passport facing camera that is mounted on the table. After that the stamps are detected with a detector model. The detected stamps are later compared with a set of known stamp embeddings by using the similarity metric learning model which provides the country and travel direction of the most similar stamp in the database of stamp embeddings. If the stamp belongs to a Schengen country, then additionally the Schengen template and several additional models are being used to extract date, travel direction and a country.”
[Image: media_image4.png (Ledinauskas FIG. 1, page processing pipeline)]
”;
“A similarity score can then be assigned to pairs of images by measuring the Euclidean distance (or some other distance metric) between their corresponding embedding vectors produced by the networks. While this approach requires more bookkeeping because of the database (in comparison to the classification model), the architecture is able to perform one-shot learning”; “The goal of learning a similarity metric is to train a model that would work in such a way that the distance between embeddings of two images of same stamp class would be as small as possible and the distance between embeddings of two different stamp classes would be as large as possible (note that same class here and below refers to stamps being from the same country and having the same entry or exit direction). We achieved this by employing a Siamese network that was trained using triplet loss … The architecture of our embedding network is a modified ResNet18 … ResNet18 is used to construct feature maps which are converted to embedding vectors by the head (the last layers) of the network. Our network head consists of average pooling layer, followed by fully connected layer and finally a layer that normalizes the output to unit vectors.”); and operating a contrastive loss function element disposed along a feedback path from the fully connected layer to the pair of identically weighted networks (Ledinauskas, page 5, left-side column, para. 5; page 5, right-side column, para. 1-2: “Triplet margin loss incentivizes the model to cluster visa pass stamps by their class. The loss function takes a triplet which consists of three embeddings: one arbitrary embedding called an anchor, a positive embedding which is of the same class as the anchor, and a negative embedding from a different class than the anchor. The minimum desired distance between anchor-positive and anchor-negative embedding pairs is parameterized by a hyperparameter α called a margin.
We use the standard Euclidean distance to measure distance. Therefore, the triplet margin loss is given by the equation
[Image: media_image5.png (triplet margin loss equation)]
where anchor, positive, and negative inputs are represented as a, p, and n respectively;
[Image: media_image6.png (squared Euclidean distance)]
is the square of the Euclidean distance. As the model learns, anchors and positives get closer together while anchors and negatives become more distant. In each batch, we made use of the hard triplet mining method: we first computed embeddings and then took each item as an anchor and generated triplets by matching it with the hardest positive (the one furthest away from the anchor) and the hardest negative (the one closest to the anchor) from the current batch.”; the loss is used as feedback to train the Siamese ResNet equally weighted neural network).
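The batch-hard triplet mining described in the quoted passage (hardest positive = furthest same-class item from the anchor, hardest negative = closest different-class item) can be sketched as follows (a minimal NumPy illustration with hypothetical 1-D embeddings, not code from Ledinauskas):

```python
import numpy as np

def batch_hard_triplets(embeddings, labels):
    """For each anchor, pick the furthest same-class embedding (hardest
    positive) and the closest different-class embedding (hardest negative)."""
    # Pairwise Euclidean distance matrix for the batch.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    triplets = []
    for i, lab in enumerate(labels):
        same = (labels == lab) & (np.arange(len(labels)) != i)
        other = labels != lab
        hardest_pos = np.argmax(np.where(same, dist[i], -np.inf))
        hardest_neg = np.argmin(np.where(other, dist[i], np.inf))
        triplets.append((i, int(hardest_pos), int(hardest_neg)))
    return triplets

# Hypothetical batch of four 1-D embeddings belonging to two classes.
emb = np.array([[0.0], [1.0], [5.0], [6.0]])
labels = np.array([0, 0, 1, 1])
print(batch_hard_triplets(emb, labels))
# [(0, 1, 2), (1, 0, 2), (2, 3, 1), (3, 2, 1)]
```

Each tuple is (anchor, hardest positive, hardest negative); training on these hardest cases is what drives the anchors and positives closer while pushing negatives apart.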
Ledinauskas fails to teach
a match head comprising a concatenation layer.
Stoian teaches
a match head comprising a concatenation layer (Stoian, col. 4, lines 20-67; col. 5, lines 1-36; FIG. 1: “The first step is to obtain a pair of feature maps 110 a, 110 b describing the most relevant information from both input images 106 a, 106 (e.g., IA and IB). Additionally, the goal is to build the pair of feature maps 110 a, 110 b in a co-dependent manner. Feature maps 110 a, 110 b comprise an array of features, with each feature being a numerical representation of the image data within an area of the input image … Coder 108 is a specialized image encoder network trained to generate feature maps 110 a, 110 b from a pair of input images 106a, 106b …
[Image: media_image7.png]
”;
As shown in annotated FIG. 1 below, the two input images 106a and 106b are input into a Siamese neural network architecture that outputs feature maps; the feature maps are then concatenated 112 before being input into respective segmentation heads 114a and 114b, and the matching process continues to check whether the images contain matching objects; the concatenation step 112 and the remaining steps 114a – 124 are part of a “match head”; Examiner is interpreting the match head to include the concatenation step 112, similar to Applicant’s FIG. 15 in the present specification, where the concatenation step is the first step within the match head;
[Image: media_image8.png (annotated FIG. 1 of Stoian)]
).
It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to modify the match head, as taught by Ledinauskas, to include a concatenation layer, as taught by Stoian.
The suggestion/motivation for doing so would have been that concatenating feature maps from different layers or branches of a neural network allows the model to combine information at various levels of abstraction; this leads to a more comprehensive and robust feature representation, improving the model's ability to understand complex visual patterns.
Ledinauskas, in view of Stoian, fails to teach
logic to apply gradient descent to optimize the contrastive loss function element.
Vallina teaches
logic to apply gradient descent to optimize the contrastive loss function element (Vallina, page 5, Section 3.2.3 Gradient of the Generalized Contrastive Loss: “
[Image: media_image9.png (gradient of the generalized contrastive loss)]
”).
It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to modify the method, as taught by Ledinauskas, in view of Stoian, to include logic to apply gradient descent to optimize the contrastive loss function element, as taught by Vallina.
The suggestion/motivation for doing so would have been that gradient descent is widely used to optimize contrastive loss in Siamese networks due to its simplicity, computational efficiency, and scalability for high-dimensional data, making it a practical choice for large-scale image processing tasks. Therefore, it would have been obvious to combine Ledinauskas, with Stoian and Vallina, to obtain the invention as specified in claim 12.
With regards to dependent claims 13, 15-17, and 19, they recite the functions of the apparatus of claims 4, 6-8, and 10 as processes. Therefore, the analyses in rejecting claims 4, 6-8, and 10 are equally applicable to claims 13, 15-17, and 19.
Claims 11 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Ledinauskas, in view of Stoian, in view of Vallina, and in view of Romero.
Regarding claim 11, Ledinauskas, in view of Stoian, in view of Vallina, teaches the system of claim 3, further comprising: a pattern localizer to form cropped regions of a scanned page (Ledinauskas, page 2, right-side column, para. 1: “In contrast, for Schengen area stamps we crop parts of the stamp using a predefined template and subsequently classify those parts. We find this approach to be robust enough because Schengen area stamps have a standardized layout across all of the member countries. An additional stamp segmentation stage can be used before stamp similarity model to remove overlapping stamps.”); and
logic to: form image pairs pairing the cropped regions with registered security patterns (Ledinauskas, page 8, right-side column, Section 3.4 Stamp Segmentation: “After training the stamp segmentation model reached the mean Dice coefficient D = 0.909 on the validation set. The model successfully segmented 1822 stamp images that were used as training data for other models. An example of the performance of the segmentation model on unseen (synthetic) data is shown in figure 9. Stamp a) demonstrates problems with segmentation in highly figured backgrounds with darker features present. A portion of the highly figured background is misidentified as a part of the stamp. Stamps b) and c) demonstrate the model’s ability to accurately segment stamps from slightly figured backgrounds (i.e. patterns or overlapping text). Stamp d) shows the model’s performance on clearer backgrounds. Finally, model’s performance on overlapping stamps is shown in fig. 9 e) and f). It can be seen that the model can successfully separate the main stamp given that the secondary stamp takes up relatively small part of the image. However, if two stamps are similar in size accuracy drops as both stamps are recognized as one.”;
[Image: media_image12.png (Ledinauskas FIG. 9, stamp segmentation examples)]
).
apply the image pairs to the pair of identically weighted networks to determine a Euclidean distance between the images of the image pair (Ledinauskas, page 2, right-side column, para. 1; as previously discussed, the stamp segmentation stage can be used before the stamp similarity model; Ledinauskas, page 3, FIG. 1; Ledinauskas, page 4, right-side column; Section 2.4: Similarity metric learning for stamp recognition; page 5, left-side column, para. 1-4; see rejection of claim 3 above for the Euclidean distance discussion; although FIG. 1 shows the cropping after the similarity model, cropping is also used before the Siamese ResNet twin neural network similarity model, so that the two images input into the two branches are the cropped portion of the stamp from the scanned passport document (stamp segmentation) and the template stamp from the database, also cropped and formatted properly with affine transformations; see FIG. 9 above for an example of image pairs input); and
on condition that the Euclidean distance satisfies a preset threshold, generate a signal (Ledinauskas, page 11, right-side column, Section 3.6: Final Automatic travel pattern extraction solution; FIG. 13: “After loading the scanned visa pages the program can be used to automatically detect the stamps and recognize date, country, entry/exit symbols and then fill the travel pattern table.”;
[Image: media_image13.png (Ledinauskas FIG. 13)]
;
the travel pattern is only generated if there is a match between the scanned passport and the stamp database which is determined by the similarity score determined by the Euclidean distance).
Ledinauskas, in view of Stoian, and in view of Vallina, fails to teach
on condition that the Euclidean distance satisfies a preset threshold, generate a signal to inhibit operation of the scanner.
Romero teaches
on condition that a preset threshold is satisfied, generate a signal to inhibit operation of the scanner (Romero, abstract; para. [0067]: “A system and method of operating a data security scanner including: a control unit configured to: examine a document to detect an instance of confidential data, generate an alert based on finding the instance of confidential data, initiate a verification process based on generating the alert, wherein the verification process fulfills the print request by comparing a print parameter of the print request with a release condition to determine whether the release condition is satisfied, determine whether to send the document for printing based on whether the release condition is satisfied and send the document for printing on a printer if the release condition is satisfied”; “In one embodiment, where more than one release condition 334 is used as a part of the verification process 332, the verification module 316 can further implement a threshold 338 such that the flag 336 is not generated until the threshold 338 is satisfied. The threshold 338 refers to a value indicating a minimum number of release conditions that must be satisfied before the flag 336 is generated. The threshold 338 can be represented as a percentage, an absolute value, or a combination thereof. For example, the threshold 338 can be set to a percentage, for example “90%” such that only when ninety (90) percent or more of the release conditions are satisfied, the flag 336 is generated indicating the release condition 334 is satisfied.”).
It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to modify the system, as taught by Ledinauskas, in view of Stoian, in view of Vallina, to include logic to, on condition that a preset threshold is satisfied, generate a signal to inhibit operation of the scanner, as taught by Romero.
The suggestion/motivation for doing so would have been to “significantly improve[s] the ability to automatically recognize instances of documents that should not be printed without the need for human intervention and decision making and further allows the system 100 to process large numbers of print requests to determine whether documents should be printed or not” (Romero, para. [0069]).
Ledinauskas, in view of Stoian, in view of Vallina, and in view of Romero, teaches on condition that the Euclidean distance satisfies a preset threshold, generate a signal to inhibit operation of the scanner (Ledinauskas, page 5, left-side column, para. 5; page 5, right-side column, para. 1-2; see rejection of claim 3 above; Romero, abstract; para. [0067]; by combining Ledinauskas, in view of Stoian, in view of Vallina, and Romero, one of ordinary skill in the art analyzes stamps in passports that are scanned and need to be verified for security purposes using Euclidean distances for the verification, taught by Ledinauskas, and then uses the concept of inhibiting operation of a scanner, taught by Romero, if the case arises that the passport fails to contain the correct stamp or watermark that indicates the authenticity of the passport; in this case, using the concept from Romero in combination with Ledinauskas, in view of Stoian, and in view of Vallina, the scanner that scanned the passport fails to print the passport out or stops working until further verification of the passport by security personnel is done).
Therefore, it would have been obvious to combine Ledinauskas, Stoian, and Vallina, with Romero, to obtain the invention as specified in claim 11.
With regards to dependent claim 20, it recites the functions of the apparatus of claim 11 as a process. Therefore, the analysis in rejecting claim 11 is equally applicable to claim 20.
Claims 5 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Ledinauskas, in view of Stoian, in view of Vallina, and in view of U.S. Patent Application Publication No.: 2023/0229897 (Komkov et al.) (hereinafter Komkov).
Regarding claim 5, Ledinauskas, in view of Stoian, and in view of Vallina, teaches the system of claim 3.
Ledinauskas, in view of Stoian, and in view of Vallina, fails to teach
the match head further comprising a flattening layer.
Komkov teaches
the match head further comprising a flattening layer (Komkov, para. [0206]-[0208]; FIG. 10C: “FIG. 10B shows an exemplary neural network architecture for open-set inference phase. Similar architecture is applicable for image recognition in FaceID. The input image 1010 and the convolutional NN layers 1110_1 to 1110_N are similar as described above with reference to FIG. 10A. In this architecture, there is no pooling layer. Rather, the output 7×7×S tensor of the last convolutional layer is flattened 1050 to a one-dimensional feature vector with having a size of 49S. Then, a fully connected (FC) layer 1060 multiplies the input vector of size 49S with a matrix of the size 49S×512. Here, 512 is a typically used value … The output feature vector 1065 with the size 512 (features) may then be stored for being compared with other such feature vectors (obtained from other input images) to assess similarity … FIG. 10C illustrates an exemplary neural network architecture for open-set training phase. The input image 1010 and the convolutional NN layers 1110_1 to 1110_N as well as the flattening layer 1050 and the first FC layer 1060 are similar as described above with reference to FIG. 10B. In addition, a second FC layer may be used for the training phase, which multiplies the feature vector of size 512 with a matrix of size 512×K. Thus, the result 1075 are similarities to the respective K classes which are trained in the training phase.”;
[Image: media_image14.png (Komkov FIGs. 10B and 10C)]
).
It would have been obvious to a person having ordinary skill in the art before the time of the effective filing date of the claimed invention of the instant application to modify the match head, as taught by Ledinauskas, in view of Stoian, and in view of Vallina, to include a flattening layer, as taught by Komkov.
The suggestion/motivation for doing so would have been that flattening layers in neural networks convert the multi-dimensional output of convolutional layers into a 1D vector, enabling the subsequent fully connected layers to process the data; this simplifies data handling and reduces model complexity, helping to prevent overfitting.
Therefore, it would have been obvious to combine Ledinauskas, Stoian, and Vallina, with Komkov, to obtain the invention as specified in claim 5.
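The role of a flattening layer feeding a fully connected layer, as in the 7×7×S example quoted from Komkov, can be sketched as follows (the channel count S and the random weights are hypothetical, chosen only for illustration):

```python
import numpy as np

S = 4                               # hypothetical channel count (Komkov uses 7x7xS)
conv_out = np.random.rand(7, 7, S)  # multi-dimensional convolutional output

# The flattening layer converts the tensor to a 1-D vector of size 49*S...
flat = conv_out.reshape(-1)
print(flat.shape)  # (196,)

# ...so a fully connected layer (a matrix multiply) can process it,
# producing a 512-element feature vector per the cited passage.
W = np.random.rand(flat.size, 512)  # 49S x 512 weight matrix
feature_vector = flat @ W
print(feature_vector.shape)  # (512,)
```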
With regards to dependent claim 14, it recites the functions of the apparatus of claim 5 as a process. Therefore, the analysis in rejecting claim 5 is equally applicable to claim 14.
Conclusion
Applicant's amendment necessitated the new ground of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL ADAM SHARIFF whose telephone number is 571-272-9741. The examiner can normally be reached M-F 8:30-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz can be reached on 571-272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL ADAM SHARIFF/
Examiner, Art Unit 2672
/SUMATI LEFKOWITZ/Supervisory Patent Examiner, Art Unit 2672