DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Remarks
This action is in response to the applicant’s reply filed 15 December 2025 to the USPTO office action mailed 18 September 2025. Claims 1, 13, 18 and 19 have been amended. Claims 1-20 are currently pending.
Response to Arguments
With respect to the 35 USC §103 rejections of claims 1-20, the applicant’s arguments are moot in view of the new grounds of rejection necessitated by the applicant's amendments.
With respect to the claim interpretation under 35 USC §112(f), the applicant’s arguments have been fully considered but are not deemed persuasive.
The applicant argues “Claim 13 recites sufficient structure to avoid interpretation under 35 U.S.C. §112(f). The claim now expressly defines the ‘image comparator’ as including a vectorizer and a scorer, the ‘optical character recognition engine’ as configured to perform preprocessing, segmentation, and recognition, the ‘word tokenizer’ as implemented as a segmentation processor, and the ‘text comparator’ as executing match-count, match-spread, and out-of-order analyses” (Remarks, pg. 7). Respectfully, this argument is not persuasive.
With respect to the “image comparator” including a vectorizer and a scorer, as well as the “word tokenizer” implemented as a segmentation processor, the examiner respectfully disagrees that these recitations impart structural significance to the terms “image comparator” and “word tokenizer”, respectively. The specification does not provide a description sufficient to inform one of ordinary skill in the art that the terms denote structure. Furthermore, the prior art does not provide evidence that the terms are an art-recognized structure for performing the claimed functions. In other words, the “vectorizer”, “scorer” and “segmentation processor” are non-structural modifiers that do not have any generally understood structural meaning in the art. See MPEP §2181, subsection I.C for more detail.
With respect to the “optical character recognition engine” configured to perform preprocessing, segmentation, and recognition, as well as the “text comparator” executing match-count, match-spread, and out-of-order analyses, the “configured to perform” and “executing” recitations appear to be functional language modifying the generic placeholders “optical character recognition engine” and “text comparator”, respectively.
The 3-prong analysis for applying 35 USC 112(f) to a claim limitation is as follows: “(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; (B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as "configured to" or "so that"; and (C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.” See MPEP §2181, subsection I. In this case, the claim limitations recite generic placeholders that are not modified by sufficient structure, material, or acts for performing the claimed functions. Therefore, the applicant’s argument is not persuasive.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.
Such claim limitation(s) is/are: “an image comparator”, “an optical character recognition engine”, “a word tokenizer” and “a text comparator” in claim 13, and “a vectorizer” and “a scorer” in claim 18.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-9, 13-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over GAO et al., US 2024/0220999 A1 (hereinafter “Gao”) in view of Reisswig et al., US 2021/0150201 A1 (hereinafter “Reisswig”), and further in view of Li et al., US 2013/0016899 A1 (hereinafter “Li”).
Claim 1: Gao teaches a method for identifying an input image as a placeholder, the method comprising the steps of:
evaluating a match of the input image to a set of known placeholder images (Gao, [Fig. 3B] note 326-330, [0065] note apply an SNN technique to compare the fingerprint associated to the reference image data (e.g., stock image 500) with the fingerprint obtained running the SNN inference process on the captured image 400… At step 330, the processor 135 determines an item match score based on the output vector distance between the two fingerprints: the one associated to the item image 400 and the other associated to the stock image 500 and computes an item match score);
extracting a set of text characters from the input image (Gao, [Fig. 3B] note 312-318, [0057] note employ an OCR technique to compare optical characters in the cropped image 600 with those in the stock descriptor associated to the image 500, selection step 310 proceeds along OCR branch 312. At step 314, the processor 135 processes the cropped image 600 to extract numeric and alphanumeric text present in the image to enable the OCR comparison);
generating a placeholder text match score from the input image words evaluated against a placeholder text wordlist of known placeholder phrases each including one or more known placeholder words (Gao, [0057] note At step 316, the processor 135 compares the text extracted from the cropped image 600 to the text present in the descriptor associated to the reference image data 500. In some embodiments, the text for the reference image data 500 may be previously known and stored in the database, along with other meta-information listed above as possible data fields of the descriptor, to expedite processing. At step 318, the processor tallies the number of matching optical characters present in the cropped image 600 and the descriptor associated to the reference image data 500 and computes an item match score corresponding to a match rate for those optical characters); and
flagging the input image as a placeholder based at least partially upon the placeholder text match score (Gao, [Fig. 3B] note 320, 332, 340, [0060] note If, on the other hand, the item match score fails to equal or exceed the threshold match score, this may indicate an instance of label swapping or other issue that requires further attention by store personnel).
Gao does not explicitly teach each placeholder image being a non-product catalog substitute image; in response to the input image not matching any known placeholder image; and tokenizing the set of text characters into constituent input image words of one or more phrases.
However, Reisswig teaches tokenizing the set of text characters into constituent input image words of one or more phrases (Reisswig, [0023] note to preserve the positional information, the label system described herein may perform a positional embedding to preserve the position information. The positional embedding may preserve two-dimensional coordinates corresponding to the positions of the words or tokens of the document. In some embodiments, the label system may receive a document and/or document image as a data file. The label system may identify the characters, words, and/or other groupings of characters as the tokens of the document. For example, the label system may perform an optical character recognition (OCR) process to identify the characters and/or words, [0034] note each element of the sequence (such as a word or character) may be compared to each other element in each layer of label network 116. In this manner, this comparison may identify out-of-order relationships between arbitrary elements of an input sequence. By including positional embedding information when using these transformer-based systems, spatial relationships between elements may be preserved, [0051] note a token may be a word, phrase, sentence, paragraph, or other organization of characters).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the OCR of Gao with the OCR including token identification and positional information of Reisswig according to known methods (i.e. identifying tokens and positional information using OCR). Motivation for doing so is that, by including positional embedding information when using these transformer-based systems, spatial relationships between elements may be preserved (Reisswig, [0034]).
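For purposes of illustration only (not part of the rejection, and not drawn from the applicant's specification or the cited references; the identifiers are hypothetical), a minimal sketch of a tokenizing step of the kind recited (grouping OCR-extracted text characters into constituent words of one or more phrases) could be:

    import re

    def tokenize(ocr_text):
        # Split OCR output into candidate phrases on line breaks and
        # sentence punctuation, then group characters into lowercase words.
        phrases = re.split(r"[\n.!?]+", ocr_text)
        return [re.findall(r"[A-Za-z0-9]+", p.lower())
                for p in phrases if p.strip()]

    tokenize("IMAGE COMING SOON.\nProduct photo not available")
    # -> [['image', 'coming', 'soon'], ['product', 'photo', 'not', 'available']]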
Gao and Reisswig do not explicitly teach each placeholder image being a non-product catalog substitute image; in response to the input image not matching any known placeholder image.
However, Li teaches this (Li, [Fig. 2], [0039] note at block 201, the method 200 includes receiving a query image, [0041] note At block 203, the method 200 includes matching the query image to an object using a visual object recognition module, [0045] note At block 205, the method 200 includes determining a matched region within a training image to which the query image matches using the visual object recognition module, [0073] note training images include: stock images reused in images of different objects; i.e. stock images read on non-product catalog substitute images, [0048] note At block 209, a decision may be made based on the determination at block 207, [0051] note At block 213, the method 200 includes identifying an annotation associated with an entirety of the training image. Accordingly, the annotation may be returned responsive to the query image being received. The query image may be determined to be a match to the training image as a whole, and metadata of the object which the training image depicts may be identified and output).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the OCR of Gao and Reisswig with the annotation identification in stock images of Li according to known methods (i.e. identifying annotations associated with stock images reused in images of different objects). Motivation for doing so is that this improves visual object recognition (Li, [0022]).
Claim 2: Gao, Reisswig and Li teach the method of claim 1, wherein the step of generating the placeholder text match score includes:
generating match count values for the one or more phrases based upon a number of the input image words in a given one of the phrases matching known placeholder words in the placeholder text wordlist (Gao, [0057] note At step 318, the processor tallies the number of matching optical characters present in the cropped image 600 and the descriptor associated to the reference image data 500 and computes an item match score corresponding to a match rate for those optical characters);
generating match spread values of the one or more phrases between a first one of the input image words in the given one of the phrases matching one of the known placeholder words and a last one of the input image words in the given one of the phrases matching another one of the known placeholder words (Reisswig, [0034] note each element of the sequence (such as a word or character) may be compared to each other element in each layer of label network 116. In this manner, this comparison may identify out-of-order relationships between arbitrary elements of an input sequence. By including positional embedding information when using these transformer-based systems, spatial relationships between elements may be preserved); and
deriving the placeholder text match score for a selected one of the one or more phrases as a function of a corresponding one of the match count values and a corresponding one of the match spread values (Gao, [0065] note At step 332, the processor 135 compares that item match score to a threshold match score to determine whether the images 400, 500 match for verifying the transaction, [0066] note If the item match score equals or exceeds the threshold match score, the comparison step 332 proceeds along the PASS branch… If, on the other hand, the item match score fails to equal or exceed the threshold match score, this may indicate an instance of label swapping or other issue that requires further attention by store personnel. In this scenario, the comparison step 332 proceeds along the FAIL branch).
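By way of illustration only (a hypothetical sketch under the examiner's reading of the claim 2 language, not taken from the cited references), the recited score derivation can be understood as follows: the match count is the number of words in a phrase found in the wordlist, the match spread is the span from the first to the last matching word, and the score is a function of both:

    def placeholder_text_match_score(phrase, wordlist):
        # match_count: number of words in the phrase found in the wordlist.
        # match_spread: span from the first matching word to the last
        # matching word, inclusive.  One possible scoring function rewards
        # phrases whose matches are numerous and close together.
        hits = [i for i, w in enumerate(phrase) if w in wordlist]
        if not hits:
            return 0.0
        match_count = len(hits)
        match_spread = hits[-1] - hits[0] + 1
        return match_count / match_spread

    wordlist = {"image", "coming", "soon"}
    placeholder_text_match_score(["image", "coming", "soon"], wordlist)  # 1.0
    placeholder_text_match_score(["new", "image", "gallery", "opens", "soon"], wordlist)  # 0.5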
Claim 3: Gao, Reisswig and Li teach the method of claim 2, further comprising:
flagging the input image as a non-placeholder in response to an evaluation of the match spread value for the selected one of the one or more phrases against a predefined match spread threshold (Reisswig, [0010] note generating document labels using positional embeddings, [0029] note label processor 114 may further identify “wi” as a sequence of input word vectors. For example, each word vector may represent a word of document 120).
Claim 4: Gao, Reisswig and Li teach the method of claim 2, further comprising:
flagging the input image as a non-placeholder in response to an evaluation of an out-of-order ratio for the input image words against a predefined word order ratio threshold (Reisswig, [0034] note each element of the sequence (such as a word or character) may be compared to each other element in each layer of label network 116. In this manner, this comparison may identify out-of-order relationships between arbitrary elements of an input sequence. By including positional embedding information when using these transformer-based systems, spatial relationships between elements may be preserved).
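As a further hypothetical illustration (the definition below is an assumption for exposition, not the applicant's or Reisswig's actual computation), an out-of-order ratio of the kind recited in claim 4 could count matched words appearing in a different relative order than in the known placeholder phrase:

    def out_of_order_ratio(phrase, placeholder_phrase):
        # Fraction of adjacent matched-word pairs whose relative order
        # differs from the order in the known placeholder phrase.
        expected = {w: i for i, w in enumerate(placeholder_phrase)}
        matched = [expected[w] for w in phrase if w in expected]
        if len(matched) < 2:
            return 0.0
        inversions = sum(1 for a, b in zip(matched, matched[1:]) if a > b)
        return inversions / (len(matched) - 1)

    # "soon coming image" reverses "image coming soon": every pair inverted.
    out_of_order_ratio(["soon", "coming", "image"], ["image", "coming", "soon"])  # 1.0

The input image would then be flagged as a non-placeholder when this ratio exceeds the predefined word order ratio threshold.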
Claim 5: Gao, Reisswig and Li teach the method of claim 1, wherein the step of evaluating the match of the input image to the set of known placeholder images includes:
converting the input image to input image vector values (Gao, [0065] note At step 328, the processor 135 applies the SNN inference process to the item image 400 obtained during the customer transaction for analysis. The result of the inference process is a fingerprint associated to the item image 400, [0078] note Information useful for the fingerprint analysis 1004 (e.g., SNN) may include n-dimensional vector(s));
generating a placeholder image match score from a query to a search index with the input image vector values, the search index being populated with known placeholder image vector values generated from the set of known placeholder images (Gao, [0065] note At step 329, the fingerprint and/or other features generated from the captured image 400 may be compared to the related fingerprint and/or features included within the reference descriptor. At step 330, the processor 135 determines an item match score based on the output vector distance between the two fingerprints: the one associated to the item image 400 and the other associated to the stock image 500 and computes an item match score); and
flagging the input image as a placeholder based upon the placeholder image match score (Gao, [Fig. 3B] note 320, 332, 340, [0060] note If, on the other hand, the item match score fails to equal or exceed the threshold match score, this may indicate an instance of label swapping or other issue that requires further attention by store personnel).
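For illustration only (the vectorizer and index below are hypothetical stand-ins, not the implementations of Gao or the applicant), the vector-based evaluation of claim 5 can be sketched as converting the input image to vector values and querying an index of known placeholder image vectors:

    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def placeholder_image_match_score(input_vec, index_vecs):
        # Query the search index (here, simply a list of vector values
        # generated from the known placeholder images) and return the
        # best similarity as the placeholder image match score.
        return max(cosine_similarity(input_vec, v) for v in index_vecs)

    # vectorize() stands in for the machine-learning vectorizer of claim 6;
    # the 0.9 threshold is illustrative.
    # flagged = placeholder_image_match_score(vectorize(img), index_vecs) >= 0.9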
Claim 6: Gao, Reisswig and Li teach the method of claim 5, wherein:
the converting of the input image to the input image vector values is performed by a machine learning system; and the known placeholder image vector values are generated by the machine learning system (Gao, [0078] note Information useful for the fingerprint analysis 1004 (e.g., SNN) may include n-dimensional vector(s)).
Claim 7: Gao, Reisswig and Li teach the method of claim 6, wherein the machine learning system is a convolutional neural network (Gao, [0062] note employ a trained object recognition model, such as a neural network, to perform the fingerprint-comparison process. Briefly, the neural network may be trained with image data representative of the items for sale in the retail establishment, where the image data may include images of the items, optical codes, and/or other optical characters, such as numeric and alphanumeric information. The neural network may be trained using a suitable machine learning program that involves identifying and recognizing patterns in existing data, such as image data of items 20).
Claim 8: Gao, Reisswig and Li teach the method of claim 2, further comprising:
receiving an input of a confirmed evaluated match of the input image to the set of known placeholder images (Gao, [0065] note At step 332, the processor 135 compares that item match score to a threshold match score to determine whether the images 400, 500 match for verifying the transaction, [0066] note If the item match score equals or exceeds the threshold match score, the comparison step 332 proceeds along the PASS branch… If, on the other hand, the item match score fails to equal or exceed the threshold match score, this may indicate an instance of label swapping or other issue that requires further attention by store personnel. In this scenario, the comparison step 332 proceeds along the FAIL branch).
Claim 9: Gao, Reisswig and Li teach the method of claim 1, wherein the step of extracting the text characters from the input image is performed by an optical character recognition system (Gao, [0057] note employ an OCR technique to compare optical characters in the cropped image 600 with those in the stock descriptor associated to the image 500).
Claim 13: Gao teaches a system for identifying a placeholder image in a catalog, the system comprising:
an image comparator including a vectorizer and a scorer, the image comparator being receptive to an input image, a placeholder image match score being generated by the image comparator from the input image (Note, this limitation is interpreted as the hardware along with the algorithm described in the specification, Gao, [Fig. 3B] note 326-330, [0065] note apply an SNN technique to compare the fingerprint associated to the reference image data (e.g., stock image 500) with the fingerprint obtained running the SNN inference process on the captured image 400… At step 330, the processor 135 determines an item match score based on the output vector distance between the two fingerprints: the one associated to the item image 400 and the other associated to the stock image 500 and computes an item match score);
an optical character recognition engine configured to perform preprocessing, segmentation, and recognition, and further being receptive to the input image and outputting a set of text characters from the input image, the set of text characters being sequenced as constituent input image words of one or more phrases (Note, this limitation is interpreted as the hardware along with the algorithm described in the specification, Gao, [Fig. 3B] note 312-318, [0057] note employ an OCR technique to compare optical characters in the cropped image 600 with those in the stock descriptor associated to the image 500, selection step 310 proceeds along OCR branch 312. At step 314, the processor 135 processes the cropped image 600 to extract numeric and alphanumeric text present in the image to enable the OCR comparison);
a placeholder text wordlist database including one or more known placeholder phrases each including one or more known placeholder words (Gao, [0057] note the text for the reference image data 500 may be previously known and stored in the database, along with other meta-information listed above as possible data fields of the descriptor, to expedite processing, [0071] note item classification for each item may be designated and stored in the database along with the stock image and other information for each item); and
a text comparator, the text comparator being connected to the placeholder text wordlist database and receptive to the input image words, a placeholder text match score being generated by the text comparator from an evaluation of the input image words against the placeholder text wordlist database, and a placeholder image identification being made based at least partially on the placeholder text match score (Note, this limitation is interpreted as the hardware along with the algorithm described in the specification, Gao, [0057] note At step 316, the processor 135 compares the text extracted from the cropped image 600 to the text present in the descriptor associated to the reference image data 500. In some embodiments, the text for the reference image data 500 may be previously known and stored in the database, along with other meta-information listed above as possible data fields of the descriptor, to expedite processing. At step 318, the processor tallies the number of matching optical characters present in the cropped image 600 and the descriptor associated to the reference image data 500 and computes an item match score corresponding to a match rate for those optical characters).
Gao does not explicitly teach a word tokenizer implemented as a segmentation processor grouping the set of text characters into the input image words of the one or more phrases; executing match-count, match-spread, and out-of-order analyses; and a non-product catalog substitute image.
However, Reisswig teaches a word tokenizer implemented as a segmentation processor grouping the set of text characters into the input image words of the one or more phrases (Note, this limitation is interpreted as the hardware along with the algorithm described in the specification, Reisswig, [0023] note to preserve the positional information, the label system described herein may perform a positional embedding to preserve the position information. The positional embedding may preserve two-dimensional coordinates corresponding to the positions of the words or tokens of the document. In some embodiments, the label system may receive a document and/or document image as a data file. The label system may identify the characters, words, and/or other groupings of characters as the tokens of the document. For example, the label system may perform an optical character recognition (OCR) process to identify the characters and/or words, [0034] note each element of the sequence (such as a word or character) may be compared to each other element in each layer of label network 116. In this manner, this comparison may identify out-of-order relationships between arbitrary elements of an input sequence. By including positional embedding information when using these transformer-based systems, spatial relationships between elements may be preserved, [0051] note a token may be a word, phrase, sentence, paragraph, or other organization of characters); and
executing match-count, match-spread, and out-of-order analyses (Reisswig, [0010] note generating document labels using positional embeddings, [0029] note label processor 114 may further identify “wi” as a sequence of input word vectors. For example, each word vector may represent a word of document 120, [0034] note each element of the sequence (such as a word or character) may be compared to each other element in each layer of label network 116. In this manner, this comparison may identify out-of-order relationships between arbitrary elements of an input sequence. By including positional embedding information when using these transformer-based systems, spatial relationships between elements may be preserved; Gao, [0057] note At step 318, the processor tallies the number of matching optical characters present in the cropped image 600 and the descriptor associated to the reference image data 500 and computes an item match score corresponding to a match rate for those optical characters).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the OCR of Gao with the OCR including token identification and positional information of Reisswig according to known methods (i.e. identifying tokens and positional information using OCR). Motivation for doing so is that, by including positional embedding information when using these transformer-based systems, spatial relationships between elements may be preserved (Reisswig, [0034]).
Gao and Reisswig do not explicitly teach a non-product catalog substitute image.
However, Li teaches this (Li, [Fig. 2], [0039] note at block 201, the method 200 includes receiving a query image, [0041] note At block 203, the method 200 includes matching the query image to an object using a visual object recognition module, [0045] note At block 205, the method 200 includes determining a matched region within a training image to which the query image matches using the visual object recognition module, [0073] note training images include: stock images reused in images of different objects; i.e. stock images read on non-product catalog substitute images, [0048] note At block 209, a decision may be made based on the determination at block 207, [0051] note At block 213, the method 200 includes identifying an annotation associated with an entirety of the training image. Accordingly, the annotation may be returned responsive to the query image being received. The query image may be determined to be a match to the training image as a whole, and metadata of the object which the training image depicts may be identified and output).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the OCR of Gao and Reisswig with the annotation identification in stock images of Li according to known methods (i.e. identifying annotations associated with stock images reused in images of different objects). Motivation for doing so is that this improves visual object recognition (Li, [0022]).
Claim 14: Gao, Reisswig and Li teach the system of claim 13, wherein:
the text comparator generates match count values for the one or more phrases based upon a number of input image words in a given one of the phrases matching known placeholder words (Note, this limitation is interpreted as the hardware along with the algorithm described in the specification, Gao, [0057] note At step 318, the processor tallies the number of matching optical characters present in the cropped image 600 and the descriptor associated to the reference image data 500 and computes an item match score corresponding to a match rate for those optical characters);
the text comparator generates match spread values for the one or more phrases, a given one of the match spread values being based upon a first one of the input image words in the given one of the phrases matching one of the known placeholder words and a last one of the input image words in the given one of the phrases matching another one of the known placeholder words (Note, this limitation is interpreted as the hardware along with the algorithm described in the specification, Reisswig, [0034] note each element of the sequence (such as a word or character) may be compared to each other element in each layer of label network 116. In this manner, this comparison may identify out-of-order relationships between arbitrary elements of an input sequence. By including positional embedding information when using these transformer-based systems, spatial relationships between elements may be preserved); and
the placeholder text match score is derived from a selected one of the one or more phrases as a function of a corresponding one of the match count values and a corresponding one of the match spread values (Gao, [0065] note At step 332, the processor 135 compares that item match score to a threshold match score to determine whether the images 400, 500 match for verifying the transaction, [0066] note If the item match score equals or exceeds the threshold match score, the comparison step 332 proceeds along the PASS branch… If, on the other hand, the item match score fails to equal or exceed the threshold match score, this may indicate an instance of label swapping or other issue that requires further attention by store personnel. In this scenario, the comparison step 332 proceeds along the FAIL branch).
Claim 15: Gao, Reisswig and Li teach the system of claim 14, wherein the placeholder image identification is made based at least partially on an evaluation of the match spread value for the selected one of the one or more phrases against a predefined match spread threshold (Reisswig, [0010] note generating document labels using positional embeddings, [0029] note label processor 114 may further identify “wi” as a sequence of input word vectors. For example, each word vector may represent a word of document 120).
Claim 16: Gao, Reisswig and Li teach the system of claim 14, wherein the placeholder image identification is made based at least partially on an evaluation of an out-of-order ratio for the input image words against a predefined word order ratio threshold (Reisswig, [0034] note each element of the sequence (such as a word or character) may be compared to each other element in each layer of label network 116. In this manner, this comparison may identify out-of-order relationships between arbitrary elements of an input sequence. By including positional embedding information when using these transformer-based systems, spatial relationships between elements may be preserved).
Claim 18: Gao, Reisswig and Li teach the system of claim 13, further comprising a placeholder image index including one or more known placeholder image vector values, wherein the vectorizer of the image comparator is configured to generate input image vector values from the input image, and the one or more known placeholder image vector values from known placeholder images (Note, this limitation is interpreted as the hardware along with the algorithm described in the specification, Gao, [0057] note the text for the reference image data 500 may be previously known and stored in the database, along with other meta-information listed above as possible data fields of the descriptor, to expedite processing, [0065] note At step 328, the processor 135 applies the SNN inference process to the item image 400 obtained during the customer transaction for analysis. The result of the inference process is a fingerprint associated to the item image 400, [0071] note item classification for each item may be designated and stored in the database along with the stock image and other information for each item, [0078] note Information useful for the fingerprint analysis 1004 (e.g., SNN) may include n-dimensional vector(s));
the scorer configured to evaluate similarity between vector values and to generate the placeholder image match score from a query to the placeholder image index with the input image vector values (Note, this limitation is interpreted as the hardware along with the algorithm described in the specification, Gao, [0065] note At step 329, the fingerprint and/or other features generated from the captured image 400 may be compared to the related fingerprint and/or features included within the reference descriptor. At step 330, the processor 135 determines an item match score based on the output vector distance between the two fingerprints: the one associated to the item image 400 and the other associated to the stock image 500 and computes an item match score).
Claim 19: Gao teaches an article of manufacture comprising a non-transitory program storage medium readable by a computing device, the medium tangibly embodying one or more programs of instructions executable by the computing device to perform a method for identifying an input image as a placeholder, the method comprising the steps of:
evaluating a match of the input image to a set of known placeholder images (Gao, [Fig. 3B] note 326-330, [0065] note apply an SNN technique to compare the fingerprint associated to the reference image data (e.g., stock image 500) with the fingerprint obtained running the SNN inference process on the captured image 400… At step 330, the processor 135 determines an item match score based on the output vector distance between the two fingerprints: the one associated to the item image 400 and the other associated to the stock image 500 and computes an item match score);
extracting a set of text characters from the input image (Gao, [Fig. 3B] note 312-318, [0057] note employ an OCR technique to compare optical characters in the cropped image 600 with those in the stock descriptor associated to the image 500, selection step 310 proceeds along OCR branch 312. At step 314, the processor 135 processes the cropped image 600 to extract numeric and alphanumeric text present in the image to enable the OCR comparison);
generating a placeholder text match score from the input image words evaluated against a placeholder text wordlist of known placeholder phrases each including one or more known placeholder words (Gao, [0057] note At step 316, the processor 135 compares the text extracted from the cropped image 600 to the text present in the descriptor associated to the reference image data 500. In some embodiments, the text for the reference image data 500 may be previously known and stored in the database, along with other meta-information listed above as possible data fields of the descriptor, to expedite processing. At step 318, the processor tallies the number of matching optical characters present in the cropped image 600 and the descriptor associated to the reference image data 500 and computes an item match score corresponding to a match rate for those optical characters); and
flagging the input image as a placeholder based upon the placeholder text match score (Gao, [Fig. 3B] note 320, 332, 340, [0060] note If, on the other hand, the item match score fails to equal or exceed the threshold match score, this may indicate an instance of label swapping or other issue that requires further attention by store personnel).
Gao does not explicitly teach each placeholder image being a non-product catalog substitute image; in response to the input image not matching any known placeholder image; and tokenizing the set of text characters into constituent input image words of one or more phrases.
However, Reisswig teaches tokenizing the set of text characters into constituent input image words of one or more phrases (Reisswig, [0023] note to preserve the positional information, the label system described herein may perform a positional embedding to preserve the position information. The positional embedding may preserve two-dimensional coordinates corresponding to the positions of the words or tokens of the document. In some embodiments, the label system may receive a document and/or document image as a data file. The label system may identify the characters, words, and/or other groupings of characters as the tokens of the document. For example, the label system may perform an optical character recognition (OCR) process to identify the characters and/or words, [0034] note each element of the sequence (such as a word or character) may be compared to each other element in each layer of label network 116. In this manner, this comparison may identify out-of-order relationships between arbitrary elements of an input sequence. By including positional embedding information when using these transformer-based systems, spatial relationships between elements may be preserved, [0051] note a token may be a word, phrase, sentence, paragraph, or other organization of characters).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the OCR of Gao with the OCR including token identification and positional information of Reisswig according to known methods (i.e. identifying tokens and positional information using OCR). Motivation for doing so is that, by including positional embedding information when using these transformer-based systems, spatial relationships between elements may be preserved (Reisswig, [0034]).
Gao and Reisswig do not explicitly teach each placeholder image being a non-product catalog substitute image; and in response to the input image not matching any known placeholder image.
However, Li teaches this (Li, [Fig. 2], [0039] note at block 201, the method 200 includes receiving a query image, [0041] note At block 203, the method 200 includes matching the query image to an object using a visual object recognition module, [0045] note At block 205, the method 200 includes determining a matched region within a training image to which the query image matches using the visual object recognition module, [0073] note training images include: stock images reused in images of different objects; i.e. stock images read on non-product catalog substitute images, [0048] note At block 209, a decision may be made based on the determination at block 207, [0051] note At block 213, the method 200 includes identifying an annotation associated with an entirety of the training image. Accordingly, the annotation may be returned responsive to the query image being received. The query image may be determined to be a match to the training image as a whole, and metadata of the object which the training image depicts may be identified and output).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the OCR of Gao and Reisswig with the annotation identification in stock images of Li according to known methods (i.e. identifying annotations associated with stock images reused in images of different objects). Motivation for doing so is that this improves visual object recognition (Li, [0022]).
Claim 20: Gao, Reisswig and Li teach the article of manufacture of claim 19, wherein the step of generating the placeholder text match score embodied as the one or more programs of instructions includes:
generating match count values for the one or more phrases based upon a number of the input image words in a given one of the phrases matching known placeholder words in the placeholder text wordlist (Gao, [0057] note At step 318, the processor tallies the number of matching optical characters present in the cropped image 600 and the descriptor associated to the reference image data 500 and computes an item match score corresponding to a match rate for those optical characters);
generating match spread values of the one or more phrases between a first one of the input image words in the given one of the phrases matching one of the known placeholder words and a last one of the input image words in the given one of the phrases matching another one of the known placeholder words (Reisswig, [0034] note each element of the sequence (such as a word or character) may be compared to each other element in each layer of label network 116. In this manner, this comparison may identify out-of-order relationships between arbitrary elements of an input sequence. By including positional embedding information when using these transformer-based systems, spatial relationships between elements may be preserved); and
deriving the placeholder text match score for a selected one of the one or more phrases as a function of a corresponding one of the match count values and a corresponding one of the match spread values (Gao, [0065] note At step 332, the processor 135 compares that item match score to a threshold match score to determine whether the images 400, 500 match for verifying the transaction, [0066] note If the item match score equals or exceeds the threshold match score, the comparison step 332 proceeds along the PASS branch… If, on the other hand, the item match score fails to equal or exceed the threshold match score, this may indicate an instance of label swapping or other issue that requires further attention by store personnel. In this scenario, the comparison step 332 proceeds along the FAIL branch).
Claims 10-12 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Gao, Reisswig and Li, and further in view of Para et al., US 2024/0193469 A1 (hereinafter “Para”).
Claim 10: Gao, Reisswig and Li teach the method of claim 1, but do not explicitly teach wherein the text characters include characters for one or more languages.
However, Para teaches this (Para, [0038] note one or more content parsers 104 may include, but are not limited to, audio extractor 126, a speech to text parser 128, a language parser 130, and optical character recognition (OCR). The OCR may be used to extract texts from the images, [0029] note the framework in the present disclosure is domain agnostic and generic such that it supports rule generation on English and other languages. For example, a review written in various languages may require different tokenizers or parsers to extract N-grams. It is noted that the framework in the present disclosure has a multilingual support, [0045] note one or more language specific tokenizers).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the OCR of Gao, Reisswig and Li with the OCR including multilingual support of Para according to known methods (i.e. providing multilingual support for OCR). Motivation for doing so is that this may allow for improvements of existing machine learning models (Para, [0024]).
Claim 11: Gao, Reisswig and Li teach the method of claim 1, but do not explicitly teach removing stop words from the one or more phrases.
However, Para teaches this (Para, [0038] note one or more content parsers 104 may include, but are not limited to, audio extractor 126, a speech to text parser 128, a language parser 130, and optical character recognition (OCR). The OCR may be used to extract texts from the images, [0041] note data from the one or more content parsers 104 may also be pre-processed in the processor 108. The pre-processing steps may include, but are not limited to, stop words removal, labelling errors removal, stemming identification, spelling corrections, punctuations removal, irrelevant characters removal, and pictorial characters removal).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the OCR of Gao, Reisswig and Li with the OCR including stop word removal of Para according to known methods (i.e. providing stop word removal for OCR). Motivation for doing so is that this may allow for improvements of existing machine learning models (Para, [0024]).
Claim 12: Gao, Reisswig and Li teach the method of claim 1, but do not explicitly teach applying spelling corrections to the words in the one or more phrases.
However, Para teaches this (Para, [0038] note one or more content parsers 104 may include, but are not limited to, audio extractor 126, a speech to text parser 128, a language parser 130, and optical character recognition (OCR). The OCR may be used to extract texts from the images, [0041] note data from the one or more content parsers 104 may also be pre-processed in the processor 108. The pre-processing steps may include, but are not limited to, stop words removal, labelling errors removal, stemming identification, spelling corrections, punctuations removal, irrelevant characters removal, and pictorial characters removal).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the OCR of Gao, Reisswig and Li with the OCR including spelling corrections of Para according to known methods (i.e. providing spelling corrections for OCR). Motivation for doing so is that this may allow for improvements of existing machine learning models (Para, [0024]).
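By way of hypothetical illustration of the preprocessing recited in claims 11 and 12 (the stop list and correction map below are illustrative stand-ins, not Para's disclosure), stop-word removal and spelling correction could be applied to each tokenized phrase before scoring:

    STOP_WORDS = {"the", "a", "an", "is", "of"}                    # illustrative
    CORRECTIONS = {"comming": "coming", "avalable": "available"}   # illustrative

    def preprocess(phrase):
        # Apply spelling corrections first, then remove stop words,
        # before the phrase is scored against the placeholder wordlist.
        corrected = [CORRECTIONS.get(w, w) for w in phrase]
        return [w for w in corrected if w not in STOP_WORDS]

    preprocess(["image", "is", "comming", "soon"])  # ['image', 'coming', 'soon']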
Claim 17: Gao, Reisswig and Li teach the system of claim 13, but do not explicitly teach wherein the optical character recognition engine outputs the set of text characters in one or more languages, the word tokenizer associating the set of text characters as being specific to a given one of the languages.
However, Para teaches this (Para, [0038] note one or more content parsers 104 may include, but are not limited to, audio extractor 126, a speech to text parser 128, a language parser 130, and optical character recognition (OCR). The OCR may be used to extract texts from the images, [0029] note the framework in the present disclosure is domain agnostic and generic such that it supports rule generation on English and other languages. For example, a review written in various languages may require different tokenizers or parsers to extract N-grams. It is noted that the framework in the present disclosure has a multilingual support, [0045] note one or more language specific tokenizers).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the OCR of Gao, Reisswig and Li with the OCR including multilingual support of Para according to known methods (i.e. providing multilingual support for OCR). Motivation for doing so is that this may allow for improvements of existing machine learning models (Para, [0024]).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Giuseppi Giuliani whose telephone number is (571)270-7128. The examiner can normally be reached Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley can be reached at (571)272-8352. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/GIUSEPPI GIULIANI/Primary Examiner, Art Unit 2153