Last updated: May 29, 2026
Application No. 17/961,711
SYSTEMS AND METHODS FOR DETECTING OBJECTS

Final Rejection §103
Filed
Oct 07, 2022
Priority
Oct 07, 2021 — provisional 63/253,496
Examiner
PHAM, NHUT HUY
Art Unit
2674
Tech Center
2600 — Communications
Assignee
Cognex Corporation
OA Round
4 (Final)
Interview Optional

— +26.0% interview lift. Interview already conducted in this application's prosecution history. This examiner has a 80% grant rate with +26.0% interview lift. Since an interview has already been tried, recommend written response with narrowed claims based on precedent claim evolution patterns.
Based on 64 resolved cases, 2023–2026
Examiner Intelligence

PHAM, NHUT HUY View full profile →
Grants 80% — above average
Career Allowance Rate
51 granted / 64 resolved
+17.7% vs TC avg
Strong +26% interview lift
Without
With
+26.0%
Interview Lift
resolved cases with interview
Typical timeline
2y 10m
Avg Prosecution
15 currently pending
Career history
Total Applications
across all art units
Statute-Specific Performance

§101
2.4%
-37.6% vs TC avg
§103
90.6%
+50.6% vs TC avg
§102
1.6%
-38.4% vs TC avg
§112
3.9%
-36.1% vs TC avg
Black line = Tech Center average estimate • Based on career data from 64 resolved cases
Office Action

§103
DETAIL OFFICE ACTIONS
The United States Patent & Trademark Office appreciates the response filed for the current application that is submitted on 01/15/2026. The United States Patent & Trademark Office reviewed the following documents submitted and has made the following comments below.

Amendment
Applicant submitted amendments on 01/15/2026. The Examiner acknowledges the amendment and has reviewed the claims accordingly.

Applicant Arguments:
Applicant/s state/s that the cited prior arts do not teach the amended claims, specially, the limitation “determining, with a pre-trained neural network model, a feature map of the image by inputting the image into the pre-trained neural network model and receiving the feature map as an output from the pre-trained neural network model, wherein: the feature map of the image is different than the image, and the feature map comprises a plurality of samples, each of the plurality of samples being associated with a respective feature vector representing a selected feature; … wherein the feature map is determined prior to determining the locations of the one or more objects in the image based on the object center heatmap.”; therefore, the rejection under 35 U.S.C. 103 should be withdrawn. 

Examiner’s Responses:
Applicant’s arguments and amendments, see Remarks, filed 01/15/2026, with respect to the rejection(s) of claim(s) 1, 15 and 16 under 35 U.S.C. 103 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration of amendments, a new ground(s) of rejection is made in view of XXX.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim(s) 1, 2, 7 and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lian et al. (CN111242129A, a translated copy is attached, published 2020, hereinafter Lian) in view of Lei et al. (Lei, Zhengchao, et al. "Scene text recognition using residual convolutional recurrent neural network." Machine Vision and Applications 29.5, published 2018, hereinafter Lei).
CLAIM 1
In regards to Claim 1, Lian teaches a computerized method (Lian, ¶ [0018, 0023, 0028]: “a computer device is also provided, wherein the computer device includes: a memory for storing one or more programs; and one or more processors connected to the memory”) for detecting one or more objects in an image (Lian, ¶ [0004 and 0008]: “Automatic text detection and recognition algorithms require models to simultaneously locate text and recognize characters in natural images … a method for end-to-end text detection and recognition is provided”), the method comprising:
accessing the image (Lian, ¶ [0056]: “the computer device may obtain the target image in any feasible manner, such as by taking a picture, obtaining the target image from the local device, or receiving a target image sent by another device”);
a pre-trained neural network model (Lian, ¶ [0056]: “The feature extraction network can be any network model used to extract features from images, such as VGG (Visual Geometry Group)”. Lian teaches extracting features from an image using VGG network)
Lian does not explicitly disclose determining, with a pre-trained neural network model, a feature map of the image by inputting the image into the pre-trained neural network model and receiving the feature map as an output from the pre-trained neural network model, wherein: the feature map of the image is different than the image; the feature map comprises a plurality of samples, each of the plurality of samples being associated with a respective feature vector representing a selected feature.
Lei is in the same field of art of text recognition from image. Further, Lei teaches determining, with a pre-trained neural network model (Lei, page 863, section 4.1: “we introduce deeper neural networks to enhance the accuracy rate. Specifically, we explore different deeper feature descriptors of VGG16, VGG19, ResNet34, and ResNet50”), a feature map of the image by inputting the image into the pre-trained neural network model and receiving the feature map as an output from the pre-trained neural network model (Lei, page 863-864, section 4.1: “The output of final CNN part is the feature map … Figure 3 shows the architecture of VGG16 and corresponding feature map size in our CNN part … Figure 5 shows the architecture of ResNet50 and the corresponding feature map size”. Lei teaches input image into networks such as VGG, Resnet, and output feature map), wherein: the feature map of the image is different than the image (Lei, see FIG. 3 and 5. Generated feature maps has different size with the input image); the feature map comprises a plurality of samples, each of the plurality of samples being associated with a respective feature vector (Lei, see Fig. 2, each feature has corresponding feature sequence) representing a selected feature (Lei, page 863-864, section 4.1: “The output of final CNN part is the feature map … Figure 3 shows the architecture of VGG16 and corresponding feature map size in our CNN part … Figure 5 shows the architecture of ResNet50 and the corresponding feature map size”. Lei teaches input image into networks such as VGG, Resnet, and output 2D feature maps, see FIG. 3 and 5)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to substitute the feature extraction network of Lian with feature extraction networks that is taught by Lei, to make a text recognition system that can work with multiple feature networks; thus, one of ordinary skilled in the art would be motivated to make the substitution since Lian teaches extracting features with VGG network, and Lei teaches different version of VGG networks (Lei, page 863, section 4.1: “we introduce deeper neural networks to enhance the accuracy rate. Specifically, we explore different deeper feature descriptors of VGG16, VGG19, ResNet34, and ResNet50”).
The combination of Lian and Lei then teaches generating, with a first machine learning model different from the pre-trained neural network model (Lian, ¶ [0057]: “In step S12, the computer device inputs the shared feature information into the text detection network and obtains the character detection results output by the text detection network”. The Examiner notes Lian extract features using feature extraction network, and input extracted features into text detection network), an object center heatmap for the image by using the first machine learning model to process the feature map of the image to generate the object center heatmap for the image (Lian, ¶ [0057]: “the text detection network uses Gaussian heatmaps to generate the character region detection results and the character connection region detection results…”), wherein the object center heatmap comprises a plurality of samples each having a value indicative of a likelihood of a corresponding sample in the image being a central position of an object (Lian, ¶ [0057]: “… the character region detection results use Gaussian heatmaps to represent the probability of the character center region…”); and
determining locations of one or more objects in the image based on the object center heatmap wherein the feature map is determined prior to determining the locations of the one or more objects in the image based on the object center heatmap. (Lian, ¶ [0058 and 0059]: “… the character region detection result is used to guide the attention network in the character recognition network to predict character regions (i.e., to guide the attention network in which regions to make predictions) … the computer device locates the position of each text box based on the character detection result”)
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 2
Regarding claim 2, the combination of Lian and Lei teaches method of Claim 1. In addition, the combination of Lian and Lei teaches processing the locations of the one or more objects in the image using a second machine learning model and the feature map to recognize an object of the one or more objects. (Lian ¶ [0058]: “In step S13, the computer device inputs the shared feature information and the character detection result into the character recognition network to obtain the character recognition result output by the character recognition network … the text recognition network first uses a Bi-Long Short-Term Memory Network (BiLSTM) to capture text temporal information, and then uses an attention mechanism to predict character regions and character content” The Examiner notes shared feature information corresponds to “feature map”, character detection result corresponds to “locations of objects”)

CLAIM 7
Regarding claim 7, the combination of Lian and Lei teaches method of Claim 2. In addition, the combination of Lian and Lei teaches training each of the first machine learning model (Lian, ¶ [0057]: “the text detection network uses Gaussian heatmaps to generate the character region detection results and the character connection region detection results…”) and the second machine learning model (Lian, ¶ [0058]: “… the text recognition network first uses a Bi-Long Short-Term Memory Network (BiLSTM)”) using a respective machine learning method and using a respective set of field training data. (The Examiner notes the detection network and text recognition network are two different models, take different type of inputs  (one take feature maps as input, one take sequential data as input), and generate different type of outputs. Therefore, their training method and training data are different)

CLAIM 15
In regards to Claim 15, Lian teaches a non-transitory computer-readable media comprising instructions that, when executed by one or more processors on a computing device (Lian, ¶ [0018, 0023, 0028]: “a computer device is also provided, wherein the computer device includes: a memory for storing one or more programs; and one or more processors connected to the memory”), are operable to cause the one or more processors to perform:
accessing the image (Lian, ¶ [0056]: “the computer device may obtain the target image in any feasible manner, such as by taking a picture, obtaining the target image from the local device, or receiving a target image sent by another device”);
a pre-trained neural network model (Lian, ¶ [0056]: “The feature extraction network can be any network model used to extract features from images, such as VGG (Visual Geometry Group)”. Lian teaches extracting features from an image using VGG network)
Lian does not explicitly disclose determining, with a pre-trained neural network model, a feature map of the image by inputting the image into the pre-trained neural network model and receiving the feature map as an output from the pre-trained neural network model, wherein: the feature map of the image is different than the image; the feature map comprises a plurality of samples, each of the plurality of samples being associated with a respective feature vector representing a selected feature.
Lei is in the same field of art of text recognition from image. Further, Lei teaches determining, with a pre-trained neural network model (Lei, page 863, section 4.1: “we introduce deeper neural networks to enhance the accuracy rate. Specifically, we explore different deeper feature descriptors of VGG16, VGG19, ResNet34, and ResNet50”), a feature map of the image by inputting the image into the pre-trained neural network model and receiving the feature map as an output from the pre-trained neural network model (Lei, page 863-864, section 4.1: “The output of final CNN part is the feature map with height 1 which is easily converted to a feature sequence … Figure 3 shows the architecture of VGG16 and corresponding feature map size in our CNN part … Figure 5 shows the architecture of ResNet50 and the corresponding feature map size”. Lei teaches input image into networks such as VGG, Resnet, and output feature map), wherein: the feature map of the image is different than the image (Lei, see FIG. 3 and 5. Generated feature maps has different size with the input image); the feature map comprises a plurality of samples, each of the plurality of samples being associated with a respective feature vector (Lei, see Fig. 2, each feature has corresponding feature sequence) representing a selected feature (Lei, page 863-864, section 4.1: “The output of final CNN part is the feature map … Figure 3 shows the architecture of VGG16 and corresponding feature map size in our CNN part … Figure 5 shows the architecture of ResNet50 and the corresponding feature map size”. Lei teaches input image into networks such as VGG, Resnet, and output 2D feature maps, see FIG. 3 and 5)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to substitute the feature extraction network of Lian with feature extraction networks that is taught by Lei, to make a text recognition system that can work with multiple feature networks; thus, one of ordinary skilled in the art would be motivated to make the substitution since Lian teaches extracting features with VGG network, and Lei teaches different version of VGG networks (Lei, page 863, section 4.1: “we introduce deeper neural networks to enhance the accuracy rate. Specifically, we explore different deeper feature descriptors of VGG16, VGG19, ResNet34, and ResNet50”).
The combination of Lian and Lei then teaches generating, with a first machine learning model different from the pre-trained neural network model (Lian, ¶ [0057]: “In step S12, the computer device inputs the shared feature information into the text detection network and obtains the character detection results output by the text detection network”. The Examiner notes Lian extract features using feature extraction network, and input extracted features into text detection network), an object center heatmap for the image by using the first machine learning model to process the feature map of the image to generate the object center heatmap for the image (Lian, ¶ [0057]: “the text detection network uses Gaussian heatmaps to generate the character region detection results and the character connection region detection results…”), wherein the object center heatmap comprises a plurality of samples each having a value indicative of a likelihood of a corresponding sample in the image being a central position of an object (Lian, ¶ [0057]: “… the character region detection results use Gaussian heatmaps to represent the probability of the character center region…”); and
determining locations of one or more objects in the image based on the object center heatmap wherein the feature map is determined prior to determining the locations of the one or more objects in the image based on the object center heatmap. (Lian, ¶ [0058 and 0059]: “… the character region detection result is used to guide the attention network in the character recognition network to predict character regions (i.e., to guide the attention network in which regions to make predictions) … the computer device locates the position of each text box based on the character detection result”)
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.


Claim(s) 3, 5 and 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lian in view of Lei, and further in view of Cho (US-20180373947-A1, published 2018, hereinafter Cho).
CLAIM 3
In regards to Claim 3, the combination of Lian and Lei teaches the method of Claim 2. 
The combination of Lian and Lei does not explicitly disclose recognizing the object of the one or more objects comprises: generating an object feature vector using a portion of the feature map of the image, wherein the portion of the feature map is based on an area surrounding the location of the object; 
Cho is in the same field of art of character recognition. Further, Cho teaches recognizing the object of the one or more objects comprises: generating an object feature vector using a portion of the feature map of the image (Cho, ¶ [0026-0027]: “… calculate each feature vector corresponding to each of the segmented character images if the segmented character images are acquired…apply operations to each one of the segmented character images to map the feature of the character into the multi-dimensional numeric representation.”), wherein the portion of the feature map is based on an area surrounding the location of the object; (Cho, ¶ [0026-0027]: “said features may not only include classic features derived from Haar, HOG (Histogram of Oriented Gradients), or LBP (Local Binary Pattern)…” The Examiner notes Haar features are obtained by highlighting intensity differences between adjacent regions in an image, meaning intensity differences between character-region vs background-region. Thus, this reads on area surrounding the location of the object)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lian and Lei by incorporating the SVM classifier that is taught by Cho, to make a text detection system that recognize character with an SVM based classifier; thus, one of ordinary skilled in the art would be motivated to combine the references since among its several aspects, the present invention recognizes there is a need to increase efficiency of the task character recognition (Cho, ¶ [0010]: “It is another object of the present invention to provide a text recognition method with a high efficiency in identifying similar-shaped characters”).
The combination of Lian, Lei and Cho then teaches processing the object feature vector using the second machine learning model (Cho, ¶ [0049]: “The classifier may be implemented by conventional Support Vector Machine (SVM), or it may be implemented as a linear classifier, but it is not limited thereto”) to generate a class vector (Cho, ¶ [0034-0036]: “obtain a merged vector or its processed value by executing a computation with the support vector and a feature vector ci of the specific character image.”), wherein the class vector comprises a plurality of values each corresponding to one of a plurality of known labels (Cho, ¶ [0035]: “merged vector may be obtained by adding the support vector and the feature vector ci of the specific character and it is served as an input of a classifier for determining an identity of the specific character. As a reference, the number of character classes depends on the recognition target language. For example, the number of classes is either 26 (case-insensitive) or 52 (case-sensitive) for English and it is 10 for digital numbers”); and classifying the object to a label of the plurality of known labels using the class vector. (Cho, ¶ [00336]: “classify the specific character image as a letter in a predetermined set of letters by referring to the merged vector or its processed value”) 
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 5
In regards to Claim 5, the combination of Lian, Lei and Cho teaches the method of Claim 3. In addition, the combination of Lian, Lei and Cho teaches the plurality of known labels comprises a plurality of textual character labels. (Cho, ¶ [0035]: “As a reference, the number of character classes depends on the recognition target language. For example, the number of classes is either 26 (case-insensitive) or 52 (case-sensitive) for English and it is 10 for digital numbers”) 

CLAIM 6
In regards to Claim 6, the combination of Lian, Lei and Cho teaches the method of Claim 3. In addition, the combination of Lian, Lei and Cho teaches the plurality of known labels further comprises a background label. (Lei, page 866, section 4.3: “the input sequence of BLSTM is x1x2x3x4x5x6 and the labels are ‘a’ or ‘b’, it is possible to output “a-a-b-” or “-aa-b” by introducing blank label ‘-’ through BLSTM”)

CLAIM 4
Claim(s) 4 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lian in view of Lei in view of Cho, and further in view of Ogun (Sewade Ogun, “You Don't Really Know Softmax”, Github, published 04/26/2020, hereinafter, Ogun)
In regards to Claim 4, the combination of Lian, Lei and Cho teaches the method of Claim 3.
The combination of Lian, Lei and Cho does not explicitly disclose each value of the class vector is indicative of a predicted score associated with a corresponding one of the plurality of known labels; and classifying the object comprises selecting a maximum value among the plurality of values in the class vector, wherein the selected value corresponds to the label.
Ogun is in the same field of art of classification using machine learning. Further, Ogun teaches each value of the class vector is indicative of a predicted score associated with a corresponding one of the plurality of known labels (Ogun, first paragraph: “Softmax function is one of the major functions used in classification models. It is usually introduced early in a machine learning class. It takes as input a real-valued vector of length, d and normalizes it into a probability distribution”, see Figure with annotation below); and classifying the object comprises selecting a maximum value among the plurality of values in the class vector, wherein the selected value corresponds to the label. (Ogun, first paragraph: “Softmax function is one of the major functions used in classification models. It is usually introduced early in a machine learning class. It takes as input a real-valued vector of length, d and normalizes it into a probability distribution”, see Figure with annotation below)

    PNG
    media_image1.png
    625
    1429
    media_image1.png
    Greyscale

Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lian, Lei and Cho by simply substitute the SVM classifier’s scoring system with the SoftMax classifier’s scoring system that is taught by Ogun, to make a character recognition that utilize SoftMax classifier’s scoring system; thus, one of ordinary skilled in the art would be motivated to make the simple substitution since Softmax is widely used for multi-class classification (Ogun, Introduction: “Softmax is a non-linear function, used majorly at the output of classifiers for multi-class classification”).
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.




Claim(s) 9-11 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lian in view of Lei, and further in view of Cao et al. (CN111402228A, published 2020, a translated copy is attached, hereinafter, Cao)
CLAIM 9
 In regards to Claim 9, the combination of Lian and Lei teaches the method of claim 1. 
The combination of Lian and Lei does not explicitly disclose determining the locations of the one or more objects comprises: smoothing the object center heatmap to generate a smoothed object center heatmap; and selecting the locations of the one or more objects, wherein a value at each respective location in the smoothed object center heatmap is higher than values in a proximate area of the location.
Cao is in the same field of art of determining object location via heatmaps. Further, Cao teaches determining the locations of the one or more objects comprises: smoothing the object center heatmap to generate a smoothed object center heatmap (Cao, ¶ [0077-0078]: “a center point heatmap … a multi-peak Gaussian heatmap Y∈(0,1)H×W (a heatmap generated by the ground truth), where each keypoint is defined as the mean of the Gaussian kernel with a standard deviation proportional to the size of the target object. The Gaussian heatmap is used as a weight to reduce the penalty for pixels near positive”. The Examiner notes the Gaussian heatmap provides a continuous range of values from 0 to 1 (smoothing), center is close to 1 and further pixels are close to 0); and selecting the locations of the one or more objects, wherein a value at each respective location in the smoothed object center heatmap is higher than values in a proximate area of the location. (Cao, ¶ [0092-0093 and 0098]: “The key point heatmap can be selected by sliding a preset window, and the pixel with the highest probability in the current window that meets the preset threshold can be selected as the probability peak point … a preset condition could be that the calculated center point has a high score in the center point heatmap, such as exceeding a certain set threshold, in which case the peak key point is considered a valid detection.” Cao teaches selecting center point by sliding a window over feature map, and selecting pixels that exceeds a preset threshold)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lian and Lei by simply substituting Lian’s Gaussian heatmap with Gaussian heatmap that is taught by Cao; thus, one of ordinary skilled in the art would be motivated to make the substitution since Lian and Lei teaches determine object location using Gaussian heatmap, and Cao teaches a detailed process of determine object location using Gaussian heatmap (Cao, ¶ [0077-0078]).
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 10
In regards to Claim 10, the combination of Lian, Lei and Cao teaches the method of Claim 9. In addition, the combination of Lian, Lei and Cao teaches smoothing the object center heatmap comprises applying a Gaussian filter having a standard deviation proportional to an object size. (Cao, ¶ [0078]: “a multi-peak Gaussian heatmap Y∈(0,1)H×W (a heatmap generated by the ground truth), where each keypoint is defined as the mean of the Gaussian kernel with a standard deviation proportional to the size of the target object. The Gaussian heatmap is used as a weight to reduce the penalty for pixels near positive”)

CLAIM 11
In regards to Claim 11, the combination of Lian, Lei and Cao teaches the method of Claim 9. In addition, the combination of Lian, Lei and Cao teaches selecting the location further comprises filtering one or more locations at which the value in the smoothed object center heatmap is below a threshold. (Cao, ¶ [0092-0093 and 0098]: “The key point heatmap can be selected by sliding a preset window, and the pixel with the highest probability in the current window that meets the preset threshold can be selected as the probability peak point … a preset condition could be that the calculated center point has a high score in the center point heatmap, such as exceeding a certain set threshold, in which case the peak key point is considered a valid detection.”)

CLAIM 12
Claim(s) 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lian in view of Lei, and further in view of Hentschel et al. (Hentschel, Christian, and Harald Sack. "What image classifiers really see–visualizing bag-of-visual words models." International Conference on Multimedia Modeling. Cham: Springer International Publishing, 2015, hereinafter, Hentschel)
In regards to Claim 12, the combination of Lian and Lei teaches the method of Claim 1.
The combination of Lian and Lei does not explicitly disclose the first machine learning model includes a weight vector; the feature map of the image comprises a plurality of samples each associated with a respective feature vector; and a value of each sample in the object center heatmap is a dot product of a feature vector of a corresponding sample in the feature map and a weight vector.
Hentschel is in the same field of art of image classifiers. Further, Hentschel teaches the first machine learning model includes a weight vector (Hentschel, Page 4, section 3.1: “The trained model consists of a bias and a weight vector”); the feature map of the image comprises a plurality of samples each associated with a respective feature vector (Hentschel, Page 3 and 4, section 3: “SIFT features are extracted at a dense grid of s = 6 pixels and at a fixed scale of σ = 1.0 and k-means clustering is used to quantize the SIFT features to k = 100 vocabulary vectors.”); and a value of each sample in the object center heatmap is a dot product of a feature vector of a corresponding sample in the feature map and a weight vector. (Hentschel, Page 4, section 3.1: “The trained model consists of a bias and a weight vector– an unknown sample is classified by computing the dot product between the weight vector and the sample’s feature vector (plus the bias)”)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lian and Lei by incorporating the linear SVM classifier that is taught by Hentschel, to make a character recognition method that utilize linear SVM for classification; thus, one of ordinary skilled in the art would be motivated to combine the references since linear SVM offers higher interpretability (Hentschel, Page 4, section 3.1: “While the classification results usually tend to be inferior to the results of non-linear SVMs, the linear model allows for an immediate interpretation of the weight vector dimensions as feature importance scores”).
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 13
Claim(s) 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lian in view of Lei, and further in view of Singh et al. (US-20170286803-A1, published 2017, hereinafter Singh)
In regards to Claim 13, the combination of Lian and Lei teaches the method of Claim 1.
The combination of Lian and Lei does not explicitly disclose receiving an input from a user interface; and re-training the first machine learning model based on the input from the user interface.
Singh is in the same field of art of character recognition using machine learning. Further, Signh teaches receiving an input from a user interface (Singh, ¶ [0025]: “The output module in conjunction with the user interface module also prompts the user to pick one of the suggested characters as one correctly corresponding to the character in the image data”); and re-training the first machine learning model based on the input from the user interface. (Singh, ¶ [0025]: “the output module provides the received user input to the dynamic machine learning module for re-training existing machine learning algorithm corresponding to the suggested character picked by the user or for dynamically creating new machine learning algorithm corresponding to the new character labelled by the user. The output module may also update the set of pre-defined characters in the database by adding the new labelled character to the set”. Singh teaches an interface to receive user input regarding the result of character recognition, and the system can update/re-train machine learning model based on user input, user can also add new pre-defined character into the database)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lian and Lei by incorporating system to retrain model based on user input that is taught by Singh, to make a text detection system that can update model based on user input; thus, one of ordinary skilled in the art would be motivated to combine the references since among its several aspects, the present invention recognizes there is a need to improve accuracy of OCR system by updating OCR model with new training data (Singh, ¶ [0003-0004]: “training a machine learning algorithm for OCR to identify the data with high level of accuracy is challenging. Further, once an OCR technique has been trained for or has learnt a set of symbols (e.g., in a specific domain), it is difficult to apply it to new set of images which may be similar to the previous set but yet may have many new symbols that the OCR technique may not recognize”).
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.


Claim(s) 14 and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lian in view of Lei, and further in view of Liu et al. (Liu, Yue et al. "Recognition of QR Code with mobile phones." IEEE, published 2008, hereinafter Liu)
CLAIM 14
In regards to Claim 14, the combination of Lian and Lei teaches the method of Claim 1.
The combination of Lian and Lei does not explicitly disclose capturing the image using a 1D barcode scanner or a 2D barcode scanner.
Liu is in the same field of art of image processing. Further, Liu teaches capturing the image using a 1D barcode scanner or a 2D barcode scanner. (Liu, section I. Introduction: … 2D-barcodes …”; section 3, first paragraph: “the recognition algorithm of QR Code is introduced, which is used in various conditions. The algorithm consists of several steps, as shown in Fig. 2, gray scale image conversion, binarization, filter, orientation (finder patterns or timing patterns location), alignment patterns location, cell grids generating, error correction and decoding. The input is an RGB color image which is captured by mobile phone and the output is a decoding result”.)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lian and Lei incorporating the system to recognize bar code from normal images that is taught by Liu, to make a text detection system that is able to scan barcode; thus, one of ordinary skilled in the art would be motivated to combine the references since among its several aspects, the present invention recognizes there is a need for a system that can scan bar code from normal images (Liu, Introduction: “The barcode reader only be used to recognize the barcode, and the price of two dimensional bar codes reader is expensive. Now mobile phones can implement many new kinds of applications such as taking photos, and movie shooting by using embedded camera devices.”).
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 16
In regards to Claim 16, Lian teaches a system comprising: a processor configured to execute programming instructions (Lian, ¶ [0018, 0023, 0028]: “a computer device is also provided, wherein the computer device includes: a memory for storing one or more programs; and one or more processors connected to the memory”): 
accessing the image (Lian, ¶ [0056]: “the computer device may obtain the target image in any feasible manner, such as by taking a picture, obtaining the target image from the local device, or receiving a target image sent by another device”);
a pre-trained neural network model (Lian, ¶ [0056]: “The feature extraction network can be any network model used to extract features from images, such as VGG (Visual Geometry Group)”. Lian teaches extracting features from an image using VGG network)
Lian does not explicitly disclose determining, with a pre-trained neural network model, a feature map of the image by inputting the image into the pre-trained neural network model and receiving the feature map as an output from the pre-trained neural network model, wherein: the feature map of the image is different than the image; the feature map comprises a plurality of samples, each of the plurality of samples being associated with a respective feature vector representing a selected feature.
Lei is in the same field of art of text recognition from image. Further, Lei teaches determining, with a pre-trained neural network model (Lei, page 863, section 4.1: “we introduce deeper neural networks to enhance the accuracy rate. Specifically, we explore different deeper feature descriptors of VGG16, VGG19, ResNet34, and ResNet50”), a feature map of the image by inputting the image into the pre-trained neural network model and receiving the feature map as an output from the pre-trained neural network model (Lei, page 863-864, section 4.1: “The output of final CNN part is the feature map with height 1 which is easily converted to a feature sequence … Figure 3 shows the architecture of VGG16 and corresponding feature map size in our CNN part … Figure 5 shows the architecture of ResNet50 and the corresponding feature map size”. Lei teaches input image into networks such as VGG, Resnet, and output feature map), wherein: the feature map of the image is different than the image (Lei, see FIG. 3 and 5. Generated feature maps has different size with the input image); the feature map comprises a plurality of samples, each of the plurality of samples being associated with a respective feature vector (Lei, see Fig. 2, each feature has corresponding feature sequence) representing a selected feature (Lei, page 863-864, section 4.1: “The output of final CNN part is the feature map … Figure 3 shows the architecture of VGG16 and corresponding feature map size in our CNN part … Figure 5 shows the architecture of ResNet50 and the corresponding feature map size”. Lei teaches input image into networks such as VGG, Resnet, and output 2D feature maps, see FIG. 3 and 5)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to substitute the feature extraction network of Lian with feature extraction networks that is taught by Lei, to make a text recognition system that can work with multiple feature networks; thus, one of ordinary skilled in the art would be motivated to make the substitution since Lian teaches extracting features with VGG network, and Lei teaches different version of VGG networks (Lei, page 863, section 4.1: “we introduce deeper neural networks to enhance the accuracy rate. Specifically, we explore different deeper feature descriptors of VGG16, VGG19, ResNet34, and ResNet50”).
The combination of Lian and Lei does not explicitly disclose a scanner comprising an image capturing device configured to capture an image of a part on an inspection station.
Liu is in the same field of art of image processing. Further, Liu teaches a scanner comprising an image capturing device configured to capture an image of a part on an inspection station. (Liu, section I. Introduction: … 2D-barcodes …”; section 3, first paragraph: “the recognition algorithm of QR Code is introduced, which is used in various conditions. .... The input is an RGB color image which is captured by mobile phone and the output is a decoding result”. Liu teaches a QR code scanner system that can read the decoded information of an QR code in normal images)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lian and Lei incorporating the system to recognize bar code from normal images that is taught by Liu, to make a text detection system that is able to scan barcode; thus, one of ordinary skilled in the art would be motivated to combine the references since among its several aspects, the present invention recognizes there is a need for a system that can scan bar code from normal images (Liu, Introduction: “The barcode reader only be used to recognize the barcode, and the price of two dimensional bar codes reader is expensive. Now mobile phones can implement many new kinds of applications such as taking photos, and movie shooting by using embedded camera devices.”).
The combination of Lian, Lei and Liu then teaches generating, with a first machine learning model different from the pre-trained neural network model (Lian, ¶ [0057]: “In step S12, the computer device inputs the shared feature information into the text detection network and obtains the character detection results output by the text detection network”. The Examiner notes Lian extract features using feature extraction network, and input extracted features into text detection network), an object center heatmap for the image by using the first machine learning model to process the feature map of the image to generate the object center heatmap for the image (Lian, ¶ [0057]: “the text detection network uses Gaussian heatmaps to generate the character region detection results and the character connection region detection results…”), wherein the object center heatmap comprises a plurality of samples each having a value indicative of a likelihood of a corresponding sample in the image being a central position of an object (Lian, ¶ [0057]: “… the character region detection results use Gaussian heatmaps to represent the probability of the character center region…”); and
determining locations of one or more objects in the image based on the object center heatmap wherein the feature map is determined prior to determining the locations of the one or more objects in the image based on the object center heatmap. (Lian, ¶ [0058 and 0059]: “… the character region detection result is used to guide the attention network in the character recognition network to predict character regions (i.e., to guide the attention network in which regions to make predictions) … the computer device locates the position of each text box based on the character detection result”)
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.


Claim(s) 17-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Lian in view of Lei, and further in view of Zuccolo (Ricardo Zuccolo " Self-driving Cars — OpenCV and SVM Machine Learning with Scikit-Learn for Vehicle Detection on the Road." Medium, published 2017, hereinafter Zuccolo)
CLAIM 17
In regards to Claim 17, the combination of Lian and Lei teaches the method of Claim 1.
The combination of Lian and Lei does not explicitly disclose the first machine learning model comprises no convolutional neural network. 
Zuccolo is in the same field of art of object detection using machine learning. Further, Zuccolo teaches the first machine learning model comprises no convolutional neural network. (Zuccolo, Page 25-27, section Heat-maps bounding boxes and false positives. Zuccolo teaches combining SVM classifier and sliding window search technique to generate heatmap for detected objects, without using cnn.)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lian and Lei by incorporating the method to generate heatmap that is taught by Zuccolo, to make a system to generate heatmap for detected object without using cnn; thus, one of ordinary skilled in the art would be motivated to combine the references since swapping a deep-learning head for Zuccolo’s SVM to the make the computational load lighter to run on an edge device like a handheld scanner. (Zuccolo, page 18: “The SVM training took a process time of only 21.05 seconds.”)
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 18
In regards to Claim 18, the combination of Lian and Lei teaches the method of Claim 1.
The combination of Lian and Lei does not explicitly disclose the first machine learning model comprises is a non-deep learning model. 
Zuccolo is in the same field of art of object detection using machine learning. Further, Zuccolo teaches the first machine learning model is a non-deep learning model. (Zuccolo, Page 25-27, section Heat-maps bounding boxes and false positives. Zuccolo teaches combining SVM classifier and sliding window search technique to generate heatmap for detected objects, without using a deep learning model.)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lian and Lei by incorporating the method to generate heatmap that is taught by Zuccolo, to make a system to generate heatmap for detected object without using a deep learning model; thus, one of ordinary skilled in the art would be motivated to combine the references since swapping a deep-learning head for Zuccolo’s SVM to the make the computational load lighter to run on an edge device like a handheld scanner. (Zuccolo, page 18: “The SVM training took a process time of only 21.05 seconds.”)
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 19
In regards to Claim 19, the combination of Lian and Lei teaches the method of Claim 2. In addition, the combination of Lian and Lei teaches the second machine learning model comprise no convolutional neural network. (Lian, ¶ [0058]: “… the text recognition network first uses a Bi-Long Short-Term Memory Network (BiLSTM)”)
The combination of Lian and Lei does not explicitly disclose the first machine learning model comprises no convolutional neural network.  
Zuccolo is in the same field of art of object detection using machine learning. Further, Zuccolo teaches the first machine learning model comprises no convolutional neural network. (Zuccolo, Page 25-27, section Heat-maps bounding boxes and false positives. Zuccolo teaches combining SVM classifier and sliding window search technique to generate heatmap for detected objects, without using a cnn.)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lian and Lei by incorporating the method to generate heatmap that is taught by Zuccolo, to make a system to generate heatmap for detected object without using a deep learning model; thus, one of ordinary skilled in the art would be motivated to combine the references since swapping a deep-learning head for Zuccolo’s SVM to the make the computational load lighter to run on an edge device like a handheld scanner. (Zuccolo, page 18: “The SVM training took a process time of only 21.05 seconds.”)
The combination of Lian, Lei and Zuccolo then teaches both the first (Zuccolo, Page 25-27, section Heat-maps bounding boxes and false positives. Zuccolo teaches combining SVM classifier and sliding window search technique to generate heatmap for detected objects, without using a cnn.) and second machine learning models comprise no convolutional neural network. (Lian, ¶ [0058]: “… the text recognition network first uses a Bi-Long Short-Term Memory Network (BiLSTM)”)
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.

CLAIM 20
In regards to Claim 20, the combination of Lian and Lei teaches the method of Claim 2. In addition, the combination of Lian and Lei teaches the second machine learning model is a non-deep learning model. (Lian, ¶ [0058]: “… the text recognition network first uses a Bi-Long Short-Term Memory Network (BiLSTM)”)
The combination of Lian and Lei does not explicitly disclose the first machine learning model comprises is a non-deep learning model. 
Zuccolo is in the same field of art of object detection using machine learning. Further, Zuccolo teaches the first machine learning model is a non-deep learning model. (Zuccolo, Page 25-27, section Heat-maps bounding boxes and false positives. Zuccolo teaches combining SVM classifier and sliding window search technique to generate heatmap for detected objects, without using a deep learning model.)
Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Lian and Lei by incorporating the method to generate heatmap that is taught by Zuccolo, to make a system to generate heatmap for detected object without using a deep learning model; thus, one of ordinary skilled in the art would be motivated to combine the references since swapping a deep-learning head for Zuccolo’s SVM to the make the computational load lighter to run on an edge device like a handheld scanner. (Zuccolo, page 18: “The SVM training took a process time of only 21.05 seconds.”)
The combination of Lian, Lei and Zuccolo then teaches both the first (Zuccolo, Page 25-27, section Heat-maps bounding boxes and false positives. Zuccolo teaches combining SVM classifier and sliding window search technique to generate heatmap for detected objects, without using a deep learning model.) and second machine learning models are non-deep learning models. (Lian, ¶ [0058]: “… the text recognition network first uses a Bi-Long Short-Term Memory Network (BiLSTM)”)
	Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention.







Allowable Subject Matter
Claim 8 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NHUT HUY (JEREMY) PHAM whose telephone number is (703)756-5797. The examiner can normally be reached Mo - Fr. 8:30am - 6pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, O'Neal Mistry can be reached on (313)446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

NHUT HUY (JEREMY) PHAMExaminerArt Unit 2674
/Ross Varndell/Primary Examiner, Art Unit 2674
Read full office action
Prosecution Timeline

Show 7 earlier events
Jun 13, 2025
Examiner Interview Summary
Jul 21, 2025
Request for Continued Examination
Jul 22, 2025
Response after Non-Final Action
Aug 26, 2025
Non-Final Rejection mailed — §103
Oct 28, 2025
Interview Requested
Nov 06, 2025
Examiner Interview Summary
Jan 15, 2026
Response Filed
Mar 16, 2026
Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/037,475
Patent 12631891
DIFFRACTION-GRATING-BASED SYSTEMS AND METHODS FOR STEREOSCOPIC AND MULTISCOPIC IMAGING
3y 0m to grant Granted May 19, 2026
18/299,960
Patent 12633140
DRIVING SKILL ESTIMATION DEVICE
3y 1m to grant Granted May 19, 2026
18/001,278
Patent 12620100
METHOD FOR DETECTING SPATIAL COUPLING
3y 4m to grant Granted May 05, 2026
18/321,520
Patent 12620195
ROBUST FEATURE EXTRACTION FROM OCCLUDED IMAGE FRAMES FOR VEHICLE APPLICATIONS
2y 11m to grant Granted May 05, 2026
17/925,903
Patent 12598397
DIRT DETECTION METHOD AND DEVICE FOR CAMERA COVER
3y 4m to grant Granted Apr 07, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.
Typically takes 5-10 seconds — AI-generated, attorney review required before filing
Prosecution Projections

5-6
Expected OA Rounds
80%
Grant Probability
99%
With Interview (+26.0%)
2y 10m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 64 resolved cases by this examiner. Grant probability derived from career allowance rate.