DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/04/2026 has been entered.
Status of Claims
Claims 1-20 are pending.
Response to Arguments
Applicant’s arguments, see pages 9-14, filed 02/04/2026, with respect to the rejections of Claims 1-20 under 35 U.S.C. 103 have been fully considered but are moot because Applicant’s amendments to the independent claims have altered the scope of the claims and, therefore, necessitated new grounds of rejection, which are presented below. Accordingly, THIS ACTION IS MADE FINAL.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 8-11, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Chiu et al. (US 20200357143 A1) in view of Wu et al. (CN 107608943 A), Yang et al. (CN 111144410 A), Yang et al. (CN 110472642 B, hereinafter Yang'642), and Wang et al. (CN 107563498 A).
Regarding Claim 1, Chiu teaches "An information generating method performed by a computer device, the method comprising: obtaining a target image"; (Chiu, Abstract, teaches a method, apparatus, and system for visual localization wherein an image can include at least one image from a plurality of modalities, i.e., obtaining a target image);
"extracting a semantic feature set from the target image and a visual feature set from the target image"; (Chiu, Abstract, teaches extracting appearance features of an image and extracting semantic features of the image, i.e., extracting a semantic feature set and a visual feature set of the target image).
However, Chiu does not explicitly teach "performing attention fusion using an attention fusion network comprising a single long short term memory (LSTM) network in an information generating model on semantic features of the target image and visual features of the target image at n time steps to obtain caption words of the target image at the n time steps; including: inputting the semantic feature set extracted from the target image into a semantic attention network, wherein the semantic attention network generates a semantic attention vector at a current time step according to a hidden layer vector outputted by the single LSTM network at a previous time step; inputting the visual feature set extracted from the target image into a visual attention network, wherein the visual attention network generates a visual attention vector at the current time step according to the hidden layer vector outputted by the single LSTM network at the previous time step; and inputting the semantic attention vector, the visual attention vector, and a caption word outputted by the single LSTM network at the previous time step, into the single LSTM network, wherein the single LSTM network generates a caption word at the current time step; and generating image caption information of the target image based on the caption words of the target image at n time steps".
In an analogous field of endeavor, Wu teaches "performing attention fusion using an attention fusion network comprising a single long short term memory (LSTM) network in an information generating model on semantic features of the target image and visual features of the target image at n time steps to obtain caption words of the target image at the n time steps"; (Wu, FIG. 4, Pg. 3 final three paragraphs, and Pg. 4 first six paragraphs, teaches an image subtitle generating method that fuses visual attention and semantic attention, using an LSTM network and a multi-layer perceptron model to generate the words of the subtitles to be generated, i.e., performing attention fusion using an attention fusion network in an information generating model; image features are extracted from each image by a convolutional neural network to obtain an image feature set, a visual attention model and a semantic attention model are respectively generated using the time sequence information and the extracted image feature set, the steps are repeated until a stop mark is detected, and series combination is performed on all the obtained words to generate the subtitles, i.e., attention fusion using an attention fusion network, one embodiment of which comprises only a single long short term memory network, of semantic and visual features of the target image at n time steps of the time sequence features to obtain caption words of the image at the n time steps);
"including: inputting the semantic feature set extracted from the target image into a semantic attention network"; (Wu, Pg. 4 Para. 2, teaches generating a semantic attention model by combining the extracted image feature step with the time sequence information and words in the previous time sequence, i.e., input the semantic feature set previously extracted from the target image into the semantic attention network);
"
"inputting the visual feature set extracted from the target image into a visual attention network"; (Wu, Pg. 4 Para. 1, teaches generating a visual attention model by combining the image feature set and the time sequence information, i.e., input the previously extracted visual feature set from the target image into a visual attention network).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Chiu, wherein the extracted image feature set contains explicitly extracted semantic features and explicitly extracted visual features, by including the attention fusion of semantic and visual features of a target image to obtain caption words taught by Wu. One of ordinary skill in the art would be motivated to combine the references since it gives image subtitles a better fit to reality (Wu, Pg. 3 Para. 13, teaches the motivation of combination to be to give image subtitles a better fit to reality).
However, the combination of references of Chiu in view of Wu does not explicitly teach "wherein the semantic attention network generates a semantic attention vector at a current time step according to a hidden layer vector outputted by the single LSTM network at a previous time step; wherein the visual attention network generates a visual attention vector at the current time step according to the hidden layer vector outputted by the single LSTM network at the previous time step; and inputting the semantic attention vector, the visual attention vector, and a caption word outputted by the single LSTM network at the previous time step, into the single LSTM network, wherein the single LSTM network generates a caption word at the current time step; and generating image caption information of the target image based on the caption words of the target image at n time steps".
In an analogous field of endeavor, Yang teaches "wherein the semantic attention network generates a semantic attention vector at a current time step according to a hidden layer vector outputted by the single LSTM network at a previous time step"; (Yang, Pg. 6 Paras. 7-8, Pg. 8 second to last paragraph, and Claim 3, teaches outputting the semantic attention vector for the current time when the hidden state of the first layer LSTM model at the previous moment, the image target, and the image theme are input into the semantic attention mechanism model, i.e., the semantic attention network generates a semantic attention vector at a current time step according to a hidden layer vector outputted at a previous time step by one of the LSTM networks);
"wherein the visual attention network generates a visual attention vector at the current time step according to the hidden layer vector outputted by the single LSTM network at the previous time step"; (Yang, Pg. 6 Paras. 9-10 and Pg. 8 second to last paragraph and Claim 4, teaches outputting the visual attention vector when the hidden state of the first layer LSTM model at the previous moment and the visual features of the image are input into the visual attention mechanism model wherein the visual attention vector is at the current time, i.e., visual attention network generates a visual attention vector at a current time step according to a hidden layer vector outputted by one of the LSTM networks at a previous time step).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Chiu and Wu, wherein the model comprises only one LSTM network, by including the semantic and visual attention networks generating semantic and visual attention vectors, respectively, according to the hidden layer vector outputted by an LSTM network at a previous time step taught by Yang. One of ordinary skill in the art would be motivated to combine the references since it increases grammar readability (Yang, Pg. 4 final paragraph, teaches the motivation of combination to be to increase the grammatical readability of the captions).
However, the combination of references of Chiu in view of Wu and Yang does not explicitly teach "and inputting the semantic attention vector, the visual attention vector, and a caption word outputted by the single LSTM network at the previous time step, into the single LSTM network, wherein the single LSTM network generates a caption word at the current time step; and generating image caption information of the target image based on the caption words of the target image at n time steps".
In an analogous field of endeavor, Yang’642 teaches "and inputting the semantic attention vector, the visual attention vector, and a caption word outputted by the single LSTM network at the previous time step, into the single LSTM network, wherein the single LSTM network generates a caption word at the current time step"; (Yang'642, Pgs. 17-18 and 20 and Claim 1, teaches that the input of the attention-based LSTM language model consists of three parts: St, Jt, and the output state of the nth layer LSTM at the previous moment, wherein St represents the word generated by the language generation model and Jt represents the joint vector computed from the context vector derived from the joint attention network comprising visual attention information and semantic attention information; a SoftMax layer connected after the final layer of the LSTM model outputs, given the hidden state of the LSTM, the joint vector, and the previous output word, the probability of each output word, and the word with the highest probability is selected at each moment, i.e., the semantic attention and visual attention vectors are input to the LSTM as the joint vector, and the caption word outputted by the LSTM at the previous time step is input to the LSTM as the previous output word, in order for the LSTM to generate a caption word at the current time step).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Chiu, Wu, and Yang by including the input of the semantic attention vector, the visual attention vector, and the caption word output from the LSTM at the previous time step in order for the single LSTM to output the caption word for the current time step taught by Yang’642. One of ordinary skill in the art would be motivated to combine the references since it generates fine-grained image descriptions (Yang'642, Abstract, teaches the motivation of combination to be to generate fine-grained image descriptions using visual and semantic attention).
However, the combination of references of Chiu in view of Wu, Yang, and Yang’642 does not explicitly teach "and generating image caption information of the target image based on the caption words of the target image at n time steps".
In an analogous field of endeavor, Wang teaches "and generating image caption information of the target image based on the caption words of the target image at n time steps"; (Wang, Pg. 13 Paras. 3-9 starting with "In order to solve...", teaches generating the image description based on words obtained by connection in series, wherein the words are obtained by attention models at t time for each word, i.e., generating image caption information based on caption words of the image at n time steps).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Chiu, Wu, Yang, and Yang’642 by including the generation of image caption information of the target image based on the caption words of the target image at the n time steps taught by Wang. One of ordinary skill in the art would be motivated to combine the references since it enables generating an image description that enriches the summary of an input image with semantic features (Wang, Abstract, teaches the motivation of combination to be to generate an image description that uses the summary of the input image and enriches it with visual semantics, making it more responsive to the content of the image).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
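For purposes of illustration only, the decoding arrangement discussed above for Claim 1 (a single LSTM that, at each of the n time steps, receives a semantic attention vector, a visual attention vector, and the previously generated caption word) may be sketched in Python/PyTorch as follows. The sketch is hypothetical: the names (AttentionFusionDecoder, sem_att, vis_att), the dimensions, and the greedy word selection are assumptions of the sketch and are not drawn from the claims as filed or from the cited references.

import torch
import torch.nn as nn

class AttentionFusionDecoder(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        # A single LSTM cell shared across all n time steps.
        self.lstm = nn.LSTMCell(feat_dim * 2 + hidden_dim, hidden_dim)
        self.sem_att = nn.Linear(hidden_dim + feat_dim, 1)    # semantic attention network
        self.vis_att = nn.Linear(hidden_dim + feat_dim, 1)    # visual attention network
        self.classifier = nn.Linear(hidden_dim, vocab_size)

    def attend(self, att_net, features, h_prev):
        # Score each feature against the hidden layer vector from the previous time step.
        expanded = h_prev.unsqueeze(1).expand(-1, features.size(1), -1)
        scores = att_net(torch.cat([features, expanded], dim=-1))
        weights = torch.softmax(scores, dim=1)
        return (weights * features).sum(dim=1)                # attention vector at the current step

    def forward(self, sem_feats, vis_feats, n_steps=20):
        batch = sem_feats.size(0)
        h = sem_feats.new_zeros(batch, self.lstm.hidden_size)
        c = sem_feats.new_zeros(batch, self.lstm.hidden_size)
        word = torch.zeros(batch, dtype=torch.long)           # start-of-caption token (index 0)
        caption = []
        for _ in range(n_steps):
            sem_vec = self.attend(self.sem_att, sem_feats, h)   # semantic attention vector
            vis_vec = self.attend(self.vis_att, vis_feats, h)   # visual attention vector
            x = torch.cat([sem_vec, vis_vec, self.embed(word)], dim=-1)
            h, c = self.lstm(x, (h, c))                       # the single LSTM advances one time step
            word = self.classifier(h).argmax(dim=-1)          # caption word at the current time step
            caption.append(word)
        return torch.stack(caption, dim=1)                    # caption words at the n time steps

Under this sketch, calling decoder(sem_feats, vis_feats) with sem_feats of shape (batch, num_attributes, feat_dim) and vis_feats of shape (batch, num_regions, feat_dim) would return the word indices from which the image caption information is assembled.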
Regarding Claim 2, the combination of references of Chiu in view of Wu, Yang, Yang’642, and Wang teaches "The method according to claim 1, wherein the performing attention fusion on semantic features of the target image and visual features of the target image at n time steps to obtain caption words of the target image at the n time steps comprises: inputting, at tth time step, the semantic feature set at the tth time step, the visual feature set at the tth time step, a hidden layer vector at the (t-1)th time step, and an output result of the attention fusion network at the (t-1)th time step into the attention fusion network, to obtain an output result of the attention fusion network at the tth time step and a hidden layer vector at the tth time step"; (Wang, Pg. 13 Paras. 3-9 starting with "In order to solve...", teaches inputting the visual attention characteristic at t time and the semantic attention information at t time, i.e., inputting the semantic and visual feature sets at the t time step, the hidden-layer state at t-1 time, i.e., the hidden layer vector at the t-1 time step, and the word Wt-1 generated by the semantic attention model at t-1 time, i.e., the previous output result of the attention fusion network at the t-1 time step, wherein the network generates the image description and continues with a word-updating step for Wt-1, i.e., obtaining an output result of the network and an updated hidden layer vector for the t time step).
Please note that the first input option has been selected for examination because only one of the first and second options is required to reject the claim.
The proposed combination as well as the motivation for combining the Chiu in view of Wu, Yang, Yang’642, and Wang references presented in the rejection of Claim 1, applies to claim 2. Thus, the method recited in claim 2 is met by Chiu in view of Wu, Yang, Yang’642, and Wang.
Regarding Claim 3, the combination of references of Chiu in view of Wu, Yang, Yang’642, and Wang teaches "The method according to claim 2, the method further comprising: generating, at the tth time step, the semantic feature set and the visual feature set at the tth time step, based on the hidden layer vector, the semantic feature set and the visual feature set at the (t-1)th time step"; (Wang, Pg. 13 Paras. 3-9 starting with "In order to solve...", teaches generating semantic and visual information at t time based on hidden-layer state t-1 time and the image feature and t-1 time semantic attention model generated word Wt-1 from the word updating step, i.e., generate semantic and visual features at t time step from the hidden layer vector and the semantic and visual feature sets from the previous word or t-1 time step).
The proposed combination as well as the motivation for combining the Chiu in view of Wu, Yang, Yang’642, and Wang references presented in the rejection of Claim 1, applies to claim 3. Thus, the method recited in claim 3 is met by Chiu in view of Wu, Yang, Yang’642, and Wang.
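By way of illustration only, the per-time-step feature generation addressed in Claims 2 and 3 (deriving the t-th step semantic and visual feature sets from the hidden layer vector and the (t-1)-th step feature sets) could be sketched as follows. The gating formulation and all names (FeatureUpdate, sem_gate, vis_gate) are assumptions of the sketch and are not drawn from the claims or from the cited references.

import torch
import torch.nn as nn

class FeatureUpdate(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=512):
        super().__init__()
        self.sem_gate = nn.Linear(hidden_dim + feat_dim, feat_dim)
        self.vis_gate = nn.Linear(hidden_dim + feat_dim, feat_dim)

    def forward(self, sem_prev, vis_prev, h_prev):
        # Condition each (t-1)-th step feature on the previous hidden layer vector.
        h_sem = h_prev.unsqueeze(1).expand(-1, sem_prev.size(1), -1)
        h_vis = h_prev.unsqueeze(1).expand(-1, vis_prev.size(1), -1)
        sem_t = torch.sigmoid(self.sem_gate(torch.cat([sem_prev, h_sem], dim=-1))) * sem_prev
        vis_t = torch.sigmoid(self.vis_gate(torch.cat([vis_prev, h_vis], dim=-1))) * vis_prev
        return sem_t, vis_t                                   # feature sets for the t-th time step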
Regarding Claim 4, the combination of references of Chiu in view of Wu, Yang, Yang’642, and Wang teaches "The method according to claim 1, wherein the attention fusion network includes a hyperparameter for indicating weights of the visual feature set and the semantic feature set respectively in the attention fusion network"; (Wang, Pg. 14 lines 7-8 starting with "for each region…", teaches a vision attention distribution function to generate a weight according to image characteristic and semantic attention model for word generation, i.e., a hyperparameter to indicate weight for the visual attention set and the semantic attention set of the networks).
The proposed combination as well as the motivation for combining the Chiu in view of Wu, Yang, Yang’642, and Wang references presented in the rejection of Claim 1, applies to claim 4. Thus, the method recited in claim 4 is met by Chiu in view of Wu, Yang, Yang’642, and Wang.
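As an illustration of the weighting hyperparameter addressed in Claim 4, a single fixed scalar could balance the visual and semantic attention vectors inside the attention fusion network, as sketched below. The function name and the example value 0.7 are assumptions of the sketch and are not taken from the claims or the cited references.

import torch

def fuse(vis_vec: torch.Tensor, sem_vec: torch.Tensor, alpha: float = 0.7) -> torch.Tensor:
    # alpha weights the visual attention vector; (1 - alpha) weights the semantic attention vector.
    return alpha * vis_vec + (1.0 - alpha) * sem_vec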
Claim 8 recites a system or device with elements corresponding to the steps recited in Claim 1. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Chiu in view of Wu, Yang, Yang’642, and Wang references, presented in rejection of Claim 1, apply to this claim. Finally, the combination of the Chiu in view of Wu, Yang, Yang’642, and Wang references discloses a computer device with a processor and a memory (for example, see Chiu, Paragraph 70).
Claim 9 recites a system or device with elements corresponding to the steps recited in Claim 2. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Chiu in view of Wu, Yang, Yang’642, and Wang references, presented in rejection of Claim 1, apply to this claim. Finally, the combination of the Chiu in view of Wu, Yang, Yang’642, and Wang references discloses a computer device with a processor and a memory (for example, see Chiu, Paragraph 70).
Claim 10 recites a system or device with elements corresponding to the steps recited in Claim 3. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Chiu in view of Wu, Yang, Yang’642, and Wang references, presented in rejection of Claim 1, apply to this claim. Finally, the combination of the Chiu in view of Wu, Yang, Yang’642, and Wang references discloses a computer device with a processor and a memory (for example, see Chiu, Paragraph 70).
Claim 11 recites a system or device with elements corresponding to the steps recited in Claim 4. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Chiu in view of Wu, Yang, Yang’642, and Wang references, presented in rejection of Claim 1, apply to this claim. Finally, the combination of the Chiu in view of Wu, Yang, Yang’642, and Wang references discloses a computer device with a processor and a memory (for example, see Chiu, Paragraph 70).
Claim 15 recites a computer-readable storage medium storing a program with instructions corresponding to the steps recited in Claim 1. Therefore, the recited programming instructions of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Chiu in view of Wu, Yang, Yang’642, and Wang references, presented in rejection of Claim 1, apply to this claim. Finally, the combination of the Chiu in view of Wu, Yang, Yang’642, and Wang references discloses a computer readable storage medium (for example, see Chiu, Paragraph 79).
Claim 16 recites a computer-readable storage medium storing a program with instructions corresponding to the steps recited in Claim 2. Therefore, the recited programming instructions of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Chiu in view of Wu, Yang, Yang’642, and Wang references, presented in rejection of Claim 1, apply to this claim. Finally, the combination of the Chiu in view of Wu, Yang, Yang’642, and Wang references discloses a computer readable storage medium (for example, see Chiu, Paragraph 79).
Claim 17 recites a computer-readable storage medium storing a program with instructions corresponding to the steps recited in Claim 3. Therefore, the recited programming instructions of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Chiu in view of Wu, Yang, Yang’642, and Wang references, presented in rejection of Claim 1, apply to this claim. Finally, the combination of the Chiu in view of Wu, Yang, Yang’642, and Wang references discloses a computer readable storage medium (for example, see Chiu, Paragraph 79).
Claim 18 recites a computer-readable storage medium storing a program with instructions corresponding to the steps recited in Claim 4. Therefore, the recited programming instructions of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Chiu in view of Wu, Yang, Yang’642, and Wang references, presented in rejection of Claim 1, apply to this claim. Finally, the combination of the Chiu in view of Wu, Yang, Yang’642, and Wang references discloses a computer readable storage medium (for example, see Chiu, Paragraph 79).
Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Chiu in view of Wu, Yang, Yang’642, Wang, and Li et al. (US 20150262037 A1).
Regarding Claim 5, the combination of references of Chiu in view of Wu, Yang, Yang’642, and Wang does not explicitly teach "The method according to claim 1, wherein the extracting a semantic feature set from the target image comprises: obtaining a semantic feature vector of the target image; and extracting the semantic feature set of the target image based on the semantic feature vector”.
In an analogous field of endeavor, Li teaches "The method according to claim 1, wherein the extracting a semantic feature set from the target image comprises: obtaining a semantic feature vector of the target image"; (Li, Para. 44, teaches obtaining semantic feature vectors which identify features of the content of the image, i.e., obtain a semantic feature vector of the target image);
"and extracting the semantic feature set of the target image based on the semantic feature vector"; (Li, Paras. 44-45, teaches the vectors representing its semantic feature sets and vectors identify features of the content of the image, i.e., extract semantic feature set of the target image based on the semantic feature vector).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Chiu, Wu, Yang, Yang’642, and Wang by including the obtaining of semantic feature vectors for extracting semantic features taught by Li. One of ordinary skill in the art would be motivated to combine the references since it enables automatic annotation of images and helps determine relationships (Li, Abstract, teaches the motivation of combination to be to automatically annotate images and determine relationships between individuals).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
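For illustration only, the two-stage semantic extraction addressed in Claim 5 (first obtaining a semantic feature vector of the target image, then extracting the semantic feature set from that vector) might be sketched as follows. The multi-label attribute classifier, the dimensions, and the names (SemanticExtractor, top_k) are assumptions of the sketch and are not drawn from the claims or from Li.

import torch
import torch.nn as nn

class SemanticExtractor(nn.Module):
    def __init__(self, img_dim=2048, sem_dim=512, num_attributes=1000):
        super().__init__()
        self.to_vector = nn.Linear(img_dim, sem_dim)          # produces the semantic feature vector
        self.attribute_scores = nn.Linear(sem_dim, num_attributes)

    def forward(self, img_feat, top_k=5):
        vec = torch.relu(self.to_vector(img_feat))            # semantic feature vector of the image
        probs = torch.sigmoid(self.attribute_scores(vec))     # per-attribute confidence scores
        attr_ids = probs.topk(top_k, dim=-1).indices          # indices of the most likely attributes
        return vec, attr_ids                                  # vector plus the extracted semantic feature set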
Claim 12 recites a system or device with elements corresponding to the steps recited in Claim 5. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Chiu in view of Wu, Yang, Yang’642, Wang, and Li references, presented in rejection of Claim 5, apply to this claim. Finally, the combination of the Chiu in view of Wu, Yang, Yang’642, Wang, and Li references discloses a computer device with a processor and a memory (for example, see Chiu, Paragraph 70).
Claim 19 recites a computer-readable storage medium storing a program with instructions corresponding to the steps recited in Claim 5. Therefore, the recited programming instructions of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Chiu in view of Wu, Yang, Yang’642, Wang, and Li references, presented in rejection of Claim 5, apply to this claim. Finally, the combination of the Chiu in view of Wu, Yang, Yang’642, Wang, and Li references discloses a computer readable storage medium (for example, see Chiu, Paragraph 79).
Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Chiu in view of Wu, Yang, Yang’642, Wang, Li, and Wang et al. (CN 108710607 B).
Regarding Claim 6, the combination of references of Chiu in view of Wu, Yang, Yang’642, Wang, and Li does not explicitly teach "The method according to claim 5, wherein the extracting the semantic feature set of the target image based on the semantic feature vector comprises: extracting an attribute word set corresponding to the target image from a lexicon based on the semantic feature vector, the attribute word set referring to a set of a candidate caption word describing the target image; and obtaining a word vector set corresponding to the attribute word set as the semantic feature set of the target image".
In an analogous field of endeavor, Wang’607 teaches "The method according to claim 5, wherein the extracting the semantic feature set of the target image based on the semantic feature vector comprises: extracting an attribute word set corresponding to the target image from a lexicon based on the semantic feature vector"; (Wang'607, Claim 1 and Pg. 9 last two lines starting with "tagging module...", teaches a keyword extraction module for determining a keyword set, i.e., extracting an attribute word set, according to the word feature vector extracted by a lexicon training module, i.e., word set corresponds to a lexicon based on feature vectors);
"the attribute word set referring to a set of a candidate caption word describing the target image"; (Wang'607, Claim 1, teaches the keyword set corresponding to a candidate rewritten word library, i.e., attribute word set refers to set of candidate caption words);
"and obtaining a word vector set corresponding to the attribute word set as the semantic feature set of the target image"; (Wang'607, Claim 1, teaches extracting a word feature vector of the input word in the input word set, i.e., obtaining a word vector set corresponding to the attribute word set as the feature set).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Chiu, Wu, Yang, Yang’642, Wang, and Li wherein words correspond to a target image to describe it with semantic feature vectors by including the extraction of an attribute word set corresponding to feature vectors of a lexicon and the word set containing candidate options and obtaining a word vector set of the attribute word set as features taught by Wang’607. One of ordinary skill in the art would be motivated to combine the references since it increases automation of text writing while increasing quality (Wang'607, Abstract, teaches the motivation of combination to be to increase automation of text rewriting and increase quality of text style features).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
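For illustration only, the lexicon-based extraction addressed in Claim 6 (selecting an attribute word set of candidate caption words from a lexicon based on the semantic feature vector, then returning the corresponding word vectors as the semantic feature set) might be sketched as follows. The scoring layer, the embedding table, and the names are assumptions of the sketch and are not drawn from the claims or from Wang'607.

import torch
import torch.nn as nn

class AttributeWordSet(nn.Module):
    def __init__(self, sem_dim=512, lexicon_size=1000, word_dim=300):
        super().__init__()
        self.scorer = nn.Linear(sem_dim, lexicon_size)        # scores each entry of the lexicon
        self.word_vectors = nn.Embedding(lexicon_size, word_dim)

    def forward(self, sem_vec, top_k=5):
        scores = self.scorer(sem_vec)                         # (batch, lexicon_size)
        attr_ids = scores.topk(top_k, dim=-1).indices         # attribute word set: candidate caption words
        return self.word_vectors(attr_ids)                    # word vector set used as the semantic feature set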
Claim 13 recites a system or device with elements corresponding to the steps recited in Claim 6. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Chiu in view of Wu, Yang, Yang’642, Wang, Li, and Wang’607 references, presented in rejection of Claim 6, apply to this claim. Finally, the combination of the Chiu, Wu, Yang, Yang’642, Wang, Li, and Wang’607 references discloses a computer device with a processor and a memory (for example, see Chiu, Paragraph 70).
Claims 7, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Chiu in view of Wu, Yang, Yang’642, Wang, and Yue et al. (US 20180101749 A1).
Regarding Claim 7, the combination of references of Chiu in view of Wu, Yang, Yang’642, and Wang does not explicitly teach "The method according to claim 1, further comprises: dividing the target image into sub-regions to obtain at least one sub-region; and the extracting a visual feature set from the target image comprises: extracting visual features of the at least one sub-region respectively to form the visual feature set".
In an analogous field of endeavor, Yue teaches "The method according to claim 1, further comprises: dividing the target image into sub-regions to obtain at least one sub-region"; (Yue, Para. 13, teaches dividing a color image in an orientation into multiple sub-regions, i.e., dividing the target image into sub-regions to obtain at least one sub-region);
"and the extracting a visual feature set from the target image comprises: extracting visual features of the at least one sub-region respectively to form the visual feature set"; (Yue, Para. 72, teaches performing feature extraction to obtain features of each sub-region to form a feature set, i.e., extracting visual features of the at least one sub-region to form the visual feature set).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Chiu, Wu, Yang, Yang’642, and Wang by including the division of an image into sub-regions to extract a visual feature set taught by Yue. One of ordinary skill in the art would be motivated to combine the references since it increases feature robustness and identifying accuracy (Yue, Para. 6, teaches the motivation of combination to be to increase feature robustness and increase identifying accuracy).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
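For illustration only, the sub-region extraction addressed in Claim 7 (dividing the target image into sub-regions and extracting a visual feature from each to form the visual feature set) might be sketched as follows. The fixed grid division and the mean pooling are assumptions of the sketch and are not drawn from the claims or from Yue.

import torch

def extract_region_features(image: torch.Tensor, grid: int = 3) -> torch.Tensor:
    # image: (channels, height, width). Divide the image into grid x grid sub-regions.
    rows = torch.chunk(image, grid, dim=1)                    # split along the height
    regions = [r for row in rows for r in torch.chunk(row, grid, dim=2)]
    # Mean-pool each sub-region over its spatial extent to obtain one visual feature per region.
    return torch.stack([r.mean(dim=(1, 2)) for r in regions]) # visual feature set, shape (grid*grid, channels)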
Claim 14 recites a system or device with elements corresponding to the steps recited in Claim 7. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Chiu in view of Wu, Yang, Yang’642, Wang, and Yue references, presented in rejection of Claim 7, apply to this claim. Finally, the combination of the Chiu in view of Wu, Yang, Yang’642, Wang, and Yue references discloses a computer device with a processor and a memory (for example, see Chiu, Paragraph 70).
Claim 20 recites a computer-readable storage medium storing a program with instructions corresponding to the steps recited in Claim 7. Therefore, the recited programming instructions of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Chiu in view of Wu, Yang, Yang’642, Wang, and Yue references, presented in rejection of Claim 7, apply to this claim. Finally, the combination of the Chiu in view of Wu, Yang, Yang’642, Wang, and Yue references discloses a computer readable storage medium (for example, see Chiu, Paragraph 79).
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW STEVEN BUDISALICH whose telephone number is (703)756-5568. The examiner can normally be reached Monday - Friday 8:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amandeep Saini can be reached on (571) 272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx
for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANDREW S BUDISALICH/Examiner, Art Unit 2662
/AMANDEEP SAINI/Supervisory Patent Examiner, Art Unit 2662