Prosecution Insights
Last updated: April 19, 2026
Application No. 18/688,741

IMAGE SCENE RECOGNITION METHOD AND APPARATUS

Non-Final OA: §101, §102, §103, §112
Filed: Mar 01, 2024
Examiner: CHAN, CAROL WANG
Art Unit: 2672
Tech Center: 2600 — Communications
Assignee: Shanghai Bilibili Technology Co. Ltd.
OA Round: 1 (Non-Final)

Grant Probability: 83% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 2y 6m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 83% — above average (292 granted / 351 resolved; +21.2% vs TC avg)
Interview Lift: +36.2% — strong (resolved cases with interview)
Avg Prosecution: 2y 6m • 19 currently pending
Total Applications: 370 across all art units

Statute-Specific Performance

§101: 10.8% (-29.2% vs TC avg)
§102: 19.9% (-20.1% vs TC avg)
§103: 38.7% (-1.3% vs TC avg)
§112: 24.1% (-15.9% vs TC avg)

Tech Center averages are estimates • Based on career data from 351 resolved cases

Office Action

DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement

The information disclosure statement (IDS) submitted on 03/05/2024 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.

Claim Objections

Claim 4 is objected to because of the following informalities: Lines 2-4 recite “wherein the encoding the at least one target visual element in a preset encoding manner and generating an encoding vector of the at least one target visual element”, which Examiner suggests amending to “wherein the encoding the at least one target visual element in the preset encoding manner and generating the encoding vector of the at least one target visual element”. Appropriate correction is required.

Claim 5 is objected to because of the following informalities: Line 2 recites “wherein the determining an encoding value at each encoding position”, which Examiner suggests amending to “wherein the determining the encoding value at each encoding position”. Appropriate correction is required.

Claim 7 is objected to because of the following informalities: Lines 3-4 recite “wherein the calculating at least one of a recognition accuracy or a recall rate”, which Examiner suggests amending to “wherein the calculating the at least one of the recognition accuracy or the recall rate”. Appropriate correction is required.
Claim 9 is objected to because of the following informalities: Lines 1-3 recite “wherein the inputting the at least one sample visual element into an initial recognition model to obtain a predicted scene category output from the initial recognition model”, which Examiner suggests amending to “wherein the inputting the at least one sample visual element into the initial recognition model to obtain the predicted scene category output from the initial recognition model”. Appropriate correction is required.

Claim 16 is objected to because of the following informalities: Lines 2-4 recite “wherein the encoding the at least one target visual element in a preset encoding manner and generating an encoding vector of the at least one target visual element”, which Examiner suggests amending to “wherein the encoding the at least one target visual element in the preset encoding manner and generating the encoding vector of the at least one target visual element”. Appropriate correction is required.

Claim 17 is objected to because of the following informalities: Lines 1-2 recite “wherein the determining an encoding value at each encoding position”, which Examiner suggests amending to “wherein the determining the encoding value at each encoding position”. Appropriate correction is required.

Claim 21 is objected to because of the following informalities: Lines 2-4 recite “wherein the encoding the at least one target visual element in a preset encoding manner and generating an encoding vector of the at least one target visual element”, which Examiner suggests amending to “wherein the encoding the at least one target visual element in the preset encoding manner and generating the encoding vector of the at least one target visual element”. Appropriate correction is required.
Claim 22 is objected to because of the following informalities: Line 2 recites “wherein the determining an encoding value at each encoding position”, which Examiner suggests amending to “wherein the determining the encoding value at each encoding position”. Appropriate correction is required.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1, 3-9, 11, 12, and 14-23 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 1 recites the limitation "the at least one target visual element" in Line 9. There is insufficient antecedent basis for this limitation in the claim as there is no earlier mention of at least one target visual element. Examiner suggests amending the limitation to “at least one target visual element” (deleting “the”) and has interpreted the limitation as such.

Claims 3-8 depend on claim 1 and thus are also rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite.

Claim 9 recites the limitation "the preset encoding manner" in Line 4. There is insufficient antecedent basis for this limitation in the claim as there is no earlier mention of a preset encoding manner.
Examiner suggests amending the limitation to “a preset encoding manner” and has interpreted the limitation as such.

Claim 11 recites the limitation "the at least one target visual element" in Line 12. There is insufficient antecedent basis for this limitation in the claim as there is no earlier mention of at least one target visual element. Examiner suggests amending the limitation to “at least one target visual element” (deleting “the”) and has interpreted the limitation as such.

Claim 12 recites the limitation "the at least one target visual element" in Line 12. There is insufficient antecedent basis for this limitation in the claim as there is no earlier mention of at least one target visual element. Examiner suggests amending the limitation to “at least one target visual element” (deleting “the”) and has interpreted the limitation as such.

Claim 14 recites the limitation "the visual elements" in Line 3. There is insufficient antecedent basis for this limitation in the claim as it is unclear as to which visual elements are being referred to, the plurality of visual elements disclosed in claim 1 or the visual elements disclosed earlier in Line 2 of claim 14. Examiner suggests clarifying the limitation, for example, amending the limitation “visual elements” in Lines 2 and 3 to “different visual elements”, and has interpreted the limitation as such.

Claims 15-18 depend on claim 11 and thus are also rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite.

Claim 19 recites the limitation "the visual elements" in Line 3. There is insufficient antecedent basis for this limitation in the claim as it is unclear as to which visual elements are being referred to, the plurality of visual elements disclosed in claim 11 or the visual elements disclosed earlier in Line 2 of claim 19. Examiner suggests clarifying the limitation, for example, amending the limitation “visual elements” in Lines 2 and 3 to “different visual elements”, and has interpreted the limitation as such.

Claims 20-22 depend on claim 12 and thus are also rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite.

Claim 23 recites the limitation "the visual elements" in Line 3. There is insufficient antecedent basis for this limitation in the claim as it is unclear as to which visual elements are being referred to, the plurality of visual elements disclosed in claim 12 or the visual elements disclosed earlier in Line 2 of claim 23. Examiner suggests clarifying the limitation, for example, amending the limitation “visual elements” in Lines 2 and 3 to “different visual elements”, and has interpreted the limitation as such.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows:

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 11, and 12 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
The claim(s) recite(s) steps of recognizing a plurality of visual elements in the to-be-recognized image, determining related visual elements among the plurality of visual elements, clustering the related visual elements to obtain at least one target visual element, and generating a scene category corresponding to the to-be-recognized image, which may be performed practically in the human mind as a mental process by mentally observing and identifying visual elements in the image, mentally evaluating the visual elements to identify related visual elements, mentally grouping related visual elements together, and mentally performing a judgement of a scene category corresponding to the image.

This judicial exception is not integrated into a practical application. The additional element of obtaining a to-be-recognized image amounts to mere data gathering and output recited at a high level of generality and thus is insignificant extra-solution activity. The additional elements of inputting the to-be-recognized image into a target visual element detection model, using the target visual element detection model, performing semantic analysis, inputting the at least one target visual element into a scene recognition model, and using the scene recognition model are used to generally apply the abstract idea without limiting how the target visual element detection model, the semantic analysis, and the scene recognition model function. The target visual element detection model, the semantic analysis, and the scene recognition model are described at a high level such that it amounts to using a computer with generic models to apply the abstract idea and merely recites the outcomes of recognizing a plurality of visual elements, determining related visual elements, and generating a scene category, respectively, without any details about how the outcomes are accomplished.
The additional elements of a memory or non-transitory computer-readable storage medium and processor (claims 11 and 12) are recited at a high level of generality and amount to no more than mere instructions to apply the exception using a generic computer.

The claim(s) do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the judicial exception into a practical application, the additional elements of inputting the to-be-recognized image into a target visual element detection model, using the target visual element detection model, performing semantic analysis, inputting the at least one target visual element into a scene recognition model, using the scene recognition model, and the memory or non-transitory computer-readable storage medium and processor (claims 11 and 12) are at best mere instructions to apply the exception using a generic computer component. The additional element of obtaining a to-be-recognized image is recited at a high level of generality that amounts to obtaining and receiving data over a network and is well-understood, routine, conventional activity. Even when considered in combination, these additional elements represent mere instructions to implement an abstract idea or other exception on a computer and insignificant extra-solution activity, which do not provide an inventive concept. Thus, claims 1, 11, and 12 are not patent eligible.

The dependent claims 3-9 and 14-23 also do not include elements that amount to significantly more than just the abstract idea or integrate the abstract idea into a practical application. Accordingly, claims 3-9 and 14-23 are also not patent eligible.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless – (a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claim(s) 1, 3, 11, 12, 14, 15, 19, 20, and 23 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhou et al. (see translated version of CN 111597921).

With regards to claim 11, Zhou et al. discloses a computing device, comprising: a memory and a processor, wherein the memory is configured to store computer-executable instructions, and wherein the computer-executable instructions upon execution by the processor cause the processor to implement operations (Para. 0155 lines 1-14, "processor" "memory") comprising: obtaining a to-be-recognized image (Para. 0058 line 1, "image to be recognized"); inputting the to-be-recognized image into a target visual element detection model (Para. 0065 lines 1-10, 0066 lines 1-15, "image recognition model"); recognizing a plurality of visual elements in the to-be-recognized image by the target visual element detection model (Para. 0063 line 1, "multiple target objects"); determining related visual elements among the plurality of visual elements by performing semantic analysis on the plurality of visual elements (Para. 0067 lines 1-2, 0095 lines 1-6, 0099 lines 1-7, 0104 lines 1-2, 0105 lines 1-3, 0106 lines 1-9, "object group"); clustering the related visual elements to obtain the at least one target visual element (Para. 0067 lines 1-2, 0068 lines 12-13, 0069 lines 1-7, 0070 lines 1-3, "object groups" "object group features"); inputting the at least one target visual element into a scene recognition model (Para. 0074 lines 1-2, 0077 lines 1-12, "classifier"); and generating a scene category corresponding to the to-be-recognized image by the scene recognition model (Para. 0074 lines 1-2, 0076 lines 1-4, 0077 lines 1-12, 0078 lines 1-3 and 9-11 and 16-17, "scene type").

With regards to claim 15, Zhou et al. discloses the computing device according to claim 11, the operations further comprising: encoding the at least one target visual element in a preset encoding manner and generating an encoding vector of the at least one target visual element (Para. 0067 lines 1-2, 0068 lines 12-13, 0069 lines 1-7, 0070 lines 1-3, 0111 lines 1-3, 0112 lines 5-6, "object group matrix"); and inputting the encoding vector of the at least one target visual element into the scene recognition model and generating the scene category corresponding to the to-be-recognized image by the scene recognition model (Para. 0074 lines 1-2, 0076 lines 1-4, 0077 lines 1-12, 0078 lines 1-3 and 9-11 and 16-17, 0111 lines 4-6, 0112 lines 9-11, "scene type").

With regards to claim 19, Zhou et al. discloses the computing device according to claim 11, wherein the scene recognition model is trained to learn a relationship between visual elements and generate a scene category in which the visual elements coexist (Para. 0069 lines 1-12, 0076 lines 1-4, 0077 lines 1-7, 0078 lines 4-11, 0147 lines 1-3, 0150 lines 1-10, "train").

With regards to claims 1, 3, and 14, they recite the functions of the apparatus of claims 11, 15, and 19, respectively, as processes. Thus, the analyses in rejecting claims 11, 15, and 19 are equally applicable to claims 1, 3, and 14, respectively.
With regards to claims 12, 20, and 23, they recite the apparatus of claims 11, 15, and 19, respectively, as a non-transitory computer-readable storage medium, storing computer-executable instructions, wherein when the computer-executable instructions are executed by a processor, the computer-executable instructions cause the processor to implement operations. Zhou et al. discloses the non-transitory computer-readable storage medium (Para. 0155 lines 1-14, "memory" "processor"). Thus, the analyses in rejecting claims 11, 15, and 19 are equally applicable to claims 12, 20, and 23, respectively.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 6 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou et al. (see translated version of CN 111597921) in view of Fan et al. (US 2021/0406525).

With regards to claim 6, Zhou et al. discloses the image scene recognition method according to claim 1. Zhou et al. does not explicitly teach wherein before inputting the to-be-recognized image into the target visual element detection model, and determining at least one target visual element comprised in the to-be-recognized image, the image scene recognition method further comprises: obtaining at least one visual element detection model, and obtaining a set of test images; calculating at least one of a recognition accuracy or a recall rate corresponding to each of the at least one visual element detection model using the set of test images; and selecting the target visual element detection model from the at least one visual element detection model based on the at least one of the recognition accuracy or the recall rate.

However, Fan et al. discloses the concept of before using a detection model (Para. 0035 lines 1-4, 0044 lines 1-3, 0048 lines 1-3, "pre-trained" "before"), obtaining and training an initial detection model, obtaining a set of test images, calculating a recognition accuracy rate to the initial detection model using the set of test images, and selecting the detection model based on the recognition accuracy rate (Para. 0050 lines 1-3, 0051 lines 1-4, 0055 lines 1-6, 0056 lines 1-7, "test set" "accuracy rate" "used as a trained neural network model") in order to avoid using a model that is under-fitted (Para. 0057 lines 5-11, "under-fitting" "practical application").

It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to include the concept of before using a detection model, obtaining and training an initial detection model, obtaining a set of test images, calculating a recognition accuracy rate to the initial detection model using the set of test images, and selecting the detection model based on the recognition accuracy rate as taught by Fan et al. into the image scene recognition method of Zhou et al. The motivation for this would be to avoid using a model that is under-fitted.
With regards to claim 7, the combination of Zhou et al. and Fan et al. discloses the image scene recognition method according to claim 6, wherein the set of test images comprises a plurality of test images, and each test image carries visual element labels (Fan et al.: Para. 0050 lines 1-3, 0055 lines 4-6, "test images"); and wherein the calculating at least one of a recognition accuracy or a recall rate corresponding to each of the at least one visual element detection model using the set of test images comprises: for each test image, inputting the test image into a reference visual detection model to obtain predicted visual elements output from the reference visual detection model, wherein the reference visual detection model is any one of the at least one visual element detection model (Fan et al.: Para. 0055 lines 1-6, "outputted by the initial neural network model"); and calculating at least one of a recognition accuracy or a recall rate of the reference visual detection model based on the visual element labels of each test image and corresponding predicted visual elements (Fan et al.: Para. 0056 lines 1-7, "accuracy rate").

Allowable Subject Matter

Claims 4, 5, 8, 9, 16-18, 21, and 22 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), 2nd paragraph, and 35 U.S.C. 101 set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

With regards to claims 4, 16, and 21, Zhou et al. (CN 111597921) discloses encoding the at least one target visual element and generating an encoding vector; however, there is no mention of where each of the at least one target visual element carries an element probability, and where the encoding the at least one target visual element in a preset encoding manner and generating an encoding vector of the at least one target visual element comprises: determining a vector length of the encoding vector based on an element quantity of preset visual elements and determining an encoding value at each encoding position in the encoding vector having the vector length based on the at least one target visual element and the element probability corresponding to each of the at least one target visual element. Zou et al. (US 2022/0058429) discloses the concept of vector encoding visual attributes; however, there is no mention of the rest of the limitations of the claim. Thus, while different prior arts disclose parts of the claim, none of the prior arts disclose or have reasonable motivation to combine to disclose all of the limitations of the claim as a whole.

With regards to claim 5, it is dependent on claim 4. With regards to claim 17, it is dependent on claim 16. With regards to claim 22, it is dependent on claim 21.

With regards to claims 8 and 18, Zhou et al. (CN 111597921) discloses where the scene recognition model is obtained through a training process comprising obtaining a set of sample images, wherein the set of sample images comprises sample images belonging to at least two different scene categories and each sample image carries a corresponding scene category label, iterating training operations and obtaining the scene recognition model which has completed the training process.
However, there is no mention of, for each sample image, inputting the sample image into the target visual element detection model to obtain at least one sample visual element, inputting the at least one sample visual element into an initial recognition model to obtain a predicted scene category output from the initial recognition model, and calculating a loss value corresponding to the sample image based on the predicted scene category and a scene category label corresponding to the sample image, and determining an average loss value of loss values corresponding to the sample images, adjusting a model parameter of the initial recognition model based on the average loss value, returning to perform an operation of obtaining a sample image set.

Fan et al. (US 2021/0406525) discloses the concept of training a neural network model using training images; however, there is no mention of, for each sample image, inputting the sample image into the target visual element detection model to obtain at least one sample visual element, inputting the at least one sample visual element into an initial recognition model to obtain a predicted scene category output from the initial recognition model, and calculating a loss value corresponding to the sample image based on the predicted scene category and a scene category label corresponding to the sample image, and determining an average loss value of loss values corresponding to the sample images, adjusting a model parameter of the initial recognition model based on the average loss value, returning to perform an operation of obtaining a sample image set. Thus, while different prior arts disclose parts of the claim, none of the prior arts disclose or have reasonable motivation to combine to disclose all of the limitations of the claim as a whole.

With regards to claim 9, it is dependent on claim 8.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Applicants are directed to consider additional pertinent prior art included on the Notice of References Cited (PTOL-892) attached herewith.

Contact Information

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CAROL W CHAN whose telephone number is (571) 272-5766. The examiner can normally be reached 9:30-3:30 M-F.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Sumati Lefkowitz, can be reached at (571) 272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CAROL W CHAN/
Primary Examiner, Art Unit 2672

Prosecution Timeline

Mar 01, 2024 — Application Filed
Feb 19, 2026 — Non-Final Rejection: §101, §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12579803: TOTAGRAPHY FOR SUPERRESOLUTION IMAGING AND SIGNAL PROCESSING OF POSITIVE, REAL-VALUED IMAGES AND SIGNALS
Granted Mar 17, 2026 • 2y 5m to grant

Patent 12573205: ELECTRONIC DEVICE AND METHOD FOR VEHICLE WHICH ENHANCES DRIVING ENVIRONMENT RELATED FUNCTION
Granted Mar 10, 2026 • 2y 5m to grant

Patent 12573240: LIGHT SOURCE SPECTRUM AND MULTISPECTRAL REFLECTIVITY IMAGE ACQUISITION METHODS AND APPARATUSES, AND ELECTRONIC DEVICE
Granted Mar 10, 2026 • 2y 5m to grant

Patent 12573206: BIRD’S-EYE VIEW ADAPTIVE INFERENCE RESOLUTION
Granted Mar 10, 2026 • 2y 5m to grant

Patent 12567237: OBJECT EVALUATION METHOD, OBJECT EVALUATION DEVICE, NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT
Granted Mar 03, 2026 • 2y 5m to grant
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 83%
With Interview (+36.2%): 99%
Median Time to Grant: 2y 6m
PTA Risk: Low

Based on 351 resolved cases by this examiner. Grant probability derived from career allow rate.
