DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
Applicant’s amendments to the claims, filed on 12/03/2025, have overcome the claim objections and the claim rejections under 35 U.S.C. 112(b) previously set forth in the Non-Final Rejection Office Action mailed on 09/23/2025.
Claim Objections
Claim 1 is objected to because of the following informalities:
In claim 1, lines 5, 7, 8 and 10, the underlines at the beginning of claim limitations should be removed.
Appropriate correction is required.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 5, 9-10 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Guo (CN 112348117 A) in view of Girard et al. (US 20210142113 A1), hereinafter Girard.
-Regarding claim 1, Guo discloses a model training method, applied to a training device and comprising (Abstract; FIGS. 1-12): obtaining a first training data set, wherein the first training data set comprises a plurality of first images (FIG. 2, step 202); for each first image in the plurality of first images (FIG. 2, steps 202-204): recognizing a first region in the first image by using an object detection model (Page 11, 2nd paragraph, “through the target identification model, identifying the target in the training sample image”), wherein the first region is an image region irrelevant to scene recognition (Page 10, last paragraph, “identify the foreground region of the training sample image”; Page 11, 2nd paragraph, “determining the target area of the target in the training sample image”); performing masking on the first region to obtain a third image (FIG. 2, step 204; Page 11, 2nd paragraph, “removing the content in the target area to obtain the background sample image”; Page 10, last paragraph, “determining the mask corresponding to the background region according to the foreground region”; FIG. 3B); and training a first convolutional neural network (Page 13, 2nd paragraph, “scene recognition model can be a deep neural network model … CNN”) by using a first training set (FIG. 2, step 206), and training a second convolutional neural network (Page 13, 2nd paragraph) by using a data set of the third image to obtain a scene recognition model (FIG. 2, step 206), wherein the scene recognition model comprises the trained first convolutional neural network and the trained second convolutional neural network (FIGS. 2, 4).
Guo does not disclose obtaining a plurality of sample object images generated by an image generative model, wherein a sample object image is an image of an object irrelevant to the scene recognition; respectively replacing the masked first region in the third image with the plurality of sample object images to obtain a plurality of target images. Guo also does not disclose using a data set of the target images to train the first convolutional neural network. However, it is common practice to use training data augmentation to train neural networks.
In the same field of endeavor, Girard teaches a method for augmenting a training image base representing a print on a background to train a convolutional neural network (CNN) (Girard: Abstract; FIGS. 1-7). Girard further teaches obtaining a plurality of sample object images generated by an image generative model, wherein a sample object image is an image of an object irrelevant to the scene recognition; respectively replacing the masked first region in the third image with the plurality of sample object images to obtain a plurality of target images (Girard: FIGS. 2-6; [0070], “replace the print”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Guo with the teaching of Girard by using training data augmentation in order to improve the performance of neural network training.
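For illustration only, the data augmentation relied upon in the combination above can be sketched in Python. The helper names (detect_irrelevant_region, generator), the zero-valued mask, and the H x W x C image layout are assumptions made for this sketch and are not features attributed to Guo or Girard.

```python
# Illustrative sketch only: masking a detected, scene-irrelevant region and
# substituting generated object crops to build augmented "target images".
import numpy as np

def build_target_images(first_image: np.ndarray,
                        detect_irrelevant_region,
                        generator,
                        num_samples: int = 4):
    """Return (third_image, target_images) for one training image (H x W x C array assumed)."""
    # 1. An object detection model gives a bounding box (x0, y0, x1, y1) for the
    #    region that is irrelevant to scene recognition (hypothetical callable).
    x0, y0, x1, y1 = detect_irrelevant_region(first_image)

    # 2. Mask the region to obtain the "third image" (background only).
    third_image = first_image.copy()
    third_image[y0:y1, x0:x1, :] = 0  # simple zero mask, chosen for the example

    # 3. Replace the masked region with generated sample object images,
    #    yielding several augmented "target images".
    target_images = []
    for _ in range(num_samples):
        sample = generator(size=(y1 - y0, x1 - x0))  # hypothetical generative-model call
        target = third_image.copy()
        target[y0:y1, x0:x1, :] = sample
        target_images.append(target)
    return third_image, target_images
```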
-Regarding claim 5, Guo discloses a method, applied to an execution device and comprising (Abstract; FIGS. 1-12; Note: it is known that in the stages of testing, or validation, or production of machine learning (ML) models such as scene recognition models, object detection models, etc., the same processing procedures will be performed as those in the training stage except no training for the models. The trained ML models will be applied): obtaining a to-be-recognized first scene image (FIG. 2, step 202); detecting, by using an object detection model (Page 11, 2nd paragraph, “through the target identification model, identifying the target in … sample image”); recognizing a first region in a first image by using an object detection model (Page 11, 2nd paragraph, “through the target identification model, identifying the target in … sample image, determining the target area of the target in the training sample image”), a first region in which an object irrelevant to scene recognition is located in the first scene image (Page 10, last paragraph, “identify the foreground region of … sample image”; Page 11, 2nd paragraph, “determining the target area of the target in the … sample image”); performing masking on the first region to obtain a second scene image (FIG. 1, step 204; Page 11, 2nd paragraph, “removing the content in the target area to obtain the background sample image”; Page 10, last paragraph, “determining the mask corresponding to the background region according to the foreground region”; FIG. 3B); inputting the first scene image to a first convolutional neural network (Page 13, 2nd paragraph, “scene recognition model can be a deep neural network model … CNN”) in a scene recognition model (FIG. 2, step 206); wherein the scene recognition model comprises the trained first convolutional neural network and the trained second convolutional neural network (FIGS. 2, 4); inputting the second scene image to the second convolutional neural network (Page 13, 2nd paragraph) in the scene recognition model (FIG. 1, step 206); and outputting a classification result by using the scene recognition model (FIGS. 2, 4), wherein the first convolutional neural network is obtained by training by using a first training set (FIG. 2, step 206), the second convolutional neural network is obtained by training by using a data set of a third image (FIG. 2, step 206), and the third image is obtained by recognizing a first region that is in the first image and irrelevant to scene recognition by using the object detection model (Page 11, 2nd paragraph, “through the target identification model, identifying the target in the training sample image, determining the target area of the target in the training sample image”), and then performing masking on the first region, and the first image is an image in a training data set (FIG. 2, step 204; Page 11, 2nd paragraph, “removing the content in the target area to obtain the background sample image”; Page 10, last paragraph, “determining the mask corresponding to the background region according to the foreground region”; FIG. 3B).
Guo does not disclose obtaining a plurality of sample object images generated by an image generative model, wherein a sample object image is an image of an object irrelevant to the scene recognition; respectively replacing the masked first region in the third image with the plurality of sample object images to obtain a plurality of target images. Guo also does not disclose using a data set of the target images to train the first convolutional neural network. However, it is common practice to use training data augmentation to train neural networks.
In the same field of endeavor, Girard teaches a method for augmenting a training image base representing a print on a background to train a convolutional neural network (CNN) (Girard: Abstract; FIGS. 1-7). Girard further teaches obtaining a plurality of sample object images generated by an image generative model, wherein a sample object image is an image of an object irrelevant to the scene recognition; respectively replacing the masked first region in the third image with the plurality of sample object images to obtain a plurality of target images (Girard: FIGS. 2-6; [0070], “replace the print”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Guo with the teaching of Girard by using training data augmentation in order to improve the performance of neural network training.
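For illustration only, a minimal PyTorch-style sketch of the two-branch inference flow characterized above: the original scene image is input to one CNN, the masked scene image to the other, and their outputs are combined into a classification result. The names cnn_a and cnn_b, the concatenation-based fusion, and the flat feature dimension are assumptions for the example, not the specific architecture of either reference.

```python
# Illustrative sketch only (assumed PyTorch): two-branch scene recognition at inference time.
import torch
import torch.nn as nn

class TwoBranchSceneRecognizer(nn.Module):
    def __init__(self, cnn_a: nn.Module, cnn_b: nn.Module,
                 feat_dim: int, num_classes: int):
        super().__init__()
        self.cnn_a = cnn_a          # takes the full first scene image
        self.cnn_b = cnn_b          # takes the masked second scene image
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, scene_image: torch.Tensor,
                masked_image: torch.Tensor) -> torch.Tensor:
        feat_a = self.cnn_a(scene_image)     # features of the original image (N, feat_dim assumed)
        feat_b = self.cnn_b(masked_image)    # features of the background-only image
        fused = torch.cat([feat_a, feat_b], dim=1)
        return self.classifier(fused)        # classification result (logits)
```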
-Regarding claim 10, Guo discloses an electronic device, comprising: at least one processor; and at least one memory coupled to the at least one processor and storing programming instructions for execution by the at least one processor to perform operations (Page 4, 4th – 5th paragraphs; FIGS. 11-12) comprising (Abstract; FIGS. 1-12; Note: it is known that in the stages of testing, or validation, or production of machine learning (ML) models such as scene recognition models, object detection models, etc., the same processing procedures will be performed as those in the training stage except no training for the models. The trained ML models will be applied): obtaining a to-be-recognized first scene image (FIG. 2, step 202); detecting, by using an object detection model (Page 11, 2nd paragraph, “through the target identification model, identifying the target in … sample image”); a first region in which an object irrelevant to scene recognition is located in the first scene image (Page 10, last paragraph, “identify the foreground region of … sample image”; Page 11, 2nd paragraph, “determining the target area of the target in the … sample image”); performing masking on the first region to obtain a second scene image (FIG. 1, step 204; Page 11, 2nd paragraph, “removing the content in the target area to obtain the background sample image”; Page 10, last paragraph, “determining the mask corresponding to the background region according to the foreground region”; FIG. 3B); and inputting the first scene image to a first convolutional neural network (Page 13, 2nd paragraph, “scene recognition model can be a deep neural network model … CNN”) in a scene recognition model (FIG. 2, step 206), wherein the scene recognition model comprises the trained first convolutional neural network and the trained second convolutional neural network (FIGS. 2, 4); inputting the second scene image to the second convolutional neural network (Page 13, 2nd paragraph) in the scene recognition model (FIG. 1, step 206); and outputting a classification result by using the scene recognition model (FIGS. 2, 4), wherein the first convolutional neural network is obtained by training by using a first training set (FIG. 2, step 206), the second convolutional neural network is obtained by training by using a data set of third images (FIG. 2, step 206), and each third image is obtained by recognizing a first region that is in a first image and irrelevant to scene recognition by using the object detection model (Page 11, 2nd paragraph, “through the target identification model, identifying the target in the training sample image, determining the target area of the target in the training sample image”), and then performing masking on the first region, and the first image is an image in a training data set (FIG. 2, step 204; Page 11, 2nd paragraph, “removing the content in the target area to obtain the background sample image”; Page 10, last paragraph, “determining the mask corresponding to the background region according to the foreground region”; FIG. 3B).
Guo does not disclose obtaining a plurality of sample object images generated by an image generative model, wherein a sample object image is an image of an object irrelevant to the scene recognition; respectively replacing the masked first region in the third image with the plurality of sample object images to obtain a plurality of target images. Guo also does not disclose using a data set of the target images to train the first convolutional neural network. However, it is common practice to use training data augmentation to train neural networks.
In the same field of endeavor, Girard teaches a method for augmenting a training image base representing a print on a background to train a convolutional neural network (CNN) (Girard: Abstract; FIGS. 1-7). Girard further teaches obtaining a plurality of sample object images generated by an image generative model, wherein a sample object image is an image of an object irrelevant to the scene recognition; respectively replacing the masked first region in the third image with the plurality of sample object images to obtain a plurality of target images (Girard: FIGS. 2-6; [0070], “replace the print”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Guo with the teaching of Girard by using training data augmentation in order to improve the performance of neural network training.
-Regarding claims 9 and 16, Guo in view of Girard teaches the method of claim 5, and the device of claim 10. The combination further teaches receiving the to-be-recognized first scene image sent by user equipment; or collecting the to-be-recognized first scene image through a camera or an image sensor (Guo: FIGS. 1, 8, 11-12).
Claims 4, 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Guo (CN 112348117 A) in view of Girard et al. (US 20210142113 A1), hereinafter Girard, and further in view of Liu et al. (WO 2021043112 A1), hereinafter Liu.
-Regarding claims 4, 6 and 13, Guo in view of Girard teaches the method of claim 1, the method of claim 5, and the device of claim 10. The combination further teaches extracting an image feature of the target image through a first convolutional layer of the first convolutional neural network, and extracting an image feature of the third image through a second convolutional layer of the second convolutional neural network (Guo: FIGS. 4-5; Page 12, last paragraph; Page 13, 2nd paragraph).
Guo in view of Girard does not teach outputting the image feature of the third image to the first convolutional layer to fuse with the image feature of the target image; and outputting, through an output layer of the first convolutional neural network, the label of the first category based on a fused image feature. However, Guo in view of Girard does teach using a difference between the first scene recognition result and the second scene recognition result (Guo: FIG. 2, steps 208-210). A person of ordinary skill in the art would understand that fused image features are used for the scene recognition model.
However, Liu is an analogous art pertinent to the problem to be solved in this application and teaches an image classification method based on a target region to be identified and a background region (Liu: Abstract; FIGS. 1-15). Liu further teaches outputting the image feature of the third image to the first convolutional layer to fuse with the image feature of the target image; and outputting, through an output layer of the first convolutional neural network, the label of the first category based on a fused image feature (Liu: FIGS. 11-12).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Guo in view of Girard with the teaching of Liu by using the fused image features in order to improve the accuracy of the recognition results.
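For illustration only, a minimal sketch of the intermediate-feature fusion characterized above, in which a convolutional feature of the masked (third) image is fused with the convolutional feature of the target image before the output layer. The layer sizes and the concatenation-based fusion are assumptions for the example, not Liu's specific architecture.

```python
# Illustrative sketch only (assumed PyTorch): fusing a feature map from the
# background branch into the target branch ahead of the output layer.
import torch
import torch.nn as nn

class FusedBranches(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # first convolutional layer (first CNN)
        self.conv2 = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # second convolutional layer (second CNN)
        self.head = nn.Sequential(                                 # output layer of the first CNN
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))

    def forward(self, target_image: torch.Tensor, third_image: torch.Tensor):
        f1 = self.conv1(target_image)          # image feature of the target image
        f2 = self.conv2(third_image)           # image feature of the third (masked) image
        fused = torch.cat([f1, f2], dim=1)     # fuse the background feature into the first branch
        return self.head(fused)                # label of the first category from the fused feature
```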
Claims 7-8 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Guo (CN 112348117 A) in view of Girard et al. (US 20210142113 A1), hereinafter Girard, and further in view of Zhang et al. (WO 2020042490 A1), hereinafter Zhang.
-Regarding claims 7 and 14, Guo in view of Girard teaches the method of claim 5, and the device of claim 10.
Guo in view of Girard does not teach adjusting a noise reduction mode of the headset to the first noise reduction mode based on the classification result; or sending the classification result to the user equipment, wherein the classification result is used to trigger the user equipment to adjust a noise reduction mode of the headset to the first noise reduction mode.
However, Zhang is an analogous art pertinent to the problem to be solved in this application and teaches an earphone far-field interaction method (Zhang: Abstract; FIGS. 1-8). Zhang further teaches adjusting a noise reduction mode of the headset to the first noise reduction mode based on the classification result; or sending the classification result to the user equipment, wherein the classification result is used to trigger the user equipment to adjust a noise reduction mode of the headset to the first noise reduction mode (Zhang: Page 13, 1st paragraph, “the far-field pickup capability of the headset's far-field interactive accessories can be adjusted, so that the best direction recognition and noise reduction processing can be provided in different sports scenes”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Guo in view of Girard with the teaching of Zhang by adjusting a noise reduction mode of the headset to the first noise reduction mode based on the classification result, or sending the classification result to the user equipment, wherein the classification result is used to trigger the user equipment to adjust a noise reduction mode of the headset to the first noise reduction mode, in order to provide a real-world application of scene recognition. Please note that simply appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, is not an inventive concept.
-Regarding claims 8 and 15, Guo in view of Girard teaches the method of claim 5, and the device of claim 10.
Guo in view of Girard does not teach adjusting system volume of the electronic device to the first volume value based on the classification result; or sending the classification result to user equipment, wherein the classification result is used to trigger the user equipment to adjust system volume of the user equipment to the first volume value.
However, Zhang is an analogous art pertinent to the problem to be solved in this application and teaches an earphone far-field interaction method (Zhang: Abstract; FIGS. 1-8). Zhang further teaches adjusting system volume of the electronic device to the first volume value based on the classification result; or sending the classification result to user equipment, wherein the classification result is used to trigger the user equipment to adjust system volume of the user equipment to the first volume value (Zhang: Page 12, 2nd paragraph, “in different application scenarios, the headset far-field interaction accessory can adjust the volume and noise reduction effect of the headset far-field interaction accessory. For example, in relatively noisy places, such as on the street”; Page 13, 1st paragraph, “the far-field pickup capability of the headset's far-field interactive accessories can be adjusted, so that the best direction recognition and noise reduction processing can be provided in different sports scenes”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Guo in view of Girard with the teaching of Zhang by adjusting the system volume of the electronic device to the first volume value based on the classification result, or sending the classification result to user equipment, wherein the classification result is used to trigger the user equipment to adjust the system volume of the user equipment to the first volume value, in order to provide a real-world application of scene recognition. Please note that simply appending well-understood, routine, conventional activities previously known to the industry, specified at a high level of generality, is not an inventive concept.
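For illustration only, a short sketch of how a classification result might drive the noise-reduction and volume adjustments discussed above. The scene labels, profile values, and the headset and user-equipment interfaces are hypothetical placeholders, not elements of Zhang or of the claims.

```python
# Illustrative sketch only: selecting a noise-reduction mode and volume from a scene label.
SCENE_PROFILES = {
    "street":       {"noise_reduction": "strong",   "volume": 0.8},
    "office":       {"noise_reduction": "moderate", "volume": 0.5},
    "indoor_quiet": {"noise_reduction": "off",      "volume": 0.3},
}

def apply_classification(result: str, headset, user_equipment=None):
    profile = SCENE_PROFILES.get(result)
    if profile is None:
        return  # unknown scene: leave settings unchanged
    if user_equipment is not None:
        # Alternative path: forward the result and let the user equipment adjust the headset.
        user_equipment.send(result)
    else:
        headset.set_noise_reduction(profile["noise_reduction"])  # hypothetical device API
        headset.set_volume(profile["volume"])
```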
Allowable Subject Matter
Claims 2-3 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Response to Arguments
Applicant's arguments filed 12/03/2025 have been fully considered but they are not persuasive. Applicant argues that Guo has not been shown to teach or suggest “training a first convolutional neural network by using a data set of the target images, and training a second convolutional neural network by using a data set of the third images to obtain a scene recognition model,” particularly where “the scene recognition model comprises the trained first convolutional neural network and the trained second convolutional neural network,” as recited in amended claim 1. The examiner respectfully disagrees with the above arguments.
Regarding claims 1, 5 and 10, and in response to applicant’s above argument, Guo discloses training a first convolutional neural network (Guo: Page 13, 2nd paragraph, “scene recognition model can be a deep neural network model … CNN”) by using a first training set (Guo: FIG. 2, step 206), and training a second convolutional neural network (Guo: Page 13, 2nd paragraph) by using a data set of the third image to obtain a scene recognition model (Guo: FIG. 2, step 206), wherein the scene recognition model comprises the trained first convolutional neural network and the trained second convolutional neural network (Guo: FIGS. 2, 4). See also page 3 of this Office Action and page 4 of the Non-Final Rejection Office Action mailed 09/23/2025. As shown in Guo’s FIGS. 2-4, Guo’s scene recognition model consists of a first model (Guo: FIGS. 3A, 4, top branch) and a second model (Guo: FIGS. 3A, 4, bottom branch) that are trained based on a first model loss value and a second model loss value, respectively (Guo: FIG. 2, steps 206-210). Guo further discloses that these two models can be implemented using convolutional neural networks (Guo: page 13, 2nd paragraph; page 15, 3rd paragraph).
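For illustration only, a minimal PyTorch-style sketch of training two branches with separate loss values, as Guo's FIG. 2 (steps 206-210) is characterized above; the shared optimizer over both branches and the summed loss are assumptions made for this sketch.

```python
# Illustrative sketch only: one training step with a first and a second model loss value.
import torch
import torch.nn as nn

def train_step(cnn_a, cnn_b, optimizer, target_image, third_image, label):
    # optimizer is assumed to hold the parameters of both branches
    criterion = nn.CrossEntropyLoss()
    loss_a = criterion(cnn_a(target_image), label)  # first model loss value
    loss_b = criterion(cnn_b(third_image), label)   # second model loss value
    loss = loss_a + loss_b                          # update both branches together
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss_a.item(), loss_b.item()
```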
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAO LIU whose telephone number is (571)272-4539. The examiner can normally be reached Monday-Thursday and Alternate Fridays 8:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood can be reached at (571) 272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/XIAO LIU/Primary Examiner, Art Unit 2664