DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 07/15/2025 has been entered.
Status of Claims
Claims 1-20 are pending. Claims 18-20 are new.
Response to Arguments
Applicant’s arguments, see pp. 10-13, filed 02/06/2026, with respect to the rejections of claims 9-12 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejections of claims 9-12 under 35 U.S.C. 103 have been withdrawn. However, Applicant’s arguments, see pp. 10-13, filed 02/06/2026, with respect to the rejections of claims 1-8 and 13-17 under 35 U.S.C. 103 have been fully considered, but they are not persuasive. Applicant argues that Pham discloses ground truth images that are themselves labeled, and that the labels are therefore not "ground truth values" as recited in the claims; that is, that Pham teaches negative and positive attribute labels which are merely used to label ground truth images. Examiner respectfully disagrees. For further clarification, Pham, Paras. 21, 39, and 105-106, teaches generating a dataset of labeled ground truth images wherein negative attribute labels for the labeled ground truth images are generated from the positive attribute labels corresponding to the ground truth images in order to expand the negative attribute labels. The predicted positive attributes are compared with the ground truth positive attributes to generate a positive loss, and the predicted negative attributes are compared with the ground truth negative attributes to generate a negative loss; i.e., Pham provides training images that are labeled ground truth images, labeled using positive and negative attribute labels which serve as positive and negative ground truth labels, respectively, to determine a loss. Therefore, Pham explicitly teaches providing training images that are labeled with ground truth values, the ground truth values being the binary positive/negative attribute labels used for training, and determining a loss by comparing predicted positive/negative attributes with the ground truth positive/negative attributes. Examiner has considered Applicant's arguments with respect to new claims 18-20.
However, those arguments are moot because they are directed to newly presented claims 18-20, which are analyzed as presented below. Accordingly, THIS ACTION IS MADE FINAL.
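For orientation only, the positive/negative attribute loss scheme attributed to Pham in the discussion above can be sketched in a few lines. This is a purely illustrative sketch under assumed conventions; the function name, label sets, and per-attribute probabilities are hypothetical and are not drawn from Pham's disclosure:

```python
import math

def attribute_losses(pred, pos_labels, neg_labels):
    """Illustrative split of cross-entropy into a positive and a negative loss term.

    pred:       dict mapping attribute name -> predicted probability in (0, 1)
    pos_labels: attributes labeled as present in the ground truth image
    neg_labels: attributes labeled as absent in the ground truth image
    """
    # Positive loss: a low predicted probability on a positive label is penalized.
    pos_loss = -sum(math.log(pred[a]) for a in pos_labels) / len(pos_labels)
    # Negative loss: a high predicted probability on a negative label is penalized.
    neg_loss = -sum(math.log(1.0 - pred[a]) for a in neg_labels) / len(neg_labels)
    return pos_loss, neg_loss

# Hypothetical predictions for one image with one positive and two negative labels.
pred = {"round": 0.9, "metallic": 0.6, "striped": 0.2}
pos_loss, neg_loss = attribute_losses(pred, ["round"], ["striped", "metallic"])
```

The two terms correspond to the positive loss and negative loss described in the cited paragraphs of Pham: each is driven only by its own subset of ground truth labels.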
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph). The presumption that 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function.
Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph). The presumption that 35 U.S.C. 112(f) (pre-AIA 35 U.S.C. 112, sixth paragraph) is not invoked is rebutted when the claim element recites function but fails to recite sufficiently definite structure, material or acts to perform that function.
Claim elements in this application that use the word “means” (or “step for”) are presumed to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action. Similarly, claim elements that do not use the word “means” (or “step for”) are presumed not to invoke 35 U.S.C. 112(f) except as otherwise indicated in an Office action.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: “an encoder network…”, “an object classification head network”, “an attribute classification head network”, and “an association unit” relevant to Claims 1-20.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-5 and 13-17 are rejected under 35 U.S.C. 103 as being unpatentable over Pham in view of Maheshwari et al. (US 20220180572 A1, hereinafter Maheshwari).
Regarding claim 1, Pham teaches "A method for training or pre-training an image classifier for classifying an input image with respect to combinations of an object value and an attribute value"; (Pham, Abstract and Para. 56, teach training an image classification network with respect to object-attribute pairs, i.e., combinations of object and attribute values);
“the image classifier including an encoder network configured to map the input image to a representation which includes multiple independent components”; (Pham, Para. 56, teaches a network that encodes features for the object including generating an image-object feature map for multiple attributes, i.e., encoder network to map an image including multiple components or attributes or features);
“an object classification head network configured to map the representation components of the input image to one or more of the objects”; (Pham, Para. 56, teaches a network that creates an image-object feature map which maps the features or attributes, i.e., components, of the image to the object. For further clarification, Pham, Para. 56, teaches the multi-attribute extraction system which incorporates the object-label embedding vector as an input in a feature composition module to generate the image-object feature map, i.e., a distinct mapping of input image representation components being the features mapped to object values in an image-object feature map);
“an attribute classification head network that is configured to map the representation components of the input image to one or more of the attribute values, and an association unit configured to provide, to each classification head network, a linear combination of those of the representation components of the input image that are relevant for a classification task of the respective classification head network, the method comprising the following steps”; (Pham, Para. 56, teaches a network that creates an image-object feature map which maps the features, i.e., components, of an image to one or more of the multiple attributes wherein the network learns to associate object-attribute pairs together, i.e., association unit for linear combination. For further clarification, Pham, Para. 56, teaches the multi-attribute contrastive classification neural network utilizing the object-label embedding vector as an input in the feature composition module so that the multi-attribute contrastive neural network learns to associate certain object-attribute pairs together, i.e., a separate, distinct mapping from the object feature map wherein the representation components, being the feature composition of the image, are used as input to associate object-attribute pairs together as the mapping of the representation components to attribute values. Additionally, for further clarification, Pham, Para. 56, teaches utilizing the object-label embedding vector as an input to the feature composition to learn to associate certain object-attribute pairs together, such as a ball always being round, wherein the network predicts attributes for the object and focuses on particular visual aspects, i.e., associates representation components, being features that are relevant for classification of the object, and determines which object is being represented);
“providing, for each respective component of the representation, a factor classification head network that is configured to map the respective component to a predetermined basic factor of the input image"; (Pham, Paras. 39 and 41-42, teach a network that generates an image-object feature map and attribute feature maps wherein the attribute mapping is done to predetermined attribute types, i.e., predetermined basic factors, including but not limited to color, material, shape, texture, descriptor, label, or state);
"providing factor training images that are labelled with ground truth values with respect to the basic factors represented by the components"; (Pham, Para. 21, teaches the use of labeled ground truth images to train the neural network to determine positive and negative attribute labels, wherein the attributes contain the aforementioned attribute type data, i.e., with respect to basic factors of components. For further clarification, Pham, Paras. 21, 39, and 105-106, teaches generating a dataset of labeled ground truth images wherein negative attribute labels for the labeled ground truth images are generated from the positive attribute labels corresponding to the ground truth images in order to expand the negative attribute labels. The predicted positive attributes are compared with the ground truth positive attributes to generate a positive loss, and the predicted negative attributes are compared with the ground truth negative attributes to generate a negative loss; i.e., Pham provides training images that are labeled ground truth images, labeled using positive and negative attribute labels which serve as positive and negative ground truth labels, respectively, to determine a loss);
"mapping, by the encoder network and the factor classification head networks, the factor training images to values of the basic factors"; (Pham, Paras. 56 and 91-92, teach a multi-attribute extraction system that embeds an image-object feature map that corresponds to a ground truth image having attributes wherein the attributes contain the aforementioned attribute type data, i.e., mapping the factor training images to values of the basic factors, wherein the mapping is done by the contrastive classification neural network which encodes features and predicts and classifies attributes, i.e., encoder and factor classification network);
However, Pham does not explicitly teach "rating deviations of the mapped values of the basic factors from the ground truth values using a first predetermined loss function; and optimizing parameters that characterize a behavior of the encoder network and parameters that characterize a behavior of the factor classification head networks towards the goal that, when further factor training images are processed, a rating by the first loss function is likely to improve”.
Maheshwari is in the same field of art of using differing attributes of objects for image classification. The combination of references of Pham in view of Maheshwari further teaches "rating deviations of the mapped values of the basic factors from the ground truth values using a first predetermined loss function"; (Maheshwari, Paras. 8, 53, and 106, teaches the use of a multi-task loss function, i.e., first predetermined loss function, to compare, i.e., rate deviations of, attribute-object pair values against corresponding ground truth values, wherein the attribute-object pairs are mapped to images. For further clarification, Maheshwari, Paras. 53, 60, and 106, teaches using the multi-task loss function to compare a predicted color profile for each mapped attribute-object pair value to the corresponding ground truth color profile, i.e., finding a deviation between a mapped value of the basic factors and the ground truth, wherein the color profile is used in a ranking component to determine a relevance score of the attribute-object pair, i.e., rating the deviation using a color profile linked to the loss function);
"and optimizing parameters that characterize a behavior of the encoder network and parameters that characterize a behavior of the factor classification head networks towards the goal that, when further factor training images are processed, a rating by the first loss function is likely to improve"; (Maheshwari, Paras. 42, 53, and 196, teaches updating the parameters of the neural network based on the aforementioned comparison wherein the neural network includes the encoding of features and factor classification so that the ranking component will increase in relevance score through the feature vectors, i.e., goal of reducing loss or improving ranking by the loss function. For further clarification, Maheshwari, Abstract and Paras. 8, 42, and 125, teaches updating the parameters of the neural network based on the loss function wherein a ranking component is evaluated to minimize loss, i.e., optimize parameters in order to improve rating of loss function as further images are processed, wherein the neural network parameters include encoding of attribute-object pairs, encoded features, and object and attribute classifiers, i.e., parameters characterizing behavior of encoder network and classification network).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Pham by including the loss function and optimization of parameters taught by Maheshwari to help improve the networks' processing of images. One of ordinary skill in the art would be motivated to combine the references since doing so allows for improved performance by training image classifiers on complex attribute-object pairs that do not appear in the training data (Maheshwari, Paras. 3-4, teach the motivation of combination to be to provide a way to train image classifiers on the complex relationships of attribute/object pairs that are not seen commonly in the training data).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
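The claimed arrangement mapped above, an encoder producing multiple representation components, per-task classification heads, and an association unit supplying each head a linear combination of the components relevant to its task, can be sketched as a toy model. This is a hypothetical illustration under assumed shapes and weights; it is not the structure disclosed by Pham or by the application:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # assumed size of each representation component

def encoder(image, n_components=4):
    # Stand-in encoder: project the flattened image into several
    # independent representation components (random projections).
    flat = image.reshape(-1)
    w = rng.standard_normal((n_components, DIM, flat.size)) * 0.01
    return np.einsum("cdf,f->cd", w, flat)          # (n_components, DIM)

def association_unit(components, relevance):
    # Provide a head with a linear combination of the components,
    # weighted by each component's relevance to that head's task.
    return relevance @ components                    # (DIM,)

def classification_head(features, weights):
    # Map the combined features to class probabilities via softmax.
    logits = weights @ features
    e = np.exp(logits - logits.max())
    return e / e.sum()

image = rng.random((16, 16))
components = encoder(image)

# The association unit routes a different mixture of components to each head.
object_feats = association_unit(components, np.array([0.7, 0.3, 0.0, 0.0]))
attr_feats = association_unit(components, np.array([0.0, 0.0, 0.5, 0.5]))

object_probs = classification_head(object_feats, rng.standard_normal((3, DIM)))
attr_probs = classification_head(attr_feats, rng.standard_normal((5, DIM)))
```

The per-head relevance vectors make concrete what "a linear combination of those of the representation components ... that are relevant for a classification task" could mean: each head sees only a weighted mixture of the shared components.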
Regarding claim 2, the combination of references of Pham in view of Maheshwari further teaches "The method of claim 1, wherein the providing of the factor training images includes: applying, to at least one given starting image, image processing that impacts at least one basic factor, thereby producing a factor training image"; (Pham, Paras. 22, 54, and 58, teach a multi-attribute extraction system that uses an image embedding vector to create image-object feature maps from the ground truth images to an attribute embedding space that corresponds to various attributes, i.e., applying image processing to the ground truth images to impact the attributes or basic factors, wherein the extraction system also uses the feature map to predict attributes of all types and applies a filter to select attribute features);
"and determining the ground truth values with respect to the basic factors based on the applied image processing"; (Pham, Para. 22, teaches determining the multi-attribute loss by comparing objects along the dimensions for the attributes within the attribute embedding space wherein the attribute embedding space includes the ground truth values, i.e., determine ground truth compared to the attributes or basic factors based on the image processing).
Regarding claim 3, the combination of references of Pham in view of Maheshwari further teaches "The method of claim 1, wherein, in each factor training image, each basic factor takes a particular value, and the factor training images include at least one factor training image for each combination of values of the basic factors"; (Pham, Paras. 20-21, teach a multi-attribute extraction system that generates labeled ground truth images, i.e., factor training images, wherein the extraction system determines positive and/or negative attribute labels, i.e., each attribute or basic factor takes a particular value, wherein the images comprise various combinations of these attribute labels with object, attribute, and multi-attention feature vectors).
Regarding claim 4, the combination of references of Pham in view of Maheshwari further teaches "The method of claim 1, further comprising: providing classification training images that are labelled with ground truth combinations of object values and attribute values"; (Pham, Paras. 20-21, teach the use of labeled ground truth images that include combinations of positive and/or negative attribute values and object, attribute, and multi-attention feature vectors);
"mapping, by the encoder network, the object classification head network and the attribute classification head network, the classification training images to combinations of object values and attribute values"; (Pham, Paras. 56 and 91-92, teach a multi-attribute extraction system that embeds an image-object feature map that corresponds to a ground truth image wherein the feature map includes object-attribute combinations, i.e., mapping the training images to combinations or pairs of objects and attribute values, wherein the mapping is done by the contrastive classification neural network which encodes features and predicts and classifies attributes with their objects, i.e., encoder, object, and attribute network);
"rating deviations of the mapped combinations of object values and attribute values from the respective ground truth combinations using a second predetermined loss function"; (Maheshwari, Paras. 130-132, teaches the use of comparing the mappings of combinations of attribute-object pairs to corresponding ground truth values using multiple loss functions, i.e., using at least a second predetermined loss function to rate deviation);
"and optimizing at least parameters that characterize a behavior of the object classification head network and parameters that characterize a behavior of the attribute classification head network towards the goal that, when further classification training images are processed, the rating by the second loss function is likely to improve"; (Maheshwari, Paras. 42-43, teach adjusting weights and updating parameters of the network, wherein the network includes behavior of object classification and attribute classification, so that when further training is done after each iteration the accuracy of the result from each loss function, including the second loss function, will improve).
The proposed combination, as well as the motivation for combining the Pham and Maheshwari references presented in the rejection of claim 1, applies to claim 4. Thus, the method recited in claim 4 is met by Pham and Maheshwari.
Regarding claim 5, the combination of references of Pham in view of Maheshwari further teaches "The method of claim 4, wherein combinations of one encoder network on the one hand and multiple different combinations of an object classification head network and an attribute classification head network on the other hand are trained based on the same training of the encoder network with factor training images"; (Pham, Abstract and Paras. 42, 56 and 90, teach a multi-attribute contrastive classification neural network that comprises all the networks which encodes features for objects, i.e., encoder network, and generates image-object feature maps and attribute feature maps, i.e., combinations of object and attribute classification, wherein the same loss functions, including the training ground truth images, are used to train all the networks, i.e., encoder network and object/attribute network combinations use the same training).
Claim 13 recites a computer-readable storage medium storing a program with instructions corresponding to the steps recited in Claim 1. Therefore, the recited programming instructions of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Pham and Maheshwari references, presented in rejection of Claim 1, apply to this claim. Finally, the combination of Pham and Maheshwari references discloses a computer readable storage medium (for example, see Pham, Paragraph 132).
Claim 14 recites one or more computers with elements corresponding to the steps recited in Claim 1. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Pham and Maheshwari references, presented in rejection of Claim 1, apply to this claim. Finally, the combination of Pham and Maheshwari references discloses a computer (for example, Pham, Paragraph 30).
Regarding Claim 15, the combination of references of Pham in view of Maheshwari teaches "The method of claim 1, wherein the mapping comprises: mapping, by the encoder network, the factor training images to the representation"; (Pham, Para. 56, teaches the classification neural network which encodes features, i.e., encoder network, generates an image-object feature map, i.e., mapping the images to a representation);
"passing each component of the representation on to the respective factor classification head network"; (Pham, Paras. 56 and 58, teach inputting each object into the multi-attribute contrastive classification neural network to focus on particular visual aspects of the object, i.e., passing each component of the representation map to the classification network);
"and outputting, by the respective factor classification head network, the respective values of the basic factors"; (Pham, Para. 58, teaches outputting attributes for each portrayed object by utilizing the identified object-image labels with the classification network, i.e., outputting the basic factors using the classification network).
Claim 16 recites a computer-readable storage medium storing a program with instructions corresponding to the steps recited in Claim 15. Therefore, the recited programming instructions of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Pham and Maheshwari references, presented in rejection of Claim 1, apply to this claim. Finally, the combination of Pham and Maheshwari references discloses a computer readable storage medium (for example, see Pham, Paragraph 132).
Claim 17 recites one or more computers with elements corresponding to the steps recited in Claim 15. Therefore, the recited elements of this claim are mapped to the proposed combination in the same manner as the corresponding steps in its corresponding method claim. Additionally, the rationale and motivation to combine the Pham and Maheshwari references, presented in rejection of Claim 1, apply to this claim. Finally, the combination of Pham and Maheshwari references discloses a computer (for example, Pham, Paragraph 30).
Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Pham in view of Maheshwari and Wang (US 20230030419 A1, hereinafter Wang).
Regarding claim 6, the combination of references of Pham in view of Maheshwari further teaches "The method of claim 4," "and the parameters that characterize behaviors of all networks are optimized with a goal of improving a value of the combined loss function"; (Maheshwari, Paras. 42-43, teach adjusting weights and updating parameters of the network, wherein the network includes behavior of object classification and attribute classification, so that when further training is done after each iteration the accuracy of the result from the loss function will improve).
However, the combination of references of Pham in view of Maheshwari does not explicitly teach "The method of claim 4, wherein: a combined loss function is formed as a weighted sum of the first loss function and the second loss function". Furthermore, as seen above, Maheshwari optimizes the parameters of the networks to improve the loss function, but it does not explicitly specify a combined loss function.
Wang is in the same field of art of image classification using loss functions. The combination of references of Pham in view of Maheshwari and Wang further teaches "The method of claim 4, wherein: a combined loss function is formed as a weighted sum of the first loss function and the second loss function"; (Wang, Para. 79, teaches training a classification model using a weighted sum of the first loss function and the second loss function);
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Pham and Maheshwari by including the combination of a first and second loss function taught by Wang. Specifically, it would have been obvious to one of ordinary skill in the art to combine the weighted-sum combined loss function of Wang with the network parameter optimization of Maheshwari. One of ordinary skill in the art would be motivated to combine the references since processing speed and accuracy will increase (Wang, Para. 47, teaches the motivation of combination to be to improve processing speed and accuracy as well as achieving fine processing granularity).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
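For illustration only, a weighted-sum combined loss of the kind attributed to Wang (Para. 79) above can be sketched as follows; the weight values and the two component loss values are hypothetical placeholders, not taken from any cited reference.

```python
def combined_loss(first_loss, second_loss, w1=0.5, w2=0.5):
    """Combined loss formed as a weighted sum of two loss values.

    first_loss, second_loss: scalar values of the first and second loss
    functions; w1, w2: hypothetical weighting coefficients.
    """
    return w1 * first_loss + w2 * second_loss

# Equal weighting of a hypothetical first loss (0.8) and second loss (0.4)
# yields a single combined scalar objective.
loss = combined_loss(0.8, 0.4)
```

Optimizing network parameters against such a single scalar improves both component objectives jointly, consistent with the parameter optimization described above with respect to Maheshwari.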
Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Pham in view of Maheshwari and Wong et al. (US 20210370955 A1, hereinafter Wong).
Regarding claim 7, the combination of references of Pham in view of Maheshwari does not explicitly teach "The method of claim 4, wherein the classification training images include images of road traffic situations".
Wong is in the same field of art of image classification of traffic situations. The combination of references of Pham in view of Maheshwari and Wong further teaches "The method of claim 4, wherein the classification training images include images of road traffic situations"; (Wong, Para. 21, teaches training a situation classification model using image data of traffic and road conditions).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Pham and Maheshwari by including the road traffic classification images taught by Wong. One of ordinary skill in the art would be motivated to combine the references since it will aid in the improvement of driver safety and automation of vehicles (Wong, Para. 2, teaches the motivation of combination to be to improve driver safety and autonomous vehicle capability).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Pham in view of Maheshwari, Wong, and Tan et al. (US 11460857 B1, hereinafter Tan).
Regarding claim 8, the combination of references of Pham in view of Maheshwari and Wong does not explicitly teach "The method of claim 7, wherein the basic factors that correspond to the components of the representation include one or more of: a time of day in which the input image is acquired; lighting conditions in which the input image is acquired; a season of a year in which the input image is acquired; and weather conditions in which the input image is acquired".
Tan is in the same field of art of object attribute determination with image classification of a vehicle. The combination of references of Pham in view of Maheshwari, Wong, and Tan further teaches "The method of claim 7, wherein the basic factors that correspond to the components of the representation include one or more of: a time of day in which the input image is acquired"; (Tan, Col. 12 lines 8-37, teach a perception component performing image classification based on environmental features which include a time of day);
"lighting conditions in which the input image is acquired"; (Tan, Col. 12 lines 8-37, teach a perception component performing image classification based on environmental features which include an indication of darkness/light, i.e., lighting conditions);
"a season of a year in which the input image is acquired"; (Tan, Col. 12 lines 8-37, teach a perception component performing image classification based on environmental features which include a season);
"and weather conditions in which the input image is acquired"; (Tan, Col. 12 lines 8-37, teach a perception component performing image classification based on environmental features which include a weather condition).
It would have been obvious to one having ordinary skill in the art before the effective filing date to modify the invention of Pham, Maheshwari, and Wong by including the feature or attribute types of time, lighting, season, and weather taught by Tan. One of ordinary skill in the art would be motivated to combine the references because these attribute types will aid in the improvement of vehicle safety and navigation through the environment (Tan, Col. 1 lines 46-61, teaches the motivation of combination to be the use of the attributes determined by the model to improve vehicle safety and enable proper vehicle navigation through the environment).
Thus, the claimed subject matter would have been obvious to a person having ordinary skill in the art before the effective filing date.
Allowable Subject Matter
Claims 9-12 are allowed. Claims 18-20 are objected to as being dependent upon a rejected base claim but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The following is the examiner's statement of reasons for the indication of allowable subject matter: Examiner has reviewed Applicant's arguments and the amended claims filed with the Office on February 06, 2026. Regarding Claim 9, Pham teaches "An image classifier for classifying an input image with respect to combinations of an object value and an attribute value, comprising: an encoder network configured to map the input image to a representation" (Pham, Abstract and Para. 56, teach an image classifier using an image classification network with respect to object-attribute pairs, i.e., combinations of object and attribute values, and a network that encodes features for the object including generating an image-object feature map for multiple attributes, i.e., an encoder network mapping an image that includes multiple components, attributes, or features).
In an analogous field of endeavor, Charraud et al. (US 20210407162 A1) teaches "the representation including multiple independent components such that an output of the encoder network includes a plurality of representation components"; (Charraud, Para. 38, teaches the trained artificial neural network wherein a first sub-network takes as input a digital image with undefined or unknown lighting condition and performs encoding to create a feature map representation of color information, lighting information, or the like, i.e., output of encoder includes a representation of multiple independent components including a plurality of representation components being the color and lighting information).
In an analogous field of endeavor, Jun et al. (KR 20220013881 A), Pg. 2, teaches a unified network lower layer and an upper layer composed of a plurality of classification networks separated by attribute, wherein the plurality of classification networks include, for each attribute, a coarse feature having a resolution lower than a specific resolution and a high-resolution feature having a resolution higher than or equal to the specific resolution, i.e., attribute-specific classification networks that use coarse features and high-resolution features. Jun does not teach a distinct object classification network supplied with a distinct first set of representation components that maps those components to object values, together with a distinct attribute classification network supplied with a distinct second set of representation components that maps those components to attribute values. Therefore, Jun does not explicitly teach "an object classification head network supplied with a first subset of the representation components and configured to map the representation components of the first subset to one or more object values; an attribute classification head network supplied with a second subset of the representation components and configured to map the representation components of the second subset to one or more attribute values; the first subset and the second subset being mutually exclusive with respect to each other".
Furthermore, Pham teaches "and an association unit configured to provide, to each respective classification head network, a linear combination of those of the representation components of the input image that are relevant for a classification task of the respective classification head network" (Pham, Para. 56, teaches a network that creates an image-object feature map which maps the features, i.e., components, of an image to one or more of the multiple attributes wherein the network learns to associate object-attribute pairs together, i.e., an association unit for linear combination).
In an analogous field of endeavor, Minderer et al. (“Automatic shortcut removal for self-supervised representation learning. In International Conference on Machine Learning” (pp. 6927-6937). PMLR.) teaches "wherein a first linear combination provided to the object classification head network withholds those of the representation components on which the attribute classification head network relies"; (Minderer, Section 4.2.5. Results, teaches using shortcut removal to shift networks to using more shape information by utilizing images with conflicting texture and shape information wherein the textures are the low-level features, i.e., linear combination of object and attribute or feature to withhold representation components the attribute classification head network would rely on for a shortcut in order to lead to more semantically meaningful representations);
"and wherein a second linear combination provided to the attribute classification head network withholds those of the representation components on which the object classification head network relies"; (Minderer, Section 3.2. Automatic adversarial shortcut removal and Figure 3, teaches a lens network which modifies the inputs and maps them back to the input space before feeding to a representation network for preventing and removing shortcuts wherein color channels are shifted, i.e., a second linear combination provided to the attribute classification head to withhold representation components, being the features on which the object classification relies).
Therefore, none of the cited prior art references alone or in combination teach the ordered combination of limitations of "an object classification head network supplied with a first subset of the representation components and configured to map the representation components of the first subset to one or more object values; an attribute classification head network supplied with a second subset of the representation components and configured to map the representation components of the second subset to one or more attribute values; the first subset and the second subset being mutually exclusive with respect to each other" with the rest of the claim limitations. Claims 10-12 are dependent upon Claim 9 and therefore contain the above indicated allowable subject matter. Claim 18 is a method claim dependent upon Claim 1, Claim 19 is a non-transitory storage medium dependent upon Claim 13, and Claim 20 is a computer dependent upon Claim 14. Claims 18-20 contain the amendments to Claim 9 including the above indicated allowable subject matter. Therefore, Claims 18-20 contain the above indicated allowable subject matter.
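Purely as an illustrative sketch of the claimed routing (the indices, component values, and function names below are hypothetical, and are not the applicant's implementation), the mutually exclusive first and second subsets of representation components supplied to the two classification heads can be modeled as:

```python
def split_representation(components, first_indices, second_indices):
    """Route mutually exclusive subsets of encoder-output components to
    an object classification head and an attribute classification head."""
    if set(first_indices) & set(second_indices):
        raise ValueError("the first and second subsets must be mutually exclusive")
    first_subset = [components[i] for i in first_indices]
    second_subset = [components[i] for i in second_indices]
    return first_subset, second_subset

# Hypothetical encoder output with four representation components.
representation = [0.1, 0.9, 0.3, 0.7]
object_head_input, attribute_head_input = split_representation(
    representation, first_indices=[0, 1], second_indices=[2, 3]
)
```

Each head then maps only its own disjoint subset to object values or attribute values respectively, which is the arrangement not shown by the cited art.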
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW STEVEN BUDISALICH whose telephone number is (703)756-5568. The examiner can normally be reached Monday - Friday 8:30am-5:00pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Amandeep Saini can be reached on (571) 272-3382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANDREW S BUDISALICH/Examiner, Art Unit 2662
/AMANDEEP SAINI/Supervisory Patent Examiner, Art Unit 2662