DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment/Arguments
1. Applicant’s arguments filed on November 10, 2025 with respect to the rejection
under 35 U.S.C. §103 have been fully considered but are not persuasive. With respect to claims 1 and 17, Applicant argues that Liu does not teach (i) a feature extractor outputting a plurality of feature vectors in response to receiving a plurality of images, wherein each feature vector is mapping from raw data of one of the plurality of images, (ii) a classifier generating a plurality of predictions based on the plurality of feature vectors, and (iii) transmitting first knowledge data including information about a plurality of predictions to a server. Applicant further argues that Chen does not cure these alleged deficiencies.
Liu discloses that local sample data is input to a model at the user and that “the user side records the output results (namely the results of the softmax layer) of all local sample data passing through the model.” The output of the softmax layer represents a probability distribution across classification categories and therefore constitutes prediction results generated by the model.
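For purely illustrative purposes (this sketch is not part of the record and is not drawn from either reference; all names are hypothetical), the property relied on above — that softmax-layer outputs form a probability distribution across classification categories, and therefore constitute prediction results — can be demonstrated as follows:

```python
import math

def softmax(logits):
    """Convert raw model outputs (logits) into a probability
    distribution across classification categories."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for three categories
probs = softmax([2.0, 1.0, 0.1])
# The values sum to 1 and each entry expresses the model's
# confidence in one category -- i.e., a prediction result.
```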
Liu further teaches that the output results of the model are grouped by category and that the “average value in each group” is calculated as the characteristic vector of the category. Accordingly, Liu teaches deriving category-level feature vectors from the output results of the model, and the transmitted knowledge data therefore includes information about both feature vectors and predictions.
With respect to Applicant’s argument that the claims require feature vectors mapping from raw image data, Chen expressly discloses a feature extraction section in which convolutional layers receive raw image data and apply convolutional kernels to generate feature maps and feature vectors. Chen further discloses a classification section including a softmax function that converts feature vectors into probability values, thereby illustrating a classifier generating predictions in response to receiving feature vectors.
With respect to amended claim 9, Applicant’s argument that Liu does not teach transmitting first knowledge data including information about a plurality of feature vectors and information about a plurality of soft predictions to a server is not persuasive. While Liu discloses that the local data sample includes hard tags, Liu further expressly teaches that, after initial training, “the user side records the output results (namely the results of the softmax layer) of all local sample data passing through the model,” groups the results by category, and calculates the average value in each group as the characteristic vector of the category. The results of a softmax layer represent probability distributions across classification categories and therefore constitute soft predictions generated at the user terminal.
Liu further teaches that the user side transmits the category characteristics of each category to the server. Because these category characteristics are calculated from averaged softmax-layer outputs, the information transmitted to the server includes information about a plurality of feature vectors and information about a plurality of soft predictions, as recited in claim 9. Applicant’s characterization that Liu only relies on hard-labeled information at the user terminal is not persuasive, as the hard tags are used for initial training, while the uploaded category characteristics are derived from the model’s softmax outputs.
Applicant’s characterization of Chen as being limited to vehicle-related features is not persuasive. Chen discloses a federated learning system employing a neural network including a feature extraction section and a classification section, and further teaches that such a system may be implemented in a vehicle. Accordingly, Chen is directed to the same general field of distributed model training and is analogous to Liu. Chen therefore provides complementary teachings regarding deployment of a federated learning model in a vehicular environment, and its combination with Liu represents a predictable use of known distributed learning techniques.
Applicant’s arguments with respect to claims 2, 12, and 18 have been fully considered but are not persuasive. Applicant contends that Liu does not teach transmitting information in which both the plurality of feature vectors and the plurality of predictions are averaged prior to transmission to the server. This argument is not persuasive because it mischaracterizes how Liu defines and generates the transmitted category characteristics.
Liu expressly teaches that, after local training, the user side records the output results of the model, groups the results according to category, and “calculates the average value in each group as the characteristic vector of the category.” The output results recorded by the user side are the results of the softmax layer, which represent prediction outputs for the local samples. Thus, the averaging operation described by Liu is performed on the prediction results associated with each category, and the resulting averaged values form the transmitted category feature vectors.
Accordingly, Liu’s transmitted category characteristics inherently represent averaged feature vectors derived from averaged prediction outputs. The claims do not require that separate averaging operations be performed independently on prediction values and feature vectors, nor do the claims require that such averages be labeled or stored as distinct data structures prior to transmission. Under the broadest reasonable interpretation, Liu’s disclosure of averaging model output results by category and transmitting the resulting characteristic vectors satisfies the recited limitations of claims 2, 12, and 18.
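For purely illustrative purposes (this sketch is not part of the record and is not drawn from either reference; all names and values are hypothetical), the grouping-and-averaging operation discussed above — per-sample softmax outputs grouped by category tag, with each group averaged into a single characteristic vector — can be sketched as:

```python
from collections import defaultdict

def category_characteristics(samples):
    """Group per-sample softmax outputs by category tag and average
    each group, yielding one characteristic vector per category."""
    groups = defaultdict(list)
    for tag, softmax_out in samples:
        groups[tag].append(softmax_out)
    # Element-wise mean of each group's vectors
    return {
        tag: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for tag, vecs in groups.items()
    }

# Hypothetical (tag, softmax output) pairs from local sample data
samples = [
    ("cat", [0.9, 0.1]),
    ("cat", [0.7, 0.3]),
    ("dog", [0.2, 0.8]),
]
chars = category_characteristics(samples)
# chars["cat"] is approximately [0.8, 0.2]: a single averaged
# characteristic vector per category, derived from prediction outputs.
```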
Applicant’s arguments have been fully considered but are not persuasive for the reasons set forth above. Accordingly, the rejections under 35 U.S.C. §103 are maintained.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations in claim 10 are:
a feature extractor configured to output the plurality of feature vectors in response to receiving a plurality of images;
a classifier configured to output the plurality of predictions in response to receiving the plurality of feature vectors.
The feature extractor and classifier are generic placeholders without definite structure, and are described in purely functional terms using “configured to” language rather than reciting how the functions are performed. As a result, they are interpreted as means-plus-function limitations under 35 U.S.C. 112(f).
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5, 7-14, and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Liu (Pub. No.: CN 114626550 A, published 06/14/2022) in view of Chen et al. (Pub. No.: US 2024/0086699 A1, filed 9/9/2022).
Regarding claim 1, Liu teaches the following limitations:
a feature extractor outputting a plurality of feature vectors in response to receiving a plurality of images (Liu teaches generating “category feature vectors” from local sample data using a “feature uploading module.” While it does not use the terms “feature extractor” or “plurality of images,” it is inherent in the description that the system is extracting features from a plurality of images, as classification tasks involving “category feature vectors” typically rely on image inputs. The use of the term “acquiring” supports that the module receives input and produces feature vectors in response. Therefore, the reference inherently teaches a feature extractor outputting a plurality of feature vectors in response to receiving a plurality of images.)
a classifier outputting a plurality of predictions in response to receiving the plurality of feature vectors (Liu teaches that the “model” at the user side processes local sample data and generates “output results,” which are later grouped by category to form “category feature vectors.” The “output results” are described as the results of local sample data passing through the model and are used to represent each sample’s classification. It is therefore clear that the “output results” correspond to predictions made by the model. While the term “classifier” is not explicitly used, it is inherent that the model performs classification by generating a plurality of predictions in response to the feature vectors. Accordingly, the reference inherently teaches a classifier outputting a plurality of predictions in response to receiving the plurality of feature vectors.)
transmit first knowledge data including information about the plurality of feature vectors and information about the plurality of predictions to a server (Liu teaches that a “feature uploading module” transmits “class feature vectors” and corresponding weights from the user side to the server. These vectors are obtained by grouping the “output results” (predictions) of local sample data after model training. The “output results” are used to determine the class associated with each data sample based on its “tag.” Since the “class feature vectors” are derived from outputs, the data being transmitted includes both feature information and the associated classification information. Therefore, the reference teaches transmitting first knowledge data including information about the plurality of feature vectors and information about the plurality of predictions to a server.);
receive first aggregated knowledge from the server (Liu teaches that each user side (client) transmits “class feature vectors” and weights to the server using a “feature uploading module.” These vectors are generated from “output results” of local model processing and reflect both feature and prediction information, forming the first knowledge data. At the server side, a “labeling issuing module” aggregates this data from multiple clients by grouping vectors by category and computing a weighted average. The result is referred to as the “global class features” or “soft labels,” which are issued back to the clients. Because this global information is produced by aggregating the original class feature vectors (i.e., the first knowledge data), the reference teaches receiving first aggregated knowledge from the server.);
train a local model including the feature extractor and the classifier based on the first aggregated knowledge (Liu teaches that each user side continues training its local model using both local data and the “soft label” received from the server. The local model includes the same components used during initial training – namely, the model that generates “output results” used to form “class feature vectors.” Since the “soft label” is produced by aggregating the uploaded feature and classification information, it represents the first aggregated knowledge. The local training uses this aggregated knowledge to guide further optimization of the same model, which inherently includes the components responsible for feature extraction and classification. Therefore, the reference teaches training a local model including the feature extractor and classifier based on the first aggregated knowledge.).
However, Liu does not teach the following limitations, which Liu in view of Chen teaches:
A vehicle comprising (Chen, paragraph [0065] mentions “FIG. 5 is a high-level block diagram illustrating an example system 500 for hardware-aware federated learning… The system 500 also includes multiple end devices 504a-z…The end devices 504a-z may each comprise… an electric vehicle”):
a controller programmed to (Chen, paragraph [0106] mentions “Aspect 14: An apparatus comprising: a memory; and at least one processor coupled to the memory, the at least one processor configured: to receive, from a server, information corresponding to a first jointly-trained artificial neural network (ANN); to determine a current hardware capability of a device for on-device training of the first jointly-trained ANN; to transmit, to the server, an indication of the current hardware capability; and to receive, from the server, responsive to the transmitted indication, information corresponding to a second jointly-trained ANN.”):
wherein each feature vector is mapping from raw data of one of the plurality of images (Chen, paragraph [0042] “Upon receiving the image 226, a convolutional layer 232 may apply convolutional kernels (not shown) to the image 226 to generate a first set of feature maps 218…” [0044] “the second set of feature maps 220 is convolved to generate a first feature vector 224. Furthermore, the first feature vector 224 is further convolved to generate a second feature vector 228.” – under the broadest reasonable interpretation, the recited “mapping from raw data” encompasses any transformation that converts image input data into a feature representation. Chen expressly teaches receiving image data and processing that image data through convolutional layers to generate feature vectors. Accordingly, each feature vector is a mapping from raw image data of one of the plurality of images.);
wherein the classifier generates the plurality of predictions based on the plurality of feature vectors (Chen, paragraph [0042] “The DCN 200 may include a feature extraction section and a classification section…” [0044] “the second set of feature maps 220 is convolved to generate a first feature vector 224. Furthermore, the first feature vector 224 is further convolved to generate a second feature vector 228. Each feature of the second feature vector 228 may include a number that corresponds to a possible feature of the image 226, such as “sign,” “60,” and “100.” A softmax function (not shown) may convert the numbers in the second feature vector 228 to a probability. As such, an output 222 of the DCN 200 is a probability of the image 226 including one or more features.” – under BRI, a classifier generates predictions by operating on feature vectors to produce output probabilities. Chen expressly teaches a classification section that receives feature vectors and applies a softmax function to generate probability outputs. Accordingly, the classifier generates a plurality of predictions based on the plurality of feature vectors.);
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having a combination of Liu and Chen before them, to implement the client as a vehicle comprising a controller, as taught by Chen. One would have been motivated to make such a combination in order to apply the federated learning framework to real-world vehicular environments, where each vehicle’s controller enables participation in distributed training based on local capabilities.
Regarding claim 2, Liu in view of Chen, as outlined above, teaches all the elements of claim 1, and claim 2 is therefore rejected for the same reasons as those presented for claim 1, mutatis mutandis. Liu in view of Chen further teaches:
wherein the information about the plurality of feature vectors includes an average of the plurality of feature vectors and the information about the plurality of predictions includes an average of the plurality of the predictions (Liu explains that the “feature uploading module” processes local model output by grouping the results according to their tag (i.e., class) and then “calculates the average value in each group as the characteristic vector of the category.” This means that for each class, the output vectors from multiple samples are averaged to form a representative feature vector – effectively an average of the plurality of feature vectors. While predictions are not explicitly mentioned, grouping the outputs by tag implies that classification has occurred, and the average per class reflects the model’s prediction behavior. Thus, the reference teaches including both an average of the feature vectors and an average representation of the predictions.)
Regarding claim 3, Liu in view of Chen, as outlined above, teaches all the elements of claim 2, and claim 3 is therefore rejected for the same reasons as those presented for claim 2, mutatis mutandis. Liu in view of Chen further teaches:
wherein the first knowledge data includes mapping between the average of the plurality of feature vectors and the average of the plurality of predictions (Liu describes that after local training the “feature uploading module” obtains output results from local sample data, groups them by tag, and then “calculates the average value in each group as the characteristic vector of the category.” These average vectors – which represent class-specific feature patterns – are uploaded to the server along with associated class information. This process inherently maps each average category feature vector to its corresponding class tag (predicted label), forming a mapping between the average of the feature vectors and the averaged prediction label for each class. Thus, the reference teaches first knowledge data that includes such a mapping.).
Regarding claim 4, Liu in view of Chen, as outlined above, teaches all the elements of claim 1, and claim 4 is therefore rejected for the same reasons as those presented for claim 1, mutatis mutandis. Liu in view of Chen further teaches:
wherein the plurality of predictions are prediction vectors, and each of the prediction vectors includes probabilities of classifications of objects (Liu describes that the local model generates “output results” for each data sample, which are grouped by tag and averaged to form class-level feature vectors. Although the term “prediction vector” is not explicitly used, the context implies that these “output results” represent classification scores used to determine the tag of each sample. In machine learning, such “output results” typically refer to vectors containing per-class probability values (e.g., soft labels) – meaning each output reflects the model’s prediction confidence across multiple classes. Therefore, it is inherent that the plurality of predictions are prediction vectors, and that each prediction vector includes probabilities of classifications of objects.)
Regarding claim 5, Liu in view of Chen, as outlined above, teaches all the elements of claim 1, and claim 5 is therefore rejected for the same reasons as those presented for claim 1, mutatis mutandis. Liu in view of Chen further teaches:
wherein the local model is a machine learning model for classifying objects, and a size of the first knowledge data is smaller than a size of the local model (Liu describes the “local model” trained using local sample data, where “output results” are grouped by tag and averaged to form class feature vectors. This confirms that the local model performs a classification function. Additionally, the system uploads only the averaged category feature vectors and corresponding tag information – not the entire model – as part of the first knowledge data. Therefore, it is inherent that the local model is a machine learning model for classifying objects, and that the size of the first knowledge data (i.e., aggregated feature vectors and tags) is smaller than the size of the local model.).
Regarding claim 7, Liu in view of Chen, as outlined above, teaches all the elements of claim 1, and claim 7 is therefore rejected for the same reasons as those presented for claim 1, mutatis mutandis. Liu in view of Chen further teaches:
extract second knowledge data based on the trained model and local data (Liu teaches extracting knowledge data based on the trained model and local data. Specifically, it discloses that, after training the model using local sample data with hard tags, the “characteristics uploading module” is used to classify the initial training result at the user side, and acquire the class characteristics of each class. These class characteristics (or “class feature vectors”) are derived from both the trained model (the result of local training) and the local data (sample data with tags). Thus, the extracted class feature vectors constitute the second knowledge data, which is produced by applying the trained model to the user’s local data.);
transmit the second knowledge data to the server (Liu teaches transmitting the second knowledge data to the server. Specifically, it states that the feature uploading module is configured to transmit the class characteristics of each class – derived from local training – directly to the server.)
receive second aggregated knowledge from the server (Liu teaches receiving second aggregated knowledge from the server. Specifically, the “soft label issuing module” is described as issuing the soft label – i.e., the global category characteristics aggregated from multiple user terminals – to each user terminal. This constitutes the second aggregated knowledge being received by the local model.)
train the trained local model further based on the second aggregated knowledge (Liu teaches that the “training module” at each user end “further trains the model to converge according to the soft label and the hard label.” As established, this “model” is the local model that has already undergone “initial training.” The “soft label” received from the server corresponds to the “second aggregated knowledge.” Therefore, Liu teaches training the already trained local model further based on the aggregated knowledge.).
Regarding claim 8, Liu in view of Chen, as outlined above, teaches all the elements of claim 1, and claim 8 is therefore rejected for the same reasons as those presented for claim 1, mutatis mutandis. Liu in view of Chen further teaches:
an imaging sensor configured to capture the plurality of images (Chen, paragraph [0041] mentions “a DCN 200 designed to recognize visual features from an image 226 input from an image capturing device 230, such as a car-mounted camera. The DCN 200 of the current example may be trained to identify traffic signs and a number provided on the traffic sign. Of course, the DCN 200 may be trained for other tasks, such as identifying lane markings or identifying traffic lights.”).
It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, having a combination of Liu and Chen before them, to incorporate the car-mounted image capturing device disclosed in Chen into the federated learning system of Liu. One would have been motivated to do so in order to enable client-side collection of real-world image data (e.g., traffic signs, lane markings) for use in local training of the machine learning model, thus allowing more accurate and context-specific feature extraction across heterogeneous clients. The “image capturing device” (e.g., car-mounted camera) in Chen would have been understood by a skilled artisan as including an image sensor (such as a CMOS sensor) configured to capture a plurality of images.
Regarding claim 9, Liu teaches the following limitations:
A system for training a model, the system comprising: a server (Liu discloses a distributed model collaborative training system including a server);
transmit first knowledge data including information about a plurality of feature vectors and information about a plurality of soft predictions to a server (Liu teaches “The feature uploading module 506 groups all the output results of the local sample data passing through the model (i.e. the results of the softmax layer) by category… Thus, the feature uploading module 506 obtains class features for the respective classes, i.e., feature vectors and weights for each class. The feature upload module 506 transmits the category features of each category to the server.” – the reference teaches that the local model generates “results from a softmax layer” for local sample data. A softmax layer converts model outputs into a probability distribution across multiple classes, where each output value represents a confidence level associated with a class. Accordingly, the results of the softmax layer constitute soft predictions, rather than hard class labels. Because the category feature vectors transmitted to the server are obtained by grouping and averaging these softmax outputs, the transmitted knowledge data necessarily includes information about a plurality of soft predictions. Therefore, Liu teaches transmitting first knowledge data including information about a plurality of feature vectors and information about a plurality of soft predictions to a server.)
receive first aggregated knowledge from the server (Liu teaches that each user side (client) transmits “class feature vectors” and weights to the server using a “feature uploading module.” These vectors are generated from “output results” of local model processing and reflect both feature and prediction information, forming the first knowledge data. At the server side, a “labeling issuing module” aggregates this data from multiple clients by grouping vectors by category and computing a weighted average. The result is referred to as the “global class features” or “soft labels,” which are issued back to the clients. Because this global information is produced by aggregating the original class feature vectors (i.e., the first knowledge data), the reference teaches receiving first aggregated knowledge from the server.);
train a local model based on the first aggregated knowledge (Liu teaches that each user side continues training its local model using both local data and the “soft label” received from the server. The local model includes the same components used during initial training – namely, the model that generates “output results” used to form “class feature vectors.” Since the “soft label” is produced by aggregating the uploaded feature and classification information, it represents the first aggregated knowledge. The local training uses this aggregated knowledge to guide further optimization of the same model. Therefore, the reference teaches training a local model based on the first aggregated knowledge.),
However, Liu does not teach, but Liu in view of Chen teaches, the following limitations:
wherein the server averages the first knowledge data received from the plurality of vehicles to generate the aggregated knowledge (Liu discloses that the “server side aggregates the category characteristics vectors and the category weights of the user sides 1, 2 and other user sides, carries out weighted average on the category characteristic vectors by using category weights, calculates the global category characteristics of the category across user sides, and the sends the global category characteristics as the soft label of the category to the user sides.” Chen, paragraph [0065], mentions “an example system 500 for hardware-aware federated learning… The system 500 also includes multiple end devices 504a-z. The end devices 504a-z may each comprise… an electric vehicle” – the plurality of end devices may be multiple vehicles.).
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having a combination of Liu and Chen before them, to implement the client as a vehicle comprising a controller, as taught by Chen. One would have been motivated to make such a combination in order to apply the federated learning framework to real-world vehicular environments, where each vehicle’s controller enables participation in distributed training based on local capabilities.
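The server-side aggregation Liu describes (a weighted average of per-client category feature vectors using category weights, yielding a global category feature issued back as the category's soft label) can be sketched as follows; the client vectors, weights, and names below are hypothetical illustrations, not data from the reference:

```python
import numpy as np

# Hypothetical per-client uploads for a single category: each client reports
# its averaged class feature vector and a category weight (e.g., sample count).
client_vectors = [np.array([0.7, 0.2, 0.1]),
                  np.array([0.6, 0.3, 0.1]),
                  np.array([0.8, 0.1, 0.1])]
client_weights = [10, 30, 60]  # assumed per-client sample counts

# Weighted average across clients yields the "global category feature,"
# which the server sends back to the clients as the category's soft label.
w = np.asarray(client_weights, dtype=float)
global_feature = (w[:, None] * np.vstack(client_vectors)).sum(axis=0) / w.sum()
```

Clients contributing more samples for the category thus pull the global soft label toward their local distribution, which is the effect of weighting by category weight.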
Regarding claim 10, Liu in view of Chen teaches, as outlined above, all the elements of claim 9; claim 10 is therefore rejected for the same reasons as those presented for claim 9, mutatis mutandis. Liu in view of Chen further teaches:
a feature extractor configured to output the plurality of feature vectors in response to receiving a plurality of images (Chen, paragraph [0042] mentions “The DCN 200 may be trained with supervised learning. During training, the DCN 200 may be presented with an image, such as the image 226 of a speed limit sign, and a forward pass may then be computed to produce an output 222. The DCN 200 may include a feature extraction section… Upon receiving the image 226, a convolutional layer 232 may apply convolutional kernels (not shown) to the image 226 to generate a first set of feature maps 218.” Chen paragraph [0043-0044] further mentions “The first set of feature maps 218 may be subsampled by a max pooling layer (not shown) to generate a second set of feature maps 220…In the example of FIG. 2D, the second set of feature maps 220 is convolved to generate a first feature vector 224. Furthermore, the first feature vector 224 is further convolved to generate a second feature vector 228”);
a classifier configured to output the plurality of soft predictions in response to receiving the plurality of feature vectors (Chen, paragraph [0042], mentions “The DCN 200 may include...a classification section… Upon receiving the image 226, a convolutional layer 232 may apply convolutional kernels (not shown) to the image 226 to generate a first set of feature maps” Chen, paragraph [0044] further mentions “In the example of FIG. 2D, the second set of feature maps 220 is convolved to generate a first feature vector 224. Furthermore, the first feature vector 224 is further convolved to generate a second feature vector 228. Each feature of the second feature vector 228 may include a number that corresponds to a possible feature of the image 226, such as “sign,” “60,” and “100.” A softmax function (not shown) may convert the numbers in the second feature vector 228 to a probability. As such, an output 222 of the DCN 200 is a probability of the image 226 including one or more features.” – the classifier of Chen receives feature vectors and applies a softmax function to generate probability values corresponding to multiple classes. Probability outputs constitute soft predictions, as they represent confidence levels across classes rather than a single hard classification. Accordingly, Chen teaches a classifier configured to output a plurality of soft predictions in response to receiving the plurality of feature vectors.).
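The feature-extraction and classification pipeline Chen describes (raw image data convolved into feature maps and feature vectors, then converted by a softmax function into class probabilities) may be sketched as follows; the image size, kernel, class count, and all names are hypothetical, and the convolution is a deliberately minimal stand-in for Chen's DCN layers:

```python
import numpy as np

def conv2d(img, kernel):
    """Minimal 'valid' 2-D convolution: slide the kernel over the raw image."""
    h, w = kernel.shape
    out = np.zeros((img.shape[0] - h + 1, img.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i+h, j:j+w] * kernel).sum()
    return out

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Feature extraction: raw pixel data -> feature map -> feature vector.
image = np.random.rand(8, 8)       # hypothetical 8x8 raw image data
kernel = np.ones((3, 3)) / 9.0     # one convolutional kernel (averaging filter)
feature_vector = conv2d(image, kernel).flatten()

# Classification: project the feature vector to class scores, then apply
# softmax so the output is a probability over classes (a soft prediction).
W = np.random.rand(3, feature_vector.size)  # weights for 3 hypothetical classes
soft_prediction = softmax(W @ feature_vector)
```

The sketch shows why the output qualifies as a soft prediction: softmax maps the feature-derived scores onto a probability distribution rather than a single hard label.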
Regarding claim 11, Liu in view of Chen teaches, as outlined above, all the elements of claim 9; claim 11 is therefore rejected for the same reasons as those presented for claim 9. Liu in view of Chen further teaches:
wherein the information about the plurality of feature vectors includes an average of the plurality of feature vectors and the information about the plurality of soft predictions includes an average of the plurality of soft predictions (Liu teaches “And the user side records the output results (namely the results of the softmax layer) of all local sample data passing through the model, then groups all the results according to the category, and calculates the average value in each group as the characteristic vector of the category.” – the results of the softmax layer are probability distributions across classes and therefore constitute soft predictions. Liu expressly teaches grouping these softmax outputs and calculating an average for each group. Accordingly, the reference teaches information about a plurality of soft predictions that includes an average of the plurality of soft predictions, as well as an average of the corresponding feature vectors.).
Regarding claim 12, Liu in view of Chen teaches, as outlined above, all the elements of claim 11; claim 12 is therefore rejected for the same reasons as those presented for claim 11. Liu in view of Chen further teaches:
wherein the first knowledge data includes mapping between the average of the plurality of feature vectors and the average of the plurality of soft predictions (Liu teaches “And the user side records the output results (namely the results of the softmax layer) of all local sample data passing through the model, then groups all the results according to the category, and calculates the average value in each group as the characteristic vector of the category.” – the results of the softmax layer constitute soft predictions. Liu teaches grouping these soft predictions by category and calculating an average value for each group. For each category, a corresponding characteristic feature vector is likewise obtained. Because the averaged feature vector and the averaged soft prediction are both generated for the same category, they are inherently associated with one another, thereby forming a mapping between the average of the plurality of feature vectors and the average of the plurality of soft predictions.).
Regarding claim 13, Liu in view of Chen teaches, as outlined above, all the elements of claim 9; claim 13 is therefore rejected for the same reasons as those presented for claim 9. Liu in view of Chen further teaches:
wherein the plurality of soft predictions are prediction vectors, and each of the prediction vectors includes probabilities of classifications of objects (Liu teaches “the user side records the output results (namely the results of the softmax layer) of all local sample data passing through the model” – the results of the softmax layer are probability distributions over classification categories. Accordingly, each output result constitutes a prediction vector in which each element represents a probability of a corresponding classification. Because these outputs are generated by the softmax layer, they are soft predictions rather than hard labels. Therefore, Liu teaches that the plurality of soft predictions are prediction vectors, and that each prediction vector includes probabilities of classifications of objects.).
Regarding claim 14, Liu in view of Chen teaches, as outlined above, all the elements of claim 9; claim 14 is therefore rejected for the same reasons as those presented for claim 9. The claim recites limitations similar to those of claim 5 and is rejected for similar reasons as claim 5 using similar teachings and rationale.
Regarding claim 16, Liu in view of Chen teaches, as outlined above, all the elements of claim 9; claim 16 is therefore rejected for the same reasons as those presented for claim 9. The claim recites limitations similar to those of claim 7 and is rejected for similar reasons as claim 7 using similar teachings and rationale.
Regarding claim 17, Liu teaches the following limitations:
outputting, by a feature extractor of a local model, a plurality of feature vectors in response to receiving a plurality of images (Liu teaches generating “category feature vectors” from local sample data using a “feature uploading module.” While it does not use the terms “feature extractor” or “plurality of images,” it is inherent in the description that the system is extracting features from a plurality of images, as classification tasks involving “category feature vectors” typically rely on image inputs. The use of the term “acquiring” supports that the module receives input and produces feature vectors in response. Therefore, the reference inherently teaches a feature extractor outputting a plurality of feature vectors in response to receiving a plurality of images.);
outputting, by a classifier of the local model, a plurality of predictions in response to receiving the plurality of feature vectors (Liu teaches that the “model” at the user side processes local sample data and generates “output results”, which are later grouped by category to form “category feature vectors.” The “output results” are described as the results of local sample data passing through the model and are used to represent each sample’s classification. It is therefore clear that the “output results” correspond to predictions made by the model. While the term “classifier” is not explicitly used, it is inherent that the model performs classification by generating a plurality of predictions in response to the feature vectors. Accordingly, the reference inherently teaches a classifier outputting a plurality of predictions in response to receiving the plurality of feature vectors.);
transmitting knowledge data including information about the plurality of feature vectors and information about the plurality of predictions to a server (Liu teaches that a “feature uploading module” transmits “class feature vectors” and corresponding weights from the user side to the server. These vectors are obtained by grouping the “output results” (predictions) of local sample data after model training. The “output results” are used to determine the class associated with each data sample based on its “tag.” Since the “class feature vectors” are derived from outputs, the data being transmitted includes both feature information and the associated classification information. Therefore, the reference teaches transmitting first knowledge data including information about the plurality of feature vectors and information about the plurality of predictions to a server.);
receiving aggregated knowledge from the server (Liu teaches that each user side (client) transmits “class feature vectors” and weights to the server using a “feature uploading module.” These vectors are generated from “output results” of local model processing and reflect both feature and prediction information, forming the knowledge data. At the server side, a “labeling issuing module” aggregates this data from multiple clients by grouping vectors by category and computing a weighted average. The result is referred to as the “global class features” or “soft labels,” which are issued back to the clients. Because this global information is produced by aggregating the original class feature vectors (i.e., knowledge data), the reference teaches receiving aggregated knowledge from the server.);
training the local model based on the aggregated knowledge (Liu teaches that each user side continues training its local model using both local data and the “soft label” received from the server. The local model includes the same components used during initial training – namely, the model that generates “output results” used to form “class feature vectors.” Since the “soft label” is produced by aggregating the uploaded feature and classification information, it represents the aggregated knowledge. The local training uses this aggregated knowledge to guide further optimization of the same model, which inherently includes the components responsible for feature extraction and classification. Therefore, the reference teaches training a local model including the feature extractor and classifier based on the aggregated knowledge.).
However, Liu does not teach, but Liu in view of Chen teaches, the following limitations:
training a model in a vehicle (Chen, paragraph [0065], mentions “FIG. 5 is a high-level block diagram illustrating an example system 500 for hardware-aware federated learning… The system 500 also includes multiple end devices 504a-z… The end devices 504a-z may each comprise… an electric vehicle”);
wherein each feature vector is mapping from raw data of one of the plurality of images (Chen, paragraph [0042] “Upon receiving the image 226, a convolutional layer 232 may apply convolutional kernels (not shown) to the image 226 to generate a first set of feature maps 218…” [0044] “the second set of feature maps 220 is convolved to generate a first feature vector 224. Furthermore, the first feature vector 224 is further convolved to generate a second feature vector 228.” – under the broadest reasonable interpretation, the recited “mapping from raw data” encompasses any transformation that converts image input data into a feature representation. Chen expressly teaches receiving image data and processing that image data through convolutional layers to generate feature vectors. Accordingly, each feature vector is a mapping from raw image data of one of the plurality of images.);
wherein the classifier generates the plurality of predictions based on the plurality of feature vectors (Chen, paragraph [0042] “The DCN 200 may include a feature extraction section and a classification section…” [0044] “the second set of feature maps 220 is convolved to generate a first feature vector 224. Furthermore, the first feature vector 224 is further convolved to generate a second feature vector 228. Each feature of the second feature vector 228 may include a number that corresponds to a possible feature of the image 226, such as “sign,” “60,” and “100.” A softmax function (not shown) may convert the numbers in the second feature vector 228 to a probability. As such, an output 222 of the DCN 200 is a probability of the image 226 including one or more features.” – under BRI, a classifier generates predictions by operating on feature vectors to produce output probabilities. Chen expressly teaches a classification section that receives feature vectors and applies a softmax function to generate probability outputs. Accordingly, the classifier generates a plurality of predictions based on the plurality of feature vectors.);
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having a combination of Liu and Chen before them, to implement the client as a vehicle comprising a controller, as taught by Chen. One would have been motivated to make such a combination in order to apply the federated learning framework to real-world vehicular environments, where each vehicle’s controller enables participation in distributed training based on local capabilities.
Regarding claim 18, Liu in view of Chen teaches, as outlined above, all the elements of claim 17; claim 18 is therefore rejected for the same reasons as those presented for claim 17. The claim recites limitations similar to those of claim 2 and is rejected for similar reasons as claim 2 using similar teachings and rationale.
Regarding claim 19, Liu in view of Chen teaches, as outlined above, all the elements of claim 18; claim 19 is therefore rejected for the same reasons as those presented for claim 18. The claim recites limitations similar to those of claim 3 and is rejected for similar reasons as claim 3 using similar teachings and rationale.
Claims 6, 15, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Liu (Pub. No.: CN114626550A (Published: 06/14/2022)) in view of Chen et al. (Pub. No.: US 20240086699 A1 (Filed: 9/9/2022)), further in view of Li et al. (NPL – “Model-Contrastive Federated Learning”, published: 2021).
Regarding claim 6, Liu in view of Chen teaches, as outlined above, all the elements of claim 1; claim 6 is therefore rejected for the same reasons as those presented for claim 1, mutatis mutandis. Liu in view of Chen further teaches:
train the local model by minimizing a total of a prediction loss, a classifier consistency loss, and a feature extractor consistency loss (Liu teaches training the local model by minimizing multiple losses to ensure both local accuracy and global consistency. Specifically, it teaches that the training module at each user side performs training using a loss function comprising a cross-entropy loss between the model output and the hard label – the prediction loss. Additionally, the same passage describes using a KL divergence loss or a cross-entropy loss between the model output and the soft label, thereby teaching the classifier consistency loss.).
However, Liu in view of Chen does not teach:
a feature extractor consistency loss
However, Liu in view of Chen further in view of Li teaches the limitation:
a feature extractor consistency loss (Li, page 10715, section 3.3, mentions “we propose MOON. MOON is designed as a simple and effective approach based on FedAvg, only introducing lightweight but novel modifications in the local training phase. Since there is always drift in local training and the global model learns a better representation than the local model, MOON aims to decrease the distance between the representation learned by the local model and the representation learned by the global model, and increase the distance between the representation learned by the local model and the representation learned by the previous local model. We achieve this from the inspiration of contrastive learning, which is now mainly used to learn visual representations.”).
Accordingly, it would have been obvious to a person having ordinary skill in the art, before the effective filing date of the claimed invention, having a combination of Liu, Chen, and Li before them, to train a local model by minimizing a total of a prediction loss, a classifier consistency loss, and a feature extractor consistency loss. Liu teaches employing a cross-entropy loss between the model output and the hard label (prediction loss), and a KL-divergence or cross-entropy loss between the model output and the soft label (classifier consistency loss). MOON teaches applying a contrastive loss to align feature representations between the local model and the global model while separating them from the prior local model (feature extractor consistency loss). One would have been motivated to combine these teachings to improve convergence, representation alignment, and accuracy in heterogeneous federated learning environments, particularly for edge devices such as vehicles.
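The combined training objective discussed above (Liu's cross-entropy prediction loss and KL-divergence consistency loss, plus a representation-alignment term in the spirit of MOON) can be sketched as follows; all sample values and names are hypothetical, and the squared-distance term is a simplified stand-in for MOON's full contrastive loss, which additionally uses cosine similarity against the previous local model's representation:

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy between target distribution p and prediction q."""
    return -(p * np.log(q + eps)).sum()

def kl_div(p, q, eps=1e-12):
    """KL divergence KL(p || q) between two probability distributions."""
    return (p * np.log((p + eps) / (q + eps))).sum()

# Hypothetical quantities for one training sample (values illustrative only):
model_output = np.array([0.7, 0.2, 0.1])   # local model's softmax output
hard_label   = np.array([1.0, 0.0, 0.0])   # one-hot ground-truth label
soft_label   = np.array([0.6, 0.3, 0.1])   # global soft label from the server
z_local      = np.array([0.5, 0.5])        # local model's representation
z_global     = np.array([0.6, 0.4])        # global model's representation

prediction_loss = cross_entropy(hard_label, model_output)  # CE with hard label
classifier_consistency = kl_div(soft_label, model_output)  # KL with soft label
# MOON-inspired term: penalize distance between local and global representations
feature_consistency = np.linalg.norm(z_local - z_global) ** 2

total_loss = prediction_loss + classifier_consistency + feature_consistency
```

Minimizing the total thus drives the model toward the local hard labels while keeping its outputs and internal representations consistent with the server-aggregated global knowledge.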
Regarding claim 15, Liu in view of Chen teaches, as outlined above, all the elements of claim 9; claim 15 is therefore rejected for the same reasons as those presented for claim 9. The claim recites limitations similar to those of claim 6 and is rejected for similar reasons as claim 6 using similar teachings and rationale.
Regarding claim 20, Liu in view of Chen teaches, as outlined above, all the elements of claim 17; claim 20 is therefore rejected for the same reasons as those presented for claim 17. The claim recites limitations similar to those of claim 6 and is rejected for similar reasons as claim 6 using similar teachings and rationale.
Conclusion
THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Daravanh Phakousonh whose telephone number is (571)272-6324. The examiner can normally be reached Mon - Thurs 7 AM - 5 PM, Every other Friday 7 AM - 4PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached at 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Daravanh Phakousonh/ Examiner, Art Unit 2121
/Li B. Zhen/ Supervisory Patent Examiner, Art Unit 2121