Last updated: May 29, 2026
Application No. 17/765,711
METHOD, APPARATUS AND SYSTEM FOR TRAINING A NEURAL NETWORK, AND STORAGE MEDIUM STORING INSTRUCTIONS

Non-Final OA §101§102§103§112
Filed
Mar 31, 2022
Priority
Nov 08, 2019 — CN 201911086516.1 +1 more
Examiner
HUANG, YAO D
Art Unit
2124
Tech Center
2100 — Computer Architecture & Software
Assignee
Canon Kabushiki Kaisha
OA Round
1 (Non-Final)
This examiner grants 63% of cases after interview (+31.8% interview lift). A telephonic interview to clarify the technical implementation could significantly improve the outcome.
Based on 127 resolved cases, 2023–2026
Examiner Intelligence

Grants 63% of resolved cases
Career Allowance Rate
80 granted / 127 resolved
+8.0% vs TC avg
Strong +32% interview lift
Typical timeline
4y 0m
Avg Prosecution
16 currently pending
Career history
146
Total Applications
across all art units
Statute-Specific Performance

§101: 2.6% (-37.4% vs TC avg)
§103: 92.9% (+52.9% vs TC avg)
§102: 2.4% (-37.6% vs TC avg)
§112: 2.1% (-37.9% vs TC avg)
Based on career data from 127 resolved cases
Office Action

§101 §102 §103 §112
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  
Claim 14 invokes § 112(f). The limitations in this claim that invoke § 112(f) are:
“output unit for obtaining…” 
“update unit for updating…”
In these limitations, the term “unit” in the phrase “unit for” is considered to be a generic placeholder used in the manner of “means” in the phrase “means for.” 
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
The above limitations are supported by paragraph [0102] of the specification, which teaches that the “units” may be implemented by a general purpose computer with software for performing the given functions.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
In claims 1, 14, 15, 18, and 19, the phrase “training of the second neural network does not start” is indefinite because it is unclear whether this term means that the training never starts, or instead means that training has not yet started. On its face, this term would typically mean that the training never starts, since the phrase “does not start” is not subject to any condition in the claim and is instead a blanket statement that describes the training. However, such an interpretation may be inconsistent with the other features of the claim, including “updating the current second neural network.” Therefore, the other interpretation is also possible. Since the above phrase can be interpreted in two mutually exclusive manners and it is unclear which interpretation should be used, this phrase is indefinite. For purposes of examination, the above phrase has been interpreted to mean that training has not yet started. If this is the intended meaning, the rejection can be overcome by amending “does not start” to “has not started” or by deleting the above phrase entirely.
The remaining dependent claims, including claims 2-13 and 16-17, are rejected due to their dependency on claim 1 or 15.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 18 and 19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claim(s) does/do not fall within at least one of the four categories of patent eligible subject matter. 
“If a claim covers material not found in any of the four statutory categories, that claim falls outside the plainly expressed scope of § 101” (MPEP § 2106.03(I)).
Claim 18 is rejected because it covers software per se. See MPEP § 2106.03(I): “products that do not have a physical or tangible form, such as…a computer program per se (often referred to as ‘software per se’) when claimed as a product without any structural recitations” are not directed to any of the statutory categories. Here, claim 18 recites a “system” comprising “a cloud server and an embedded device that are connected to each other via a network.” Here, the terms “system,” “cloud server,” and “embedded device” broadly cover virtual devices and virtual servers, and do not require any physical or tangible form such as physical hardware, and the term “connected to each other via a network” does not require any hardware communication interfaces, but broadly covers software interfaces. Therefore, claim 18 is directed to software per se. To overcome this rejection, applicant could add physical components such as a memory and a processor.
Claim 19 is rejected because it covers signals per se. MPEP § 2106.03 states: “a transitory signal, while physical and real, does not possess concrete structure that would qualify as a device or part under the definition of a machine, is not a tangible article or commodity under the definition of a manufacture (even though it is man-made and physical in that it exists in the real world and has tangible causes and effects), and is not composed of matter such that it would qualify as a composition of matter.” Here, a “storage device” encompasses signals per se, because “device” is not limited to physical hardware. This rejection can be overcome by amending the claim to recite a “non-transitory computer readable medium” storing the computer program.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-6, 13-14, and 19 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhang et al., “Deep Mutual Learning,” arXiv:1706.00384v1 [cs.CV] 1 Jun 2017 (“Zhang”). 
As to claim 1, Zhang teaches a method of training a neural network comprising a first neural network and a second neural network, characterized in that: training of the first neural network has not yet completed and training of the second neural network does not start, wherein for the current first neural network and the current second neural network, [§ 2.1, paragraph 1: “We formulate the proposed DML approach with a cohort of two networks”; FIG. 1, caption: “Each network is trained with a supervised learning loss.” See also Algorithm 1 (at end of § 2.2). As shown in Algorithm 1, the training of the first model has not yet completed, because the model is still being trained. Furthermore, the training of the second model has not yet started, noting that the term “training” is not specifically defined in this context; thus, a further iteration of the iterative process indicated by the “repeat” operation in Algorithm 1 can be regarded as training that has not yet started.] the method comprises:
an output step of obtaining a first output by subjecting a sample image to the current first neural network, and obtaining a second output by subjecting the sample image to the current second neural network; [FIG. 1 teaches the predictions p1 and p2, which are respective outputs of the two neural networks upon receiving the input image, which is represented as x.] and
an update step of updating the current first neural network according to a first loss function value, and updating the current second neural network according to a second loss function value, wherein the first loss function value is obtained according to the first output, and the second loss function value is obtained according to the first output and the second output. [Algorithm 1, which teaches the loss functions LΘ1 and LΘ2 for the first and second models, which are determined according to the outputs as indicated in equations (1)-(4). That is, the predictions (outputs) p1 and p2 are predicted from the logits z, and the loss functions are based on the KL divergence between the predictions p. The values of these loss functions are used to update the models Θ1 and Θ2 as shown in lines 2 and 4 of the algorithm.]
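
For illustration only, the relationship between the two outputs and the two loss function values discussed in the mapping above can be sketched as follows. This is a minimal sketch, assuming that each loss is the sum of a cross-entropy term against the label and a KL divergence term between the two predictions, as the cited passages describe. The direction of each KL term, the function names, and the use of plain Python lists are assumptions made for this sketch and are not taken from Zhang or from the application.

```python
import math


def softmax(logits):
    """Convert logits z into a prediction p (a probability vector)."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, label):
    """Supervised term: error between the prediction and the correct label."""
    return -math.log(p[label])


def kl_divergence(p_from, p_to):
    """D_KL(p_from || p_to) for two probability vectors of equal length."""
    return sum(a * math.log(a / b) for a, b in zip(p_from, p_to) if a > 0)


def mutual_losses(z1, z2, label):
    """Return (loss1, loss2) for the first and second networks.

    Assumption: each network's loss adds the divergence of the peer's
    prediction from its own prediction to its supervised term.
    """
    p1, p2 = softmax(z1), softmax(z2)
    loss1 = cross_entropy(p1, label) + kl_divergence(p2, p1)
    loss2 = cross_entropy(p2, label) + kl_divergence(p1, p2)
    return loss1, loss2
```

In this sketch, both loss values depend on both predictions, so the second loss value is necessarily obtained according to the first output and the second output.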

As to claim 2, Zhang teaches the method according to claim 1, wherein, the current first neural network has been updated once at most with respect to its previous state; and the current second neural network has been updated once at most with respect to its previous state. [This limitation is met by a second iteration, i.e., the iteration of t = 2 in the “repeat” condition of Algorithm 1, since at the start of the second iteration, both neural networks have been trained at most once in this algorithm since the initialization of the models Θ1 and Θ2. That is, the initial conditions correspond to a previous state, and at the start of the second iteration, the models have been updated once.]

As to claim 3, Zhang teaches the method according to claim 1, wherein, the first output includes a first processing result obtained by subjecting the sample image to the current first neural network; and the second output includes a second processing result obtained by subjecting the sample image to the current second neural network. [As formulated in equation (1) and the corresponding text, the outputs p1 and p2 are based on the logits z1 and z2, each of which corresponds to a processing result obtained by subjecting an input image to the respective neural network.]

As to claim 4, Zhang teaches the method according to claim 3, wherein, in the update step, the second loss function value is calculated according to a real result in a label of the sample image, the first processing result and the second processing result. [Zhang § 2.1, paragraph 1 teaches that the samples xi (i=1 to N) have corresponding labels yi (i=1 to N), which correspond to a “real result.” The labels are used to calculate the “cross-entropy error between the predicted values and the correct labels” (see text above equation (2)). Note that this cross-entropy error, i.e., LC1 (and analogously, LC2) are part of the loss functions as defined in equations (4) and (5). Furthermore, the overall loss functions LΘ1 and LΘ2 are also both computed according to the processing results of each model, which forms the basis of p1 and p2 as defined in equations (4) and (5).]

As to claim 5, Zhang teaches the method according to claim 3, wherein, the first output includes a first sample feature obtained by subjecting the sample image to the current first neural network; and the second output includes a second sample feature obtained by subjecting the sample image to the current second neural network. [As noted above, the output of each of the two models Θ1 and Θ2 is a prediction (p1 and p2, respectively). This prediction constitutes a “feature” in the absence of further limitation as to the type or content of the “feature.”]

As to claim 6, Zhang teaches the method according to claim 5, wherein, in the update step, the second loss function value is calculated according to the first sample feature and the second sample feature. [As shown in equation (5), the second loss value, namely the value of LΘ2, is based on the first sample feature (p1) and the second sample feature (p2).] 

As to claim 13, Zhang teaches the method according to claim 1, wherein, the first neural network is a teacher neural network, and the second neural network is a student neural network. [Zhang, abstract: “In this paper, we present a deep mutual learning (DML) strategy where, rather than one way transfer between a static pre-defined teacher and a student, an ensemble of students learn collaboratively and teach each other throughout the training process.” That is, each of the two networks in Zhang serves the role of both teacher and student, such that both can be considered to be both a teacher neural network and a student neural network, noting that the instant claim does not require the networks to be exclusively a student or exclusively a teacher. Specifically, one is the teacher when the other is updated.]

As to claim 14, this claim is directed to an apparatus for performing operations that are the same or substantially the same as those of claim 1. Therefore, the rejection made to claim 1 is applied to claim 14.
Furthermore, Zhang teaches an “apparatus” [§ 3.1, paragraph 2: “We implement all networks and training procedures in TensorFlow [1] and conduct all experiments on an NVIDIA GeForce GTX 1080 GPU.” That is, Zhang teaches the use of a general-purpose computer.]

As to claim 19, this claim is directed to a storage medium for performing operations that are the same or substantially the same as those of claim 1. Therefore, the rejection made to claim 1 is applied to claim 19.
Furthermore, Zhang teaches a “storage medium storing instructions that, when executed by a processor, enable to execute training of a neural network” [§ 3.1, paragraph 2: “We implement all networks and training procedures in TensorFlow [1] and conduct all experiments on an NVIDIA GeForce GTX 1080 GPU.” That is, Zhang teaches the use of a general-purpose computer that includes software implemented using TensorFlow and a processor including a CPU (implied) and a GPU. The generic computer component of a storage medium is implied by the fact that the method of Zhang is implemented using software on a general-purpose computer.]

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1.	Claims 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Li et al., “Mimicking Very Efficient Network for Object Detection,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (“Li”).
As to claim 7, Zhang teaches the method according to claim 5, but does not teach the further limitations of the instant dependent claim.
Li teaches “wherein, in the update step, the second loss function value is calculated according to features in a specific area of the first sample feature and features in the specific area of the second sample feature” [§ 3.2, paragraph 3: “Therefore, we propose a new fully convolutional network feature mimic method by mimicking the features sampled from regions of proposals to solve the high-dimensional regression problem of fully convolutional feature map. The feature mimic method based on proposals sampling could also make the small network focus more on learning the region of interests features from the large model rather than the global context features. Local region features can be sampled by bounding boxes of different ratios and sizes from the feature maps of both the small network and the large network using spatial pyramid pooling [16]. Then the sampled features from the feature map of small network are regressed to the same dimension as the large model by a following transformation layer [29]. The loss function that small network intends to minimize is defined as…” As shown in equations (2)-(3), the loss function, which is minimized by the small network (analogous to the second network) includes u(i), which is the “feature sampled by spatial pyramid pooling from the feature map of the large model,” and v(i), which is the “sampled output feature of the small network.” Furthermore, in regards to the “specific area,” this is also illustrated in FIG. 1, whose caption states: “A Region Proposal Network generates candidate RoIs, which then used to extract features from the feature maps.” Note that the RoIs mentioned here correspond to the “bounding boxes” quoted above.] 
“and wherein, the specific area is determined according to an object area in a label of the sample image.” [§ 3.2, paragraph 3 (partial paragraph below equation 4): “By optimizing this loss function, the small network can be trained under both the ground-truths and the additional supervision from the large models.” Note that in this context, “ground truth” refers to a bounding box, which is a type of label. See § 4.1, paragraph 1: “On Caltech, we train our models on the new annotated 10× training data provided by [37] and select only the images that contains ground-truth bounding boxes in the training dataset which has about 9k images in total.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zhang with the teachings of Li by modifying the method of Zhang to implement the training technique of Li for the application of object detection, so as to arrive at the claimed invention of the instant dependent claim. The motivation would have been to train detectors for high performance (see Li, § 4: “In this paper we propose a feature mimic method to further extend the mimic approach to object detection tasks. By supervision of the features from the large network, we can train networks from scratch to achieve superior performance than fine-tuning from ImageNet pre-trained models.”). 
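
For illustration only, a loss that compares features of two networks within a specific area, as discussed for claim 7 above, can be sketched as follows. This is a minimal sketch, assuming single-channel feature maps of equal size stored as nested lists, a rectangular area given in feature-map coordinates, and a squared-difference measure averaged over the area. The function name, the box convention, and the normalization are assumptions made for this sketch; the spatial pyramid pooling and the transformation layer described in Li are omitted.

```python
def region_mimic_loss(first_map, second_map, box):
    """Squared feature difference between two maps, restricted to one area.

    first_map, second_map: 2D lists (rows of floats) of equal size.
    box: (x0, y0, x1, y1), half-open, so it covers columns x0..x1-1 and
         rows y0..y1-1. This convention is an assumption of the sketch.
    """
    x0, y0, x1, y1 = box
    total = 0.0
    count = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            diff = first_map[y][x] - second_map[y][x]
            total += diff * diff
            count += 1
    if count == 0:
        raise ValueError("box selects no feature-map cells")
    # Assumed normalization: half the mean squared difference over the area.
    return total / (2 * count)
```

Features outside the box do not contribute to the value, which is the sense in which such a loss is calculated according to features in a specific area of each feature map.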

As to claim 8, the combination of Zhang and Li teaches the method according to claim 7, as set forth above.
Li further teaches “wherein, the specific area is one of the object area, a smooth response area of the object area and a smooth response area at a corner point of the object area.” [§ 3.2, paragraph 3: “Therefore, we propose a new fully convolutional network feature mimic method by mimicking the features sampled from regions of proposals to solve the high-dimensional regression problem of fully convolutional feature map. The feature mimic method based on proposals sampling could also make the small network focus more on learning the region of interests features from the large model rather than the global context features. Local region features can be sampled by bounding boxes of different ratios and sizes from the feature maps of both the small network and the large network using spatial pyramid pooling [16].” That is, the bounding box is an “object area,” noting that the instant claim recites an alternative expression denoted by “one of.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zhang with the teachings of Li so as to also arrive at the claimed invention of the instant dependent claim. The motivation for doing so is covered by the motivation given for Li in the rejection of parent dependent claim 7.

2.	Claims 9-11 and 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Li, and further in view of Gidaris et al., “Object detection via a multi-region & semantic segmentation-aware CNN model,” ICCV 2015 (“Gidaris”).
As to claim 9, the combination of Zhang and Li teaches the method according to claim 7, but does not teach the further limitations of the instant dependent claim.
Gidaris teaches “wherein, the specific area is adjusted according to a feature value of the second sample feature.” [§ 4, paragraph 4: “After the last iteration T, the candidate detections {D_c^t}_{t=1}^T produced on each iteration t are merged together.” Note that “detections” in this context refers to detection boxes. Furthermore, merging a box with others constitutes adjusting it.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Gidaris by modifying the method of Zhang, as modified thus far, to use Gidaris’s technique of an iterative localization mechanism that alternates between scoring a box proposal and refining its location with a deep CNN regression model, so as to arrive at the claimed invention of the instant dependent claim. The motivation would have been to enable “capturing a diverse set of discriminative appearance factors and exhibits localization sensitivity that is essential for accurate object localization” (see Gidaris, abstract).

As to claim 10, the combination of Zhang, Li, and Gidaris teaches the method according to claim 9, as set forth above.
Gidaris further teaches “wherein, the adjusted specific area is a merged area formed by an area corresponding to a feature for which the feature value is larger than or equal to a predetermined threshold value in the second sample feature and the specific area.” [§ 4, paragraph 4: “After the last iteration T, the candidate detections {D_c^t}_{t=1}^T produced on each iteration t are merged together… We exploit this “by-product” of the iterative localization scheme by adding a step of bounding box voting. First, standard non-max suppression [10] is applied on Dc and produces the detections Yc = {(si,c, Bi,c)} using an IoU overlap threshold of 0.3. Then, the final bounding box coordinates Bi,c are further refined by having each box Bj,c ∈ N(Bi,c) (where N(Bi,c) denotes the set of boxes in Dc that overlap with Bi,c by more than 0.5 on IoU metric) to vote for the bounding box location using as weight its score wj,c = max(0,sj,c), or … The final set of object detections for class c will be Y′c.” That is, the merged candidates Dc are further processed so that each box is a member of a set of boxes in Dc that overlap with Bi,c by the threshold of >0.5 for the IoU.]
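
For illustration only, the overlap threshold and the vote weights described in the quoted passage can be sketched as follows. This is a minimal sketch, assuming axis-aligned boxes given as (x0, y0, x1, y1) tuples and candidates given as (score, box) pairs. The function names and data layout are assumptions made for this sketch. The formula by which the votes are combined is elided in the quotation above and is therefore not reproduced here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def voting_neighbors(box, candidates, threshold=0.5):
    """Select the candidates allowed to vote on the location of `box`.

    candidates: list of (score, box) pairs.
    Returns (weight, box) pairs for candidates whose IoU with `box`
    exceeds `threshold`, with weight = max(0, score) as in the quotation.
    """
    return [
        (max(0.0, score), cand)
        for score, cand in candidates
        if iou(box, cand) > threshold
    ]
```

For example, a candidate that overlaps the reference box with an IoU of about 0.82 is retained even when its score is negative, in which case its weight is zero, while a candidate with no overlap is excluded.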

As to claim 11, the combination of Zhang, Li, and Gidaris teaches the method according to claim 9, as set forth above.
Li further teaches “wherein, the second loss function value indicates a difference of features in the adjusted specific area of the first sample feature and the second sample feature.” [Li, § 3.2, last paragraph, which teaches a loss that uses a difference between u(i) and r(v(i)), which is the loss between the feature maps of the large and small models, analogous to the first and second sample feature of the instant claim.]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far, including the above teachings of Li, so as to arrive at the claimed invention of the instant dependent claim. The motivation for doing so is covered by the motivation given for Li in the rejection of parent dependent claim 7.
As to claim 15, Zhang teaches a method of training a neural network comprising a first neural network and a second neural network, wherein training of the first neural network has completed and training of the second neural network does not start, characterized in that: for the current second neural network, [§ 2.1, paragraph 1: “We formulate the proposed DML approach with a cohort of two networks”; FIG. 1, caption: “Each network is trained with a supervised learning loss.” See also Algorithm 1 (at end of § 2.2). As shown in Algorithm 1, the training of the first model has not yet completed, because the model is still being trained. Furthermore, the training of the second model has not yet started, noting that the term “training” is not specifically defined in this context; thus, a further iteration of the iterative process indicated by the “repeat” operation in Algorithm 1 can be regarded as training that has not yet started.] the method comprises:
an output step of obtaining a first sample feature by subjecting a sample image to the first neural network, and obtaining a second sample feature by subjecting the sample image to the current second neural network; [FIG. 1 teaches the predictions p1 and p2, which are respective outputs of the two neural networks upon receiving the input image, which is represented as x.] and
an update step of updating the current second neural network according to a loss function value, […] [Algorithm 1, which teaches the loss functions LΘ1 and LΘ2 for the first and second models, which are determined according to the outputs as indicated in equations (1)-(4). That is, the predictions (outputs) p1 and p2 are predicted from the logits z, and the loss functions are based on the KL divergence between the predictions p. The values of these loss functions are used to update the models Θ1 and Θ2 as shown in lines 2 and 4 of the algorithm.]
Zhang does not explicitly teach “wherein the loss function value is obtained according to features in a specific area of the first sample feature and features in the specific area of the second sample feature, wherein the specific area is determined according to an object area in a label of the sample image; and wherein the specific area is adjusted according to a feature value of the second sample feature.”
Li teaches “wherein the loss function value is obtained according to features in a specific area of the first sample feature and features in the specific area of the second sample feature” [§ 3.2, paragraph 3: “Therefore, we propose a new fully convolutional network feature mimic method by mimicking the features sampled from regions of proposals to solve the high-dimensional regression problem of fully convolutional feature map. The feature mimic method based on proposals sampling could also make the small network focus more on learning the region of interests features from the large model rather than the global context features. Local region features can be sampled by bounding boxes of different ratios and sizes from the feature maps of both the small network and the large network using spatial pyramid pooling [16]. Then the sampled features from the feature map of small network are regressed to the same dimension as the large model by a following transformation layer [29]. The loss function that small network intends to minimize is defined as…” As shown in equations (2)-(3), the loss function, which is minimized by the small network (analogous to the second network) includes u(i), which is the “feature sampled by spatial pyramid pooling from the feature map of the large model,” and v(i), which is the “sampled output feature of the small network.” Furthermore, in regards to the “specific area,” this is also illustrated in FIG. 1, whose caption states: “A Region Proposal Network generates candidate RoIs, which then used to extract features from the feature maps.” Note that the RoIs mentioned here correspond to the “bounding boxes” quoted above.] 
and “wherein the specific area is determined according to an object area in a label of the sample image” [§ 3.2, paragraph 3 (partial paragraph below equation 4): “By optimizing this loss function, the small network can be trained under both the ground-truths and the additional supervision from the large models.” Note that in this context, “ground truth” refers to a bounding box, which is a type of label. See § 4.1, paragraph 1: “On Caltech, we train our models on the new annotated 10× training data provided by [37] and select only the images that contains ground-truth bounding boxes in the training dataset which has about 9k images in total.”]
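For illustration only, the region-based feature mimic loss that the rejection relies on can be sketched as follows. This is a minimal sketch under stated assumptions, not Li's implementation: the function names are hypothetical, per-channel average pooling is swapped in for spatial pyramid pooling, the transformation layer is reduced to a fixed matrix, and the 1/(2N) normalization is assumed from the general form of a mean squared distance over N sampled regions.

```python
def pool_region(feature_map, box):
    """Average each channel of a feature map over box = (x0, y0, x1, y1).

    feature_map is a list of channels, each a list of rows.
    Assumption: simple average pooling stands in for spatial pyramid pooling.
    """
    x0, y0, x1, y1 = box
    pooled = []
    for channel in feature_map:
        values = [v for row in channel[y0:y1] for v in row[x0:x1]]
        pooled.append(sum(values) / len(values))
    return pooled


def lift(transform, vector):
    """Map small-network features to the large network's dimension.

    transform is a hypothetical fixed matrix (list of rows) standing in
    for the learned transformation layer.
    """
    return [sum(w * v for w, v in zip(row, vector)) for row in transform]


def mimic_loss(large_map, small_map, boxes, transform):
    """Squared distance between region features, averaged as 1/(2N).

    Assumes boxes is non-empty and every box covers at least one cell.
    """
    total = 0.0
    for box in boxes:
        u = pool_region(large_map, box)                   # large-network region feature
        v = lift(transform, pool_region(small_map, box))  # lifted small-network feature
        total += sum((a - b) ** 2 for a, b in zip(u, v))
    return total / (2 * len(boxes))
```

Only features inside the supplied boxes contribute to the loss, which is the property the rejection maps to the claimed “specific area.”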
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zhang with the teachings of Li by modifying the method of Zhang to implement the training technique of Li for the application of object detection, so as to arrive at the above features of the instant dependent claim (i.e., “wherein the loss function value is obtained according to features in a specific area of the first sample feature and features in the specific area of the second sample feature, wherein the specific area is determined according to an object area in a label of the sample image”). The motivation would have been to train detectors for high performance (see Zhang, § 4: “In this paper we propose a feature mimic method to further extend the mimic approach to object detection tasks. By supervision of the features from the large network, we can train networks from scratch to achieve superior performance than fine-tuning from ImageNet pre-trained models.”). 
The combination of references thus far does not teach “wherein the specific area is adjusted according to a feature value of the second sample feature.”
Gidaris teaches “wherein the specific area is adjusted according to a feature value of the second sample feature.” [§ 4, paragraph 4: “After the last iteration T, the candidate detections {D^t_c} (t = 1, …, T) produced on each iteration t are merged together… We exploit this “by-product” of the iterative localization scheme by adding a step of bounding box voting. First, standard non-max suppression [10] is applied on Dc and produces the detections Yc = {(si,c, Bi,c)} using an IoU overlap threshold of 0.3. Then, the final bounding box coordinates Bi,c are further refined by having each box Bj,c ∈ N(Bi,c) (where N(Bi,c) denotes the set of boxes in Dc that overlap with Bi,c by more than 0.5 on IoU metric) to vote for the bounding box location using as weight its score wj,c = max(0, sj,c), or … The final set of object detections for class c will be Y′c.” That is, the specific area is adjusted based on the “overlap” (corresponding to a feature value of the instant claim), since the final bounding box coordinates are refined based on this overlap (feature value). Note also that “detections” in this context refers to detection boxes. Furthermore, merging a box with others constitutes adjusting it.]
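The bounding box voting step quoted from Gidaris can be sketched as follows. This is a rough illustration under stated assumptions rather than Gidaris's code: the function names are hypothetical, boxes are (x0, y0, x1, y1) tuples with positive area, and the refined box is taken to be the score-weighted average of the neighboring boxes, since the exact voting formula is elided in the quotation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def vote_box(kept_box, candidates, scores, iou_threshold=0.5):
    """Refine a box kept by non-max suppression using its neighbors.

    Each candidate overlapping kept_box by more than iou_threshold votes
    with weight max(0, score). Assumption: the votes are combined as a
    weighted average of the box coordinates.
    """
    weighted = [0.0, 0.0, 0.0, 0.0]
    total_weight = 0.0
    for box, score in zip(candidates, scores):
        weight = max(0.0, score)
        if weight > 0.0 and iou(kept_box, box) > iou_threshold:
            total_weight += weight
            for k in range(4):
                weighted[k] += weight * box[k]
    if total_weight == 0.0:
        # No positively scored neighbor: leave the box unchanged.
        return tuple(float(c) for c in kept_box)
    return tuple(c / total_weight for c in weighted)
```

For example, a kept box (0, 0, 10, 10) with one equally scored neighbor at (2, 0, 12, 10) is shifted to (1, 0, 11, 10), while a distant high-scoring box has no effect because its overlap is below the threshold.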
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Gidaris by modifying the method of Zhang, as modified thus far, to use Gidaris’s technique of an iterative localization mechanism that alternates between scoring a box proposal and refining its location with a deep CNN regression model, so as to arrive at the claimed invention of the instant dependent claim. The motivation would have been to enable “capturing a diverse set of discriminative appearance factors and exhibits localization sensitivity that is essential for accurate object localization” (see Gidaris, abstract).

As to claim 16, the combination of Zhang, Li, and Gidaris teaches the method according to claim 15, as set forth above.
Li further teaches “wherein, the specific area is one of the object area, a smooth response area of the object area and a smooth response area at a corner point of the object area.” [§ 3.2, paragraph 3: “Therefore, we propose a new fully convolutional network feature mimic method by mimicking the features sampled from regions of proposals to solve the high-dimensional regression problem of fully convolutional feature map. The feature mimic method based on proposals sampling could also make the small network focus more on learning the region of interests features from the large model rather than the global context features. Local region features can be sampled by bounding boxes of different ratios and sizes from the feature maps of both the small network and the large network using spatial pyramid pooling [16].” That is, the bounding box is an “object area,” noting that the instant claim recites an alternative expression denoted by “one of.”]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zhang with the teachings of Li so as to also arrive at the claimed invention of the instant dependent claim. The motivation for doing so is covered by the motivation given for Li in the rejection of parent claim.


As to claim 17, the combination of Zhang, Li, and Gidaris teaches the method according to claim 15, wherein, the first neural network is a teacher neural network, and the second neural network is a student neural network. [Zhang, abstract: “In this paper, we present a deep mutual learning (DML) strategy where, rather than one way transfer between a static pre-defined teacher and a student, an ensemble of students learn collaboratively and teach each other throughout the training process.” That is, each of the two networks in Zhang serves the role of both teacher and student, such that both can be considered to be both a teacher neural network and a student neural network, noting that the instant claim does not require the networks to be exclusively a student or exclusively a teacher. Specifically, one is the teacher when the other is updated.]

3.	Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang in view of Yang et al. (US 2020/0027019 A1) (“Yang”).
As to claim 18, Zhang teaches a system for training a neural network, [§ 3.1, paragraph 2: “We implement all networks and training procedures in TensorFlow [1] and conduct all experiments on an NVIDIA GeForce GTX 1080 GPU.”] […], the neural network comprising a first neural network for which training is executed […], and a second neural network for which training is executed […], characterized in that: training of the first neural network has not yet completed and training of the second neural network does not start, wherein for the current first neural network and the current second neural network, [§ 2.1, paragraph 1: “We formulate the proposed DML approach with a cohort of two networks”; FIG. 1, caption: “Each network is trained with a supervised learning loss.” See also Algorithm 1 (at end of § 2.2). As shown in Algorithm 1, the training of the first model has not yet completed, because the model is still being trained. Furthermore, the training of the second model has not yet started, noting that the term “training” is not specifically defined in this context; thus, a further iteration of the iterative process indicated by the “repeat” operation in Algorithm 1 can be considered training that has not yet started.] the system executes:
an output step of obtaining a first output by subjecting a sample image to the current first neural network, and obtaining a second output by subjecting the sample image to the current second neural network; [FIG. 1, teaches the prediction p1 and p2, which are respective outputs of the two neural networks upon receiving the input image, which is represented as x.] and
an update step of updating the current first neural network according to a first loss function value, and updating the current second neural network according to a second loss function value, wherein the first loss function value is obtained according to the first output, and the second loss function value is obtained according to the first output and the second output. [Algorithm 1, which teaches the loss functions LΘ1 and LΘ2 for the first and second models, which are determined according to the outputs as indicated in equations (1)-(4). That is, the predictions (outputs) p1 and p2 are predicted from the logits z, and the loss functions are based on the KL divergence between the predictions p. The values of these loss functions are used to update the models Θ1 and Θ2 as shown in lines 2 and 4 of the algorithm.]
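To make the cited loss structure concrete, the following is a minimal single-sample sketch of the two deep mutual learning losses, each being a supervised cross-entropy term plus a KL divergence term toward the other network's prediction. It is an illustration under stated assumptions, not Zhang's TensorFlow code: the function names are hypothetical, and batching and the gradient updates of Algorithm 1 are omitted.

```python
import math


def softmax(logits):
    """Convert logits z into a probability vector p."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """D_KL(p || q) for two probability vectors of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def mutual_learning_losses(logits1, logits2, label):
    """Return (loss for network 1, loss for network 2) on one sample.

    Each loss is the cross-entropy against the label plus the KL
    divergence from the peer's prediction to the network's own.
    """
    p1, p2 = softmax(logits1), softmax(logits2)
    loss1 = -math.log(p1[label]) + kl_divergence(p2, p1)
    loss2 = -math.log(p2[label]) + kl_divergence(p1, p2)
    return loss1, loss2
```

When both networks produce identical predictions the KL terms vanish and each loss reduces to its cross-entropy term; when they disagree, each loss depends on both outputs.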
Zhang does not teach the limitation of the system comprising “a cloud server and an embedded device that are connected to each other via a network” and the limitation that the training of the first and second neural networks is executed “in the cloud server” and “in the embedded device” respectively.
However, the above limitations merely define a system environment comprising two different types of general-purpose computers in which the method is performed.
Yang teaches “a cloud server and an embedded device that are connected to each other via a network” [[0070]: “The AI server 16 may be connected to at least one or more of the robot 11, self-driving vehicle 12, XR device 13, smartphone 14, or home appliance 15, which are AI devices constituting the AI system, through the cloud network 10 and may help at least part of AI processing conducted in the connected AI devices (11 to 15).” Here, the AI server is a cloud server since it is part of a cloud network, while “the robot 11, self-driving vehicle 12, XR device 13, smartphone 14, or home appliance 15” are embedded systems.] and the limitations of execution “in the cloud server” and “in the embedded device” [[0079]: “Here, the learning model may be the one trained by the robot 11 itself or trained by an external device such as the AI server 16.” [0090] “Here, the learning model may be the one trained by the self-driving vehicle 12 itself or trained by an external device such as the AI server 16.” See also [0098].]
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Zhang with the teachings of Yang by implementing the method of Zhang on “a cloud server and an embedded device that are connected to each other via a network” as taught in Yang such that the training of the first and second neural networks is executed “in the cloud server” and “in the embedded device” respectively. Doing so would have been an obvious combination of prior art elements according to known methods to yield predictable results (MPEP § 2143(I)) because (1) the prior art included each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference (as discussed above); (2) one of ordinary skill in the art could have combined the elements as claimed by known methods, and that in combination, each element merely performs the same function as it does separately (specifically, cloud servers and embedded systems are merely types of computing devices that can perform functions such as training); (3) a finding that one of ordinary skill in the art would have recognized that the results of the combination were predictable (specifically, the predictable result in which different devices are used to perform the training process in a distributed manner).


Allowable Subject Matter
Claim 12 is given no prior art rejection and would be objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, if the § 112(b) rejection of the claim is overcome.
The following is a statement of reasons for the indication of allowable subject matter: 
The prior art of record does not teach or fairly suggest the loss function as recited in dependent claim 12.
For example, while Li teaches loss functions in equations (1) and (5) that use a sum of squared differences, Li does not teach the union term and the 1/(NE + NS) factor. Similarly, the Hu et al. and Wang et al. references cited with this action teach mean squared error for losses, but do not teach the union term and the 1/(NE + NS) factor recited in claim 12.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following documents depict the state of the art.
Hu et al. "Objective Video Quality Assessment Based on Perceptually Weighted Mean Squared Error," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 9, pp. 1844-1855, Sept. 2017, teaches conventional loss functions.
Wang et al. "A review of object detection based on convolutional neural network," 2017 36th Chinese Control Conference (CCC), Dalian, China, 2017, pp. 11104-11109 teaches conventional loss functions.
Zhou et al., “IoU Loss for 2D/3D Object Detection,” 2019 International Conference on 3D Vision (3DV) teaches conventional techniques in IoU loss.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 9:00 am - 5:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Y.D.H./Examiner, Art Unit 2124



/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124
Prosecution Timeline

Mar 31, 2022
Application Filed
May 06, 2026
Non-Final Rejection mailed — §101, §102, §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

17/224,306
Patent 12626122
METHODS OF PROVIDING TRAINED HYPERDIMENSIONAL MACHINE LEARNING MODELS HAVING CLASSES WITH REDUCED ELEMENTS AND RELATED COMPUTING SYSTEMS
5y 1m to grant Granted May 12, 2026
17/447,542
Patent 12626138
CAUSALITY DETECTION FOR OUTLIER EVENTS IN TELEMETRY METRIC DATA
4y 8m to grant Granted May 12, 2026
16/461,763
Patent 12619852
Method and System for Simulating, Predicting, Interpreting, Comparing, or Visualizing Complex Data
6y 11m to grant Granted May 05, 2026
17/533,679
Patent 12608604
METHOD AND APPARATUS FOR TRAINING ARTIFICIAL INTELLIGENCE BASED ON EPISODE MEMORY
4y 5m to grant Granted Apr 21, 2026
17/747,036
Patent 12536455
Method for Early Warning Brandish of Transmission Wire Based on Improved Bayes-Adaboost Algorithm
3y 8m to grant Granted Jan 27, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

1-2
Expected OA Rounds
63%
Grant Probability
95%
With Interview (+31.8%)
4y 0m (~0m remaining)
Median Time to Grant
Low
PTA Risk
Based on 127 resolved cases by this examiner. Grant probability derived from career allowance rate.