DETAILED ACTION
This Office Action is responsive to the Applicant’s submission, filed on December 29, 2025, amending claims 1, 11, 13 and 14, and adding new claim 15. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitations use a generic placeholder (e.g. “unit”) that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: the “coefficient storage unit,” the “feature storage unit,” the “storage control unit” and the “convolution operation unit” first recited in claim 1 and required by its dependent claims; the “transformation unit” in claim 3; the “first convolution operation unit,” the “second convolution operation unit,” and the “third convolution operation unit” in claim 8 and required by its dependent claims; the “detection unit” recited in claim 9; and the “unit” recited in claim 11.
Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.
Claim Objections
Claims 1-12 and 15 are objected to because of the following informalities. Appropriate correction is required.
In particular, in claim 1, there is no antecedent basis for “the convolution computation processing” recited therein. Claims 2-12 and 15 depend from claim 1 and thereby include all of the limitations of claim 1. Accordingly, claims 2-12 and 15 are objected to for the same reasons as noted for claim 1.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-15 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Application Publication No. 2021/0073558 to Li et al. (“Li”) in view of U.S. Patent No. 12,159,214 to Ko et al. (“Ko”) and the article entitled “Heterogeneous Siamese Tracking System Based on PYNQ Framework” by Cui et al. (“Cui”).
Regarding claim 1, Li describes a method for detecting a target object, which includes inputting each of at least two feature groups output from at least two network layers of a neural network into a detector, and whereby the detector outputs a classification result and a regression result for each group (see e.g. paragraphs 0005-0008). Li particularly teaches that such a method can be implemented via an information processing apparatus (see e.g. paragraphs 0188-0190) which, as claimed, comprises
a storage control unit configured to set a part of previously obtained feature data as template feature data, and align an order of the template feature data with an order of data referenced by filter coefficients in a convolution computation processing (see e.g. paragraphs 0043-0047 and 0069-0070: Li teaches extracting a feature of a “reference frame” and a feature of a “frame under detection,” wherein the reference frame indicates a target object and the frame under detection, which can occur subsequent to the reference frame in a video sequence, indicates a current frame which is to be subjected to target object detection. The extracted feature of the reference frame and the extracted feature of the frame under detection are input as a feature group into a detector, which outputs a corresponding classification result and a regression result, wherein the classification result indicates a probability that a candidate box is a bounding box for the target object and the regression result indicates a position offset of the candidate box – see e.g. paragraphs 0053-0058 and 0071-0072. Li particularly discloses that such detection entails obtaining a classification weight for the detector and a regression weight for the detector based on the extracted feature of the reference frame, wherein the extracted feature of the frame under detection is then processed with the classification and regression weights to obtain the classification and regression results – see e.g. paragraphs 0073-0074, 0078 and 0081. The extracted feature of the reference frame, or the obtained classification and/or regression weight based thereon, is considered “template feature data” as claimed. Accordingly, Li teaches setting a part of previously obtained feature data, e.g. the extracted feature of the reference frame or the obtained classification and/or regression weight, as template feature data.
Li further teaches aligning an order of such template feature data with an order of data referenced by filter coefficients in a convolution computation processing. Particularly, the classification weight and regression weight serve as filter coefficients in a convolution operation; the classification and regression weights are each convolved with the extracted feature data of the frame under detection to obtain the classification result and regression result, respectively – see e.g. paragraphs 0085-0087. Since the classification and regression weights serve as filter coefficients in the convolution operation, the order of the classification and regression weights would necessarily be aligned with an order of data referenced by filter coefficients in a convolution computation processing, in this case, the order of the extracted feature data of the frame under detection. Li teaches that the above-described convolution operations can be performed via a suitably programmed processor – see e.g. paragraphs 0188-0191. Such a processor performing the above-noted operations is considered a “storage control unit” as claimed.); and
a convolution operation unit configured to compute new feature data by a convolution operation between feature data and filter coefficients, and compute, by a convolution operation between feature data and template feature data, correlation data between the feature data and the template feature data (see e.g. paragraphs 0043-0047 and 0069-0070: As noted above, Li teaches extracting a feature of a “reference frame” and a feature of a “frame under detection,” wherein the reference frame indicates a target object and the frame under detection, which can occur subsequent to the reference frame in a video sequence, indicates a current frame which is to be subjected to target object detection. Li particularly teaches that such feature extraction can be performed via a convolution operation by a convolutional layer in a neural network – see e.g. paragraphs 0049-0051 and 0065. Such a convolutional layer necessarily comprises filter coefficients, and thus Li is considered to teach computing new feature data, e.g. extracting a feature of a frame under detection, by a convolution operation between feature data and filter coefficients. Moreover, as further noted above, Li teaches that the extracted feature of the reference frame and the extracted feature of the frame under detection are input as a feature group into a detector, which outputs a corresponding classification result and a regression result, wherein the classification result indicates a probability that a candidate box is a bounding box for the target object and the regression result indicates a position offset of the candidate box – see e.g. paragraphs 0053-0058 and 0071-0072.
As further noted above, Li discloses that such detection entails obtaining a classification weight for the detector and a regression weight for the detector based on the extracted feature of the reference frame, and then processing the extracted feature of the frame under detection with these classification and regression weights to obtain the classification and regression results, respectively – see e.g. paragraphs 0073-0074, 0078 and 0081. In particular, a convolution operation is performed with the classification weight on the extracted feature of the frame under detection so as to obtain the classification result, and a convolution operation is performed with the regression weight on the extracted feature of the frame under detection so as to obtain the regression result – see e.g. paragraphs 0085-0087. As noted above, the extracted feature of the reference frame, or the obtained classification and/or regression weight based thereon, is considered “template feature data” as claimed. The classification result and/or regression result can be considered “correlation data” as claimed. Consequently, Li is further considered to teach computing, by a convolution operation between feature data and template feature data, i.e. between the extracted feature of the frame under detection and the classification and/or regression weights, correlation data between the feature data and the template feature data, i.e. the classification and/or regression results. As noted above, the claimed “convolution operation unit” is interpreted under 35 U.S.C. 112(f), and is therefore interpreted to cover the corresponding structure described in the specification as performing the claimed function. The specification of the instant application teaches that the convolution operation unit is implemented via a “computation processing unit” having a multiplier and a cumulative adder – see e.g. paragraphs 0034 and 0040 of the application as filed.
Correspondingly, Li teaches that the above-described convolution operations can be performed via a suitably programmed processor, which would be understood to comprise a multiplier and a cumulative adder – see e.g. paragraphs 0188-0191. Accordingly, such a processor programmed to perform the convolution operations described by Li is considered a “convolution operation unit” as claimed.).
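For illustration only, the following Python sketch shows the technical concept relied upon in the analysis of claim 1: the same multiply-accumulate convolution routine can compute new feature data from ordinary filter coefficients and, when a template cut from a reference-frame feature map is substituted for the filter coefficients, a correlation map whose peak marks the best match. It is a simplified single-channel sketch under stated assumptions; it is not asserted to be the implementation of Li or of the instant application, and all function names and values are hypothetical.

```python
def conv2d_valid(feature, kernel):
    """Stride-1, unpadded 2-D convolution as used in CNNs (i.e. cross-correlation):
    each output value is a sum of products between the kernel and one window."""
    fh, fw = len(feature), len(feature[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(fh - kh + 1):
        row = []
        for j in range(fw - kw + 1):
            acc = 0  # running sum, analogous to a multiplier feeding a cumulative adder
            for u in range(kh):
                for v in range(kw):
                    acc += feature[i + u][j + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out


def crop(feature, top, left, height, width):
    """Cut a target-object region out of a feature map (hypothetical helper)."""
    return [r[left:left + width] for r in feature[top:top + height]]


# Use 1: filter coefficients applied to input data yield new feature data.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
filter_coeffs = [[1, 0],
                 [0, 1]]
new_features = conv2d_valid(image, filter_coeffs)

# Use 2: a template taken from the reference-frame features takes the place
# of the filter coefficients; the output is a correlation map.
reference_features = [[0, 0, 0, 0],
                      [0, 5, 1, 0],
                      [0, 1, 5, 0],
                      [0, 0, 0, 0]]
template = crop(reference_features, 1, 1, 2, 2)
search_features = [[0, 0, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 5, 1],
                   [0, 0, 1, 5]]
correlation = conv2d_valid(search_features, template)
print(new_features)  # [[6, 8], [12, 14]]
print(correlation)   # [[0, 0, 0], [0, 25, 10], [0, 10, 52]]
```

The peak value (52) appears where the search features reproduce the template; only the source of the kernel differs between the two uses of the routine.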
Li thus teaches an information processing apparatus similar to that of claim 1, which is operable to perform computation processing in a neural network. However, while Li teaches that the information processing apparatus comprises storage (see e.g. paragraph 0193), Li does not particularly disclose: (i) that the feature data is stored in a “feature storage unit” of the information processing apparatus, wherein the feature storage unit is configured to store feature data; (ii) that the filter coefficients and template feature data are stored in a “coefficient storage unit” of the information processing apparatus, wherein the coefficient storage unit is configured to store filter coefficients of the neural network; and (iii) that the storage control unit is further configured to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is required by claim 1.
Ko nevertheless describes an integrated circuit for executing a machine-trained network (e.g. a convolutional neural network), wherein the integrated circuit comprises: (i) a coefficient storage unit (i.e. a weight memory and/or filter slice buffer) configured to store filter coefficients (i.e. weight values) of the machine-trained network; (ii) a feature storage unit (i.e. an activation memory and/or activation window buffer) configured to store feature data (i.e. activation values); (iii) a storage control unit (e.g. a memory control) configured to store data in the coefficient storage unit; and (iv) an operation unit (e.g. a microprocessor, core controller and/or adder trees) configured to compute new feature data by an operation (e.g. a dot product) between feature data stored in the feature storage unit and filter coefficients stored in the coefficient storage unit (see e.g. column 1, lines 21-55; column 16, line 66 – column 17, line 19; column 30, lines 12-58; column 31, lines 4-44; column 32, lines 1-16; and FIG. 17).
It would have been obvious to one of ordinary skill in the art, having the teachings of Li and Ko before the effective filing date of the claimed invention, to modify the information processing apparatus taught by Li so as to include the integrated circuit taught by Ko to execute the neural network, wherein the integrated circuit comprises, inter alia: (i) a coefficient storage unit configured to store the filter coefficients of the neural network; (ii) a feature storage unit configured to store the feature data; and wherein (iii) the storage control unit is configured to store data in the coefficient storage unit. It would have been advantageous to one of ordinary skill to utilize such an integrated circuit because it can provide accelerated execution of the neural network, as is suggested by Ko (see e.g. column 1, lines 6-39). Li and Ko thus teach an information processing apparatus similar to that of claim 1, but do not explicitly disclose that the storage control unit is further configured to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is required by claim 1.
Cui generally describes “a Siamese network tracking scheme based on PYNQ framework, which is deployed on ZYNQ platform.” (Abstract). Regarding the claimed invention, Cui particularly demonstrates that the platform comprises a coefficient storage unit (i.e. input buffers) configured to store filter coefficients (e.g. weights and biases) of a neural network, wherein template feature data is stored in a first memory region (e.g. in a template image buffer or template output buffer) of the coefficient storage unit, and the first memory region is different from a second memory region (e.g. a weight buffer) that the filter coefficients are stored in the coefficient storage unit (see e.g. sections IV.A SNA_IP Core and IV.B RPNA_IP core, and Figures 2 and 3).
It would have been obvious to one of ordinary skill in the art, having the teachings of Li, Ko and Cui before the effective filing date of the claimed invention, to modify the information processing apparatus taught by Li and Ko such that the storage control unit is configured to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is taught by Cui. It would have been advantageous to one of ordinary skill to utilize such a combination, because it would enable the template data to be readily accessed by processing units, as is evident from Cui (see e.g. sections IV.A SNA_IP Core and IV.B RPNA_IP core, and Figures 2 and 3). Accordingly, Li, Ko and Cui are considered to teach, to one of ordinary skill in the art, an information processing apparatus like that of claim 1.
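As a purely illustrative aid to the memory-region limitation, the Python sketch below models a coefficient storage unit as one flat buffer in which addresses below a boundary hold filter coefficients and addresses from the boundary upward hold template feature data, with the storage control step reordering the template into the same channel-major layout assumed for the filter coefficients. The class name, the boundary address, and the row/column/channel layouts are assumptions made for the example; the sketch does not reproduce the buffer organization of Cui or Ko.

```python
def hwc_to_chw_flat(t):
    """Flatten t[row][col][channel] into (channel, row, col) order, the
    layout assumed here for the stored filter coefficients."""
    h, w, c = len(t), len(t[0]), len(t[0][0])
    return [t[r][q][k] for k in range(c) for r in range(h) for q in range(w)]


class CoefficientStorage:
    """Hypothetical single buffer split into two non-overlapping regions."""

    def __init__(self, size, template_base):
        self.mem = [0] * size
        self.template_base = template_base  # first address of the template region

    def write_filters(self, values):
        # Filter-coefficient region: addresses [0, template_base).
        if len(values) > self.template_base:
            raise ValueError("filter coefficients overflow their region")
        self.mem[:len(values)] = values

    def write_template(self, template_hwc):
        # Template region: addresses [template_base, size), written in the
        # same channel-major order as the filter coefficients.
        values = hwc_to_chw_flat(template_hwc)
        end = self.template_base + len(values)
        if end > len(self.mem):
            raise ValueError("template overflows its region")
        self.mem[self.template_base:end] = values

    def read(self, base, count):
        return self.mem[base:base + count]


store = CoefficientStorage(size=16, template_base=8)
store.write_filters([1, 0, 0, 1])
# A 2x2 template with 2 channels, given as template[row][col][channel].
store.write_template([[[1, 2], [3, 4]],
                      [[5, 6], [7, 8]]])
print(store.read(0, 4))  # [1, 0, 0, 1]  (filter region untouched)
print(store.read(8, 8))  # [1, 3, 5, 7, 2, 4, 6, 8]
```

Because the two regions never overlap, refreshing the template for a new tracking target leaves the filter coefficients in place.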
As per claim 2, Li further teaches that a part of the feature data (i.e. feature data of the reference frame) computed by the convolution operation unit is applied as template feature data (i.e. as classification weights and/or regression weights) (see e.g. paragraphs 0073, 0078 and 0081). As described above, it would have been obvious to configure the storage control unit taught by Li and Ko so as to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is taught by Cui. Accordingly, the above-described combination of Li, Ko and Cui is further considered to teach an information processing apparatus like that of claim 2, in which the storage control unit stores in the coefficient storage unit a part of the feature data (i.e. the classification weights and/or regression weights), which is computed by the convolution operation unit, as the template feature data.
As per claim 3, Li teaches that a part of the feature data (i.e. feature data of the reference frame) computed by the convolution operation unit is applied as template feature data (i.e. as classification weights and/or regression weights) (see e.g. paragraphs 0073, 0078 and 0081). Li, however, does not explicitly disclose that the information processing apparatus further comprises a transformation unit configured to non-linearly transform feature data computed by the convolution operation unit, and wherein the storage control unit stores in the coefficient storage unit a part of the feature data non-linearly transformed by the transformation unit as the template feature data, as is claimed. The integrated circuit described by Ko nevertheless comprises a transformation unit (i.e. a “post-processor”) configured to non-linearly transform feature data computed by a convolution (i.e. dot product) operation (see e.g. column 1, lines 40-55; column 7, lines 46-59; column 8, lines 36-59; and column 29, lines 16-41). Accordingly, it would have been obvious to one of ordinary skill in the art, having the teachings of Li, Ko and Cui before the effective filing date of the claimed invention, to further modify the information processing apparatus taught by Li, Ko and Cui so as to comprise a transformation unit as further taught by Ko, which is configured to non-linearly transform feature data computed by a convolution operation (i.e. the convolution operation unit). It would have been particularly obvious to apply the transformation unit so as to non-linearly transform the part of the feature data computed by the convolution operation unit used as the template feature data (i.e. as classification weights and/or regression weights). It would have been advantageous to one of ordinary skill to utilize such a transformation unit because non-linear transformations are typical in convolutional layers of neural networks, as is taught by Ko (see e.g. column 1, lines 40-55).
As described above, it would have been obvious to configure the storage control unit taught by Li and Ko so as to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is taught by Cui. Accordingly, it follows that the storage control unit would particularly store, in the coefficient storage unit, a part of the feature data non-linearly transformed by the transformation unit as the template feature data. Li, Ko and Cui are thus further considered to teach an information processing apparatus like that of claim 3.
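The ordering at issue in claim 3 (non-linear transformation first, then storage of part of the transformed result as the template) can be sketched in Python as follows. ReLU is assumed as the non-linear transformation purely for illustration; the analysis above does not depend on any particular function, and the function names are hypothetical.

```python
def relu(feature):
    """Element-wise max(0, x), a common non-linear activation (assumed here)."""
    return [[max(0, x) for x in row] for row in feature]


def make_template(conv_output, top, left, height, width):
    """Non-linearly transform the convolution output first, then cut out the
    target region that is to be stored as template feature data."""
    activated = relu(conv_output)
    return [row[left:left + width] for row in activated[top:top + height]]


conv_output = [[-1, 2, -3],
               [4, -5, 6],
               [7, 8, -9]]
print(make_template(conv_output, 0, 0, 2, 2))  # [[0, 2], [4, 0]]
```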
As per claim 4, Li does not explicitly disclose that the storage control unit is configured to convert the template feature data into the same format as the filter coefficients and store the converted template feature data in the coefficient storage unit, as is claimed. Ko nevertheless teaches converting (e.g. quantizing) neural network weights into a particular format (see e.g. column 10, lines 19-39). Accordingly, it would have been obvious to one of ordinary skill in the art, having the teachings of Li, Ko and Cui before the effective filing date of the claimed invention, to further modify the storage control unit taught by Li, Ko and Cui so as to similarly convert the template feature data into a particular format (i.e. the same format as other weights/filter coefficients) as further taught by Ko. It would have been advantageous to one of ordinary skill to utilize such a combination because it can require less storage to store the weights of the neural network, as is taught by Ko (see e.g. column 10, lines 19-39). As described above, it would have been obvious to configure the storage control unit taught by Li and Ko so as to store, in the coefficient storage unit, such template feature data as taught by Cui. Accordingly, Li, Ko and Cui are further considered to teach, to one of ordinary skill in the art, an information processing apparatus like that of claim 4.
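For illustration of what converting the template feature data into the same format as the filter coefficients could involve, the Python sketch below assumes that the filter coefficients are held as signed 8-bit integer codes with a per-tensor scale, and quantizes real-valued template data the same way. The 8-bit format, the shared scale, and the function names are assumptions for the example and are not asserted to be the format described in Ko.

```python
def quantize_int8(values, scale):
    """Map real values to signed 8-bit codes: round(v / scale), clipped to
    [-128, 127]. Python's round() rounds exact halves to the nearest even integer."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """Recover approximate real values from the stored codes."""
    return [c * scale for c in codes]


scale = 0.25                        # assumed scale shared with the filter coefficients
template = [0.5, -0.25, 2.0, 100.0]
codes = quantize_int8(template, scale)
print(codes)                        # [2, -1, 8, 127]
```

Each code occupies one byte rather than the four bytes of a 32-bit float, which illustrates the reduced storage discussed above; the last value shows that out-of-range data is clipped.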
As per claim 5, it would have been obvious, as is described above, to configure the storage control unit taught by Li and Ko so as to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is taught by Cui. Cui does not explicitly disclose that the coefficient storage unit is a single memory apparatus comprising a memory region configured to store the filter coefficients and a memory region configured to store the template feature data, as is required by claim 5. Ko nevertheless suggests that the coefficient storage unit can be part of a single unified memory apparatus comprising a predefined memory region for each of the weights (see e.g. column 11, lines 1-16; and column 16, line 66 – column 17, line 19). It would have been obvious to one of ordinary skill in the art, having the teachings of Li, Ko and Cui before the effective filing date of the claimed invention, to further modify the information processing apparatus taught by Li, Ko and Cui such that the coefficient storage unit can be part of a single unified memory apparatus comprising a predefined memory region for each of the weights, as is taught by Ko. It thus follows that the template feature data would be assigned to and stored in a predefined memory region, while the other filter coefficients would similarly be assigned to and stored in another predefined memory region. It would have been advantageous to one of ordinary skill to utilize such a combination because it would enable the neural network to be efficiently executed, as is suggested by Ko (see e.g. column 11, lines 1-16; and column 16, line 66 – column 17, line 19). Accordingly, the above-described combination of Li, Ko and Cui is further considered to teach an information processing apparatus like that of claim 5.
As per claim 6, it would have been obvious, as is described above, to configure the storage control unit taught by Li and Ko so as to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is taught by Cui. Cui does not explicitly disclose that the coefficient storage unit comprises a memory apparatus configured to store the filter coefficients and a memory apparatus configured to store the template feature data, as is required by claim 6. Nevertheless, Ko suggests that a coefficient storage unit can be part of a single unified memory comprising a predefined memory region for each of the neural network weights (see e.g. column 11, lines 1-16; and column 16, line 66 – column 17, line 19). Ko further discloses that, in some embodiments, the unified memory can be comprised of a plurality of memory apparatuses (e.g. banks of SRAMs), wherein the neural network weights are divided across multiple apparatuses (see e.g. column 2, lines 40-48; column 11, lines 1-16; column 19, lines 38-61; and FIG. 11). It would have been obvious to one of ordinary skill in the art, having the teachings of Li, Ko and Cui before the effective filing date of the claimed invention, to further modify the information processing apparatus taught by Li, Ko and Cui such that the coefficient storage unit can be comprised of a plurality of memory apparatuses, wherein the neural network weights are divided across multiple apparatuses, as is taught by Ko. It thus follows that the template feature data can be assigned to and stored in a predefined memory region (i.e. of a first memory apparatus), while other filter coefficients would be assigned to and stored in another predefined memory region (i.e. of another memory apparatus).
It would have been advantageous to one of ordinary skill to utilize such a combination because it would enable the neural network to be efficiently executed by multiple processing cores, as is suggested by Ko (see e.g. column 2, lines 40-48; column 11, lines 1-16; column 19, lines 38-61; and FIG. 11). Accordingly, the above-described combination of Li, Ko and Cui is further considered to teach an information processing apparatus like that of claim 6.
As per claim 7, Li further teaches that the feature data is a feature map (i.e. a feature of a frame under detection), and that the template feature data (i.e. the classification and/or regression weights) comprises feature amounts in a region of a target object (i.e. as indicated in a reference frame) to be a target of tracking in the feature map (see e.g. paragraphs 0006, 0044-0047, 0049, 0065 and 0073). As described above, it would have been obvious to configure the storage control unit taught by Li and Ko so as to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is taught by Cui. The storage control unit would thus store in the coefficient storage unit feature amounts in a region of a target object to be a target of tracking in the feature map as the template feature data. Accordingly, the above-described combination of Li, Ko and Cui is further considered to teach an information processing apparatus like that of claim 7.
As per claim 8, Li further suggests that the convolution operation unit comprises: (i) a first convolution operation unit configured to perform a convolution operation using filter coefficients (i.e. to perform a convolution operation on a frame under detection, which is necessarily performed using filter coefficients, to extract a feature of the frame under detection) (see e.g. paragraphs 0044-0046, 0049, 0051, 0053 and 0065); (ii) a second convolution operation unit configured to perform a convolution operation between a result of a convolution operation by the first convolution operation unit and the template feature data (i.e. the classification and/or regression weights) (see e.g. paragraphs 0070-0074 and 0085); and (iii) a third convolution operation unit configured to perform a convolution operation between a result of the convolution operation by the second convolution operation unit and the filter coefficients (see e.g. paragraphs 0090-0091). As described above, it would have been obvious to configure the storage control unit taught by Li and Ko so as to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is taught by Cui. 
Accordingly, the above-described combination of Li, Ko and Cui is further considered to teach that the convolution operation unit comprises: (i) a first convolution operation unit configured to perform a convolution operation using filter coefficients stored in the coefficient storage unit; (ii) a second convolution operation unit configured to perform a convolution operation between a result of a convolution operation by the first convolution operation unit and the template feature data stored in the coefficient storage unit; and (iii) a third convolution operation unit configured to perform a convolution operation between a result of the convolution operation by the second convolution operation unit and the filter coefficients stored in the coefficient storage unit. Li does not explicitly disclose that the second convolution operation unit is configured to perform a convolution operation between a result of a nonlinear transformation on a result of the convolution operation by the first convolution operation unit and the template feature data as claimed. Ko nevertheless generally teaches applying a nonlinear transformation on a result of a convolution (i.e. dot product) operation (see e.g. column 1, lines 40-55; column 7, lines 46-59; column 8, lines 36-59; and column 29, lines 16-41). It would have been obvious to one of ordinary skill in the art, having the teachings of Li, Ko and Cui before the effective filing date of the claimed invention, to further modify the information processing apparatus taught by Li, Ko and Cui so as to apply a nonlinear transformation as taught by Ko on the results of the convolution operations produced by the convolution operation unit. It thus follows that the second convolution operation unit would perform a convolution operation between a result of a nonlinear transformation on a result of the convolution operation by the first convolution operation unit and the template feature data stored in the coefficient storage unit.
It would have been advantageous to one of ordinary skill to utilize such a nonlinear transformation because such transformations are typical in convolutional layers of neural networks, as is taught by Ko (see e.g. column 1, lines 40-55). Accordingly, Li, Ko and Cui are considered to teach, to one of ordinary skill in the art, an information processing apparatus like that of claim 8.
As per claim 9, Li further teaches that the information processing apparatus further comprises a detection unit configured to detect an object based, in part, on a result of the convolution operation by the third convolution operation unit (see e.g. paragraphs 0053-0054, 0060-0062, 0085 and 0090-0091). As described above, it would have been obvious to further modify the information processing apparatus taught by Li, Ko and Cui so as to apply a nonlinear transformation like taught by Ko on the results of the convolution operations produced thereby. Accordingly, it follows that the detection unit would detect the object based on a result of a nonlinear transformation performed on a result of the convolution operation by the third convolution operation unit. Li, Ko and Cui are thus further considered to teach an information processing apparatus like that of claim 9.
As per claim 10, it would have been obvious, as is described above, to configure the storage control unit taught by Li and Ko so as to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is taught by Cui. It thus follows that the coefficient storage unit would hold the filter coefficients that are used by the first convolution operation unit taught by Li and the filter coefficients that are used by the third convolution operation unit taught by Li. Accordingly, the above-described combination of Li, Ko and Cui is further considered to teach an information processing apparatus like that of claim 10.
As per claim 11, it would have been obvious, as is described above, to modify the information processing apparatus taught by Li so as to include the integrated circuit taught by Ko to execute Li’s neural network, wherein the integrated circuit comprises, inter alia, a coefficient storage unit configured to store the filter coefficients of the neural network. Ko particularly teaches that the integrated circuit comprises a unit configured to designate a particular memory region for storing the filter coefficients (i.e. neural network weights) in the coefficient storage unit (see e.g. column 3, lines 20-29; column 16, line 66 – column 17, line 19; column 30, line 12 – column 31, line 3). As further described above, it would have been obvious to configure the storage control unit taught by Li and Ko so as to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is taught by Cui. It thus follows that the unit described by Ko for designating a particular memory region for storing the filter coefficients in the coefficient storage unit would particularly be configured to designate a memory region for storing the particular template feature data in the coefficient storage unit. Accordingly, the above-described combination of Li, Ko and Cui is further considered to teach an information processing apparatus like that of claim 11.
As per claim 12, Li further teaches periodically updating the template feature data (i.e. by taking the bounding box image of the target object in the frame under detection as a next reference frame) (see e.g. paragraphs 0063-0064). As described above, it would have been obvious to configure the storage control unit taught by Li and Ko so as to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is taught by Cui. It thus follows that the storage control unit would particularly be configured to periodically update the template feature data stored in the coefficient storage unit, i.e. to determine whether or not to update the template feature data and, in a case where it determines to update the template feature data, to transfer new template feature data to the coefficient storage unit. Accordingly, the above-described combination of Li, Ko and Cui is further considered to teach an information processing apparatus like that of claim 12.
Regarding claims 13 and 14, Li describes a method for detecting a target object, as is noted above, and which includes inputting each of at least two feature groups output from at least two network layers of a neural network into a detector, and whereby the detector outputs a classification result and a regression result for each group (see e.g. paragraphs 0005-0008). Li particularly teaches that such a method comprises
setting a part of previously obtained feature data as template feature data (see e.g. paragraphs 0043-0047 and 0069-0070: Li teaches extracting a feature of a “reference frame” and a feature of a “frame under detection,” wherein the reference frame indicates a target object and the frame under detection, which can occur subsequent to the reference frame in a video sequence, indicates a current frame which is to be subject to target object detection. The extracted feature of the reference frame and the extracted feature of the frame under detection are input as a feature group into a detector, which outputs a corresponding classification result and a regression result, wherein the classification result indicates a probability that a candidate box is a bounding box for the target object and the regression result indicates a position offset of the candidate box – see e.g. paragraphs 0053-0058 and 0071-0072. Li particularly discloses that such detection entails obtaining a classification weight for the detector and a regression weight for the detector based on the extracted feature of the reference frame, wherein the extracted feature of the frame under detection is then processed with the classification and regression weights to obtain the classification and regression results – see e.g. paragraphs 0073-0074, 0078 and 0081. The extracted feature of the reference frame, or the obtained classification and/or regression weight based thereon, is considered “template feature data” like claimed. Accordingly, Li teaches setting a part of previously obtained feature data, e.g. the extracted feature of the reference frame or the obtained classification and/or regression weight, as template feature data.);
aligning an order of the template feature data with an order of data referenced by the filter coefficients in the convolution computation processing (see e.g. paragraphs 0085-0087: Li discloses that the classification weight and regression weight serve as filter coefficients in a convolution operation; the classification and regression weights are each convolved with the extracted feature data of the frame under detection to obtain the classification result and regression result, respectively. Since the classification and regression weights serve as filter coefficients in the convolution operation, the order of the classification and regression weights would necessarily be aligned with an order of data referenced by filter coefficients in a convolution computation processing, in this case, with the extracted feature data of the frame under detection. Accordingly, Li further teaches aligning an order of such template feature data with an order of data referenced by filter coefficients in a convolution computation processing.); and
computing new feature data by a convolution operation between feature data and filter coefficients, and computing, by a convolution operation between feature data and the template feature data, correlation data between the feature data and the template feature data (see e.g. paragraphs 0043-0047 and 0069-0070: Like noted above, Li teaches extracting a feature of a “reference frame” and a feature of a “frame under detection,” wherein the reference frame indicates a target object and the frame under detection, which can occur subsequent to the reference frame in a video sequence, indicates a current frame which is to be subject to target object detection. Li particularly teaches that such feature extraction can be performed via a convolution operation by a convolutional layer in a neural network – see e.g. paragraphs 0049-0051 and 0065. Such a convolutional layer necessarily comprises filter coefficients, and thus Li is considered to teach computing new feature data, e.g. extracting a feature of a frame under detection, by a convolution operation between feature data and filter coefficients. Moreover, like further noted above, Li teaches that the extracted feature of the reference frame and the extracted feature of the frame under detection are input as a feature group into a detector, which outputs a corresponding classification result and a regression result, wherein the classification result indicates a probability that a candidate box is a bounding box for the target object and the regression result indicates a position offset of the candidate box – see e.g. paragraphs 0053-0058 and 0071-0072. 
As further noted above, Li discloses that such detection entails obtaining a classification weight for the detector and a regression weight for the detector based on the extracted feature of the reference frame, and then processing the extracted feature of the frame under detection with these classification and regression weights to obtain the classification and regression results, respectively – see e.g. paragraphs 0073-0074, 0078 and 0081. In particular, a convolution operation is performed with the classification weight on the extracted feature of the frame under detection so as to obtain the classification result, and a convolution operation is performed with the regression weight on the extracted feature of the frame under detection so as to obtain the regression result – see e.g. paragraphs 0085-0087. As noted above, the extracted feature of the reference frame, or the obtained classification and/or regression weight based thereon, is considered “template feature data” as claimed. The classification result and/or regression result can be considered “correlation data” as claimed. Consequently, Li is further considered to teach computing, by a convolution operation between feature data and the template feature data, i.e. between the extracted feature of the frame under detection and the classification and/or regression weights, correlation data between the feature data and the template feature data, i.e. the classification and/or regression results.).
Li thus teaches an information processing method similar to that of claim 13. Li discloses that such teachings can be implemented via a computer software program stored on a non-transitory computer-readable storage medium (see e.g. paragraph 0196). A non-transitory computer-readable storage medium comprising a software program to implement the above-described teachings of Li is considered a non-transitory computer-readable storage medium similar to that of claim 14. However, Li does not particularly disclose that the feature data is stored in a “feature storage unit,” and that the filter coefficients and template feature data are stored in a “coefficient storage unit,” as is required by claims 13 and 14. Li further does not teach storing, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is further required by claims 13 and 14.
Ko nevertheless describes an integrated circuit for executing a machine-trained network (e.g. a convolutional neural network), wherein the integrated circuit comprises: (i) a coefficient storage unit (i.e. a weight memory and/or filter slice buffer) configured to store filter coefficients (i.e. weight values) of the machine-trained network; (ii) a feature storage unit (i.e. an activation memory and/or activation window buffer) configured to store feature data (i.e. activation values); (iii) a storage control unit (e.g. a memory control) configured to store data in the coefficient storage unit; and (iv) an operation unit (e.g. a microprocessor, core controller and/or adder trees) configured to compute new feature data by an operation (e.g. a dot product) between feature data stored in the feature storage unit and filter coefficients stored in the coefficient storage unit (see e.g. column 1, lines 21-55; column 16, line 66 – column 17, line 19; column 30, lines 12-58; column 31, lines 4-44; column 32, lines 1-16; and FIG. 17).
It would have been obvious to one of ordinary skill in the art, having the teachings of Li and Ko before the effective filing date of the claimed invention, to modify the information processing method and non-transitory computer-readable storage medium taught by Li so as to execute the neural network on an integrated circuit like taught by Ko, wherein coefficients of the neural network are stored in a coefficient storage unit and feature data is stored in a feature storage unit. It would have been advantageous to one of ordinary skill to utilize such an integrated circuit because it can provide accelerated execution of the neural network, as is suggested by Ko (see e.g. column 1, lines 6-39). Li and Ko thus teach an information processing method similar to that of claim 13 and a non-transitory computer-readable storage medium like that of claim 14, but do not explicitly teach storing, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is required by claims 13 and 14.
Cui generally describes “a Siamese network tracking scheme based on PYNQ framework, which is deployed on ZYNQ platform.” (Abstract). Regarding the claimed invention, Cui particularly demonstrates that the platform comprises a coefficient storage unit (i.e. input buffers) configured to store filter coefficients (e.g. weights and biases) of a neural network, wherein template feature data is stored in a first memory region (e.g. in a template image buffer or template output buffer) of the coefficient storage unit, and the first memory region is different from a second memory region (e.g. a weight buffer) that the filter coefficients are stored in the coefficient storage unit (see e.g. sections IV.A SNA_IP Core and IV.B RPNA_IP core, and Figures 2 and 3).
It would have been obvious to one of ordinary skill in the art, having the teachings of Li, Ko and Cui before the effective filing date of the claimed invention, to modify the information processing method and non-transitory computer-readable medium taught by Li and Ko so as to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is taught by Cui. It would have been advantageous to one of ordinary skill to utilize such a combination, because it would enable the template data to be readily accessed by processing units, as is evident from Cui (see e.g. sections IV.A SNA_IP Core and IV.B RPNA_IP core, and Figures 2 and 3). Accordingly, Li, Ko and Cui are considered to teach, to one of ordinary skill in the art, an information processing method like that of claim 13 and a non-transitory computer-readable medium like that of claim 14.
As per claim 15, Li teaches selecting and applying filter coefficients (i.e. weights of a convolutional layer) in a case where a convolution operation unit computes the new feature data (e.g. extracts a feature of a “frame under detection” via a convolution operation), and selecting and applying template feature data (e.g. an extracted feature of a reference frame, or a classification and/or regression weight based thereon) in a case where the convolution operation unit computes the correlation data (i.e. a classification result and/or regression result) (see e.g. paragraphs 0043-0044, 0049-0051, 0065 and 0085-0087). As described above, it would have been obvious to configure the storage control unit taught by Li and Ko so as to store, in a first memory region different from a second memory region that the filter coefficients are stored in the coefficient storage unit, the template feature data, as is taught by Cui. It thus follows that the selection unit selects the second memory region (i.e. the filter coefficients) in a case where the convolution operation unit computes the new feature data, and selects the first memory region (i.e. the template feature data) in a case where the convolution operation unit computes the correlation data. Accordingly, the above-described combination of Li, Ko and Cui is further considered to teach an information processing apparatus like that of claim 15.
Response to Arguments
The Examiner acknowledges the Applicant’s amendments to claims 1, 11, 13 and 14, and addition of new claim 15.
In response to the Applicant’s amendments to the title of the invention, the objection presented in the previous Office Action to the title is respectfully withdrawn.
Regarding the notification presented in the previous Office Action that claims 1-12 invoke 35 U.S.C. § 112(f), the Applicant argues that claim 1 has since been amended to recite sufficient structure, material or acts for performing the claimed function.
The Examiner, however, respectfully disagrees. The amendments recite additional functions (e.g., in claim 1, “set a part of previously obtained feature data as template feature data”) but have not added any structure (e.g. a processor) to perform the functions. Accordingly, claims 1-12 (and now 15) are still interpreted under 35 U.S.C. § 112(f), as is indicated above.
The Applicant’s arguments concerning the 35 U.S.C. § 103 rejections presented in the previous Office Action have been considered, but are moot in view of the new grounds of rejection presented above, which are required in response to the Applicant’s amendments.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BLAINE T BASOM whose telephone number is (571)272-4044. The examiner can normally be reached Monday-Friday, 9:00 am - 5:30 pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matt Ell can be reached at (571)270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BTB/
4/27/2026
/MATTHEW ELL/Supervisory Patent Examiner, Art Unit 2141