DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
CLAIM INTERPRETATION
The following is a quotation of 35 U.S.C. 112(f): (FP 7.30.03)
(f) ELEMENT IN CLAIM FOR A COMBINATION-An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.
The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always, linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. (FP 7.30.05)
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: a weight storage unit, a continual learning unit, a filter processing unit, and a comparison unit in claim 1, and the three “module” limitations recited in claim 5.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. (FP 7.30.06)
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-5 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter (Step 1). If the claim does fall within one of the statutory categories, the second step in the analysis is to determine whether the claim is directed to a judicial exception (Step 2A). The Step 2A analysis is broken into two prongs. In the first prong (Step 2A, Prong 1), it is determined whether or not the claims recite a judicial exception (e.g., mathematical concepts, mental processes, certain methods of organizing human activity). If it is determined in Step 2A, Prong 1 that the claims recite a judicial exception, the analysis proceeds to the second prong (Step 2A, Prong 2), where it is determined whether or not the claims integrate the judicial exception into a practical application. If it is determined at Step 2A, Prong 2 that the claims do not integrate the judicial exception into a practical application, the analysis proceeds to determining whether the claim is a patent-eligible application of the exception (Step 2B). If an abstract idea is present in the claim, any element or combination of elements in the claim must be sufficient to ensure that the claim integrates the judicial exception into a practical application, or else amounts to significantly more than the abstract idea itself. Applicant is advised to consult the 2019 PEG for more details of the analysis.
Step 1
According to the first part of the analysis, in the instant case, claims 1-3 are directed to a device, claim 4 is directed to a method, and claim 5 is directed to a medium, each relating to an ML model. Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A, Prong 1
Following the determination of whether or not the claims fall within one of the four categories (Step 1), it must be determined if the claims recite a judicial exception (e.g., mathematical concepts, mental processes, certain methods of organizing human activity) (Step 2A, Prong 1). In this case, the claims are determined to recite a judicial exception as explained below.
Regarding claims 1, 4, and 5, these claims recite the following limitations.
Claim 1 recites:
a weight storage unit that stores weights of a plurality of filters used to detect a feature of a task; a continual learning unit that trains the weights of the plurality of filters in response to an input task in continual learning; a filter processing unit that, of a plurality of filters that have learned one task, locks the weights of a predetermined proportion of the filters to prevent the predetermined proportion of the filters from being used to learn a further task and initializes the weights of other filters to use the other filters to learn a further task; and a comparison unit that compares the weights of a plurality of filters that have learned two or more tasks and extracts overlap filters having a similarity in weight equal to or greater than a predetermined threshold value as shared filters shared by tasks.
Claims 4 and 5 recite:
training weights of a plurality of filters used to detect a feature of a task in response to an input task in continual learning; of a plurality of filters that have learned one task, locking the weights of a predetermined proportion of the filters to prevent the predetermined proportion of the filters from being used to learn a further task and initializing the weights of other filters to use the other filters to learn a further task; and comparing the weights of a plurality of filters that have learned two or more tasks, leaving one of overlap filters having a similarity in weight equal to or higher than a predetermined threshold value, and initializing the weights of other filters to use the other filters to learn a further task.
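For purposes of illustration only, the following is a minimal sketch, in Python/NumPy, of the recited functions as understood for examination: storing filter weights, training them for an input task in continual learning, locking a predetermined proportion of the trained filters while reinitializing the rest, and comparing filter weights to extract overlap filters whose similarity meets or exceeds a threshold. The class and parameter names, the magnitude-based choice of which filters to lock, and the use of cosine similarity are the Examiner's illustrative assumptions and are not drawn from Applicant's disclosure or the cited art.

import numpy as np

class ContinualFilterLearner:
    """Illustrative sketch of the recited functions; not Applicant's implementation."""

    def __init__(self, num_filters, filter_shape, lock_fraction=0.5, similarity_threshold=0.9):
        # "weight storage unit": stores weights of a plurality of filters
        self.weights = np.random.randn(num_filters, *filter_shape)
        self.locked = np.zeros(num_filters, dtype=bool)   # filters frozen for earlier tasks
        self.lock_fraction = lock_fraction                 # predetermined proportion
        self.similarity_threshold = similarity_threshold   # predetermined threshold value

    def train_task(self, gradient_fn, steps=100, lr=0.01):
        # "continual learning unit": trains the unlocked filter weights for the input task
        for _ in range(steps):
            grads = gradient_fn(self.weights)              # placeholder for task-specific gradients
            grads[self.locked] = 0.0                       # locked filters are not updated
            self.weights -= lr * grads

    def lock_and_reinitialize(self):
        # "filter processing unit": lock a predetermined proportion of the trained filters,
        # reinitialize the remaining filters so they can be used to learn a further task
        free = np.flatnonzero(~self.locked)
        importance = np.abs(self.weights[free]).reshape(len(free), -1).sum(axis=1)
        n_lock = int(self.lock_fraction * len(free))
        to_lock = free[np.argsort(importance)[len(free) - n_lock:]]   # keep the strongest filters
        self.locked[to_lock] = True
        to_reset = free[~np.isin(free, to_lock)]
        self.weights[to_reset] = np.random.randn(len(to_reset), *self.weights.shape[1:])

    def extract_shared_filters(self):
        # "comparison unit": compare filter weights and extract overlap filters whose
        # similarity is equal to or greater than the predetermined threshold value
        flat = self.weights.reshape(len(self.weights), -1)
        norm = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
        sim = norm @ norm.T                                 # cosine similarity between filters
        return [(i, j) for i in range(len(sim)) for j in range(i + 1, len(sim))
                if sim[i, j] >= self.similarity_threshold]

# illustrative use with a dummy gradient function standing in for a real task
learner = ContinualFilterLearner(num_filters=16, filter_shape=(3, 3))
learner.train_task(lambda w: 0.1 * w)
learner.lock_and_reinitialize()
print(learner.extract_shared_filters())

As the sketch indicates, each recited function reduces to storing, updating, masking, and comparing numerical weight values.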
The claims recite a mental process. As set forth in MPEP 2106.04(a)(2)(III)(C), “Claims can recite a mental process even if they are claimed as being performed on a computer.” These steps are recited at a high level of generality such that they could be performed mentally, and they are also disclosed as being performable by a human user simply using a computer as a tool (see spec., paras. [0022]-[0029], Fig. 2). Thus, the claims recite an abstract idea.
Step 2A, Prong 2
Following the determination that the claims recite a judicial exception, it must be determined if the claims recite additional elements that integrate the exception into a practical application of the exception (Step 2A, Prong 2). In this case, after considering all claim elements individually and as an ordered combination, it is determined that the claims do not include additional elements that integrate the exception into a practical application of the exception as explained below.
In Prong Two, a claim is evaluated as a whole to determine whether the recited judicial exception is integrated into a practical application of that exception. A claim is not “directed to” a judicial exception, and thus is patent eligible, if the claim as a whole integrates the recited judicial exception into a practical application of that exception. A claim that integrates a judicial exception into a practical application will apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that the claim is more than a drafting effort designed to monopolize the judicial exception. MPEP 2106.04(d). Here, the claims recite an abstract idea and, as a whole, do not integrate the recited judicial exception into a practical application of the exception.
Regarding claims 1, 4, and 5, the additional elements recite using one or more neural networks as a tool to perform an abstract idea, which is not indicative of integration into a practical application (MPEP 2106.05(f)). These limitations are understood to be generic computer equipment and mere instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)).
Step 2B
Based on the determination in Step 2A of the analysis that the claims are directed to a judicial exception, it must be determined if the claims contain any element or combination of elements sufficient to ensure that the claim amounts to significantly more than the judicial exception (Step 2B). In this case, after considering all claim elements individually and as an ordered combination, it is determined that the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception for the same reasons given above in the Step 2A, Prong 2 analysis. Furthermore, each additional element identified above as being insignificant extra-solution activity is also well-known, routine, and conventional as described below.
Claims 1, 4 and 5: The claims do not include additional elements, alone or in combination, that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than generic computing components and a field of use/technological environment, which do not amount to significantly more than the abstract idea. The underlying concept merely receives information, analyzes it, and stores the results of the analysis; this concept is not meaningfully different from concepts found by the courts to be abstract (see Electric Power Group, collecting information, analyzing it, and displaying certain results of the collection and analysis; see CyberSource, obtaining and comparing intangible data; see Digitech, organizing information through mathematical correlations; see Grams, diagnosing an abnormal condition by performing clinical tests and thinking about the results; see Cyberfone, using categories to organize, store, and transmit information; see SmartGene, comparing new and stored information and using rules to identify options).
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements, when considered both individually and as a combination, do not amount to significantly more than the abstract idea. For example, claim 1 recites the additional elements of “stores…,” “trains…,” “locks… and initializes…,” and “compares…,” and claims 4 and 5 recite “training…,” “locking… and initializing…,” and “comparing….” These elements are recited at a high level of generality and are well-understood, routine, and conventional activities in the computer art. Generic computers performing generic computer functions, without an inventive concept, do not amount to significantly more than the abstract idea. Looking at the elements as a combination does not add anything more than the elements analyzed individually. Therefore, these claims do not amount to significantly more than the abstract idea itself.
Step 2A/2B Prong 2 Dependent Claims
Regarding claim 2
Claim 2 merely recites additional elements that further define the filters used, which perform generic functions; looking at the elements as a combination does not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself and is not patent eligible.
Regarding claim 3
Claim 3 merely recites additional elements directed to training the initialized weights of filters other than the shared filters, which perform generic functions; looking at the elements as a combination does not add anything more than the elements analyzed individually. Therefore, this claim also does not amount to significantly more than the abstract idea itself and is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-5 are rejected under 35 U.S.C. 103 as being unpatentable over Mallya et al. (Mallya), “PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning,” Computer Vision and Pattern Recognition (cs.CV), 13 May 2018, arXiv:1711.05769, https://doi.org/10.48550/arXiv.1711.05769, in view of Takayuki et al. (Takayuki), JP2018055259.
In regard to claim 1, Mallya discloses A machine learning device comprising: (abstract, a single deep NN)
a weight storage unit that stores weights of a plurality of filters used to detect a feature of a task; (page 2-4, Fig. 1, storage to store the weights of filters in learning for an input task)
a continual learning unit that trains the weights of the plurality of filters in response to an input task in continual learning; (2. Related work and 3 Approach, 4. Experiments and Results, page 2-4, Fig. 1, training the weights of filters in learning for an input task)
a filter processing unit that, of a plurality of filters that have learned one task, locks the weights of a predetermined proportion of the filters to prevent the predetermined proportion of the filters from being used to learn a further task and initializes the weights of other filters to use the other filters to learn a further task; (2. Related work, 3. Approach, 4. Experiments and Results, pages 2-4, Fig. 1: the weights of a plurality of filters are learned for task I; the weights of a prescribed proportion of the filters are fixed, i.e., retained for task I, kept fixed for the remainder of the method, and not eligible for pruning; and the weights of the remaining filters are initialized so that those filters can be used to learn task II after learning task I. It would be obvious to a person skilled in the art that the method is performed by a device provided with a memory unit)
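By way of illustration only, the following is a simplified sketch of the iterative prune-and-freeze scheme as characterized above for Mallya; it is not Mallya's released code, and the pruning ratio, the magnitude-based selection, and the reinitialization scale are assumptions made solely for illustration.

import numpy as np

def pack_task(weights, frozen_mask, keep_ratio=0.5):
    """After training a task, freeze the largest-magnitude free weights for that task
    and reinitialize the remaining free weights so they are available for the next task."""
    free = ~frozen_mask
    magnitudes = np.abs(weights[free])
    # keep the top `keep_ratio` fraction of the free weights for the current task
    cutoff = np.quantile(magnitudes, 1.0 - keep_ratio)
    keep = free & (np.abs(weights) >= cutoff)
    frozen_mask = frozen_mask | keep                        # these weights stay fixed from now on
    reinit = free & ~keep
    weights = weights.copy()
    weights[reinit] = 0.01 * np.random.randn(reinit.sum())  # reinitialized for the next task
    return weights, frozen_mask

# illustrative use: after training task I, half the free weights are frozen for task I
w = np.random.randn(64, 3, 3)
mask = np.zeros_like(w, dtype=bool)
w, mask = pack_task(w, mask, keep_ratio=0.5)   # remaining weights are free for task II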
But Mallya fails to explicitly disclose “and a comparison unit that compares the weights of a plurality of filters that have learned two or more tasks and extracts overlap filters having a similarity in weight equal to or greater than a predetermined threshold value as shared filters shared by tasks.”
Takayuki discloses a comparison unit that compares the weights of a plurality of filters that have learned two or more tasks and extracts overlap filters having a similarity in weight equal to or greater than a predetermined threshold value as shared filters shared by tasks. (“FIG. 9C is a flowchart showing an outline of processing executed in each functional block of the NN learning device 50 of the present embodiment. In the flowchart of FIG. 9C, in the NN setting step S210, the structure of the neural network and the number of parameters that the NN setting unit 501 learns are determined. Here, it is not necessary to determine the number of partial kernels and the number of filters to be shared as in the first embodiment. The set neural network structure and the number of parameters are transmitted to the parameter initial value setting unit 503.
Next, in the parameter initial value setting step S220, the parameter initial value setting unit 503 determines initial values of the parameters of the neural network set in the NN setting step S210. A random value may be sufficient and a user may determine. The set parameter initial value is transmitted to the parameter optimization unit 504.
Next, in the parameter optimization step S230, the parameter optimization unit 504 optimizes the network parameters using the learning data and GT (correct answer value) held in the learning data holding unit 507. As in the first embodiment, Back Propagation may be used as the learning algorithm. Here, the kernel is not shared. The optimized network structure and parameters are transmitted to the kernel selector 502.
Next, in the kernel selection step S240, the kernel selection unit 502 shares a kernel or a filter in the network optimized in the parameter optimization step S230. There are the following two common methods. In the first method, as shown in FIG. 11, a filter or partial kernel having a high similarity is paired. FIG. 11 shows an example in which the filters of the partial kernel 2001 and the partial kernel 2009 are paired, and an example in which the partial kernel 2006 and the partial kernel 2012, the partial kernels 2015 and 2016, and the partial kernels 2022 and 2023 are paired. The similarity may be obtained by using a correlation of weight coefficient matrices, a correlation of output values when the Convolution kernel is convolved, a correlation of output values after nonlinear processing such as relu performed thereafter, and the like.
In the second method, as shown in FIG. 12, the parameter-optimized network is clustered in the filter coefficient space. For example, when two-dimensional filters are shared as one unit, clustering may be performed in a (filter size) × (filter size)-dimensional space (for example, 3 × 3 = 9 dimensions). Further, when sharing a partial kernel as one unit, clustering may be performed in a (filter size) × (filter size) × (number of channels)-dimensional space (for example, 3 × 3 × 64 = 576 dimensions).
Next, in the parameter setting step S250, the parameter setting unit 508 sets the parameters of the kernel paired in the kernel selection step S240 or belonging to the same cluster. For example, processing may be performed such as averaging the weight W values of the common kernels, selecting one representative kernel, and replacing all kernels with the W values of the kernels. The set parameters of each kernel are transmitted to the additional learning unit 509.
Next, in the additional learning step S260, the additional learning processing unit 509 performs additional learning using the learning data, using the parameters set in the parameter setting step as initial values of the network, and optimizes the parameters again. The additional learning may be performed in the same manner as the parameter optimization step S230 described above, but it is better to lower the learning rate of the network than during the previous parameter optimization step S230. The additionally learned network structure and parameters are transmitted to the NN parameter holding unit 506 and held.” Thus, during learning of tasks using a NN, the degree of similarity among filters is determined from a correlation among weighting factor matrices, and a plurality of filters with a high degree of similarity are set as a shared filter for learning.)
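For illustration only, the following sketch shows one way the similarity comparison described in the passage quoted above could be computed, using the correlation of weight coefficient matrices; the threshold value and helper names are assumptions for illustration and are not taken from Takayuki's source code.

import numpy as np

def correlation(f1, f2):
    # correlation of two weight coefficient matrices, one of the similarity
    # measures mentioned in the cited passage
    return float(np.corrcoef(f1.ravel(), f2.ravel())[0, 1])

def pair_similar_filters(filters, threshold=0.9):
    """Pair filters whose weight correlation is at or above the threshold,
    i.e., candidates to be shared (represented by a single kernel)."""
    pairs = []
    for i in range(len(filters)):
        for j in range(i + 1, len(filters)):
            if correlation(filters[i], filters[j]) >= threshold:
                pairs.append((i, j))
    return pairs

# illustrative use (random filters rarely exceed the threshold, so the list may be empty)
filters = [np.random.randn(3, 3) for _ in range(8)]
print(pair_similar_filters(filters, threshold=0.9))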
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Takayuki's machine learning method into Mallya's invention, as they are related to the same field of endeavor of model training and learning. The motivation to combine these references, as proposed above, is at least that Takayuki's identification of shared filters based on weight similarity would provide more shared filters in Mallya's system. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention that providing more shared filters when training the ML model would help improve prediction accuracy and reduce memory usage.
In regard to claim 2, Mallya and Takayuki disclose The machine learning device according to claim 1,
But Mallya fails to explicitly disclose “wherein the comparison unit leaves one of the overlap filters as the shared filter and initializes the weights of filters other than the shared filter.”
Takayuki discloses wherein the comparison unit leaves one of the overlap filters as the shared filter and initializes the weights of filters other than the shared filter. (
“ Next, in the kernel selection step S120, the kernel selection unit 502 selects a partial kernel to be shared according to the number M of partial kernels set in the NN setting step S110. For example, in FIG. 10, the partial kernel 2001, the partial kernel 2010, and the partial kernel 2015 are shared. Common means that the weighting factors of the kernels used for convolution are the same value, and memory saving is achieved by using exactly the same partial kernels. A weighting coefficient learning method (optimization method) in the case of common use will be described later. Decide which partial kernel to assign to each of the M partial kernels. That is, all partial kernels are represented by M partial kernels. Partial kernels may be selected across layers (eg, selection of partial kernels 2001 and 2010) or within a layer (eg, selection of partial kernels 2015 and 2016). Then, the selected kernel information is transmitted to the parameter initial value setting unit 503.
Next, in the parameter initial value setting step S130, the parameter initial value setting unit 503 determines the initial values of the parameters of the network set in the NN setting step S110 and the kernel selection step S120. At this time, the parameters of the partial kernels shared in the kernel selection step S120 are set to have the same initial value. The initial value setting method may be a random value, or may be determined by a user by a predetermined method. The set parameter structure and its initial value are transmitted to the parameter optimization unit 504.
Next, in the parameter optimization step S140, the parameter optimization unit 504 optimizes the parameters in the network using the learning data and GT (correct answer value) held in the learning data holding unit 507. Back propagation may be used to optimize the parameters. In the present embodiment, the weights of the kernels selected in the kernel selection step S120 need to take the same value. For this reason, the update value ΔW for each weight W is calculated by normal Back Propagation, and then updated so that each weight W of the shared kernel becomes the same value. For example, it may be updated using the sum or average of ΔW with respect to the weight W in the common kernel, the maximum value, the median value, and the like. The parameters and network structures optimized after updating the parameters for the number of learning times (number of epochs) set in advance are transmitted to and held in the NN parameter holding unit 506.” Thus, Takayuki selects one of the common filters as the shared filter and initializes the weights of the filters other than the shared filter, since all weights of filters other than the shared filter are initialized to the weight value of the shared filter or can be determined by a user.)
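For illustration only, the following sketch reflects the parameter setting and shared-kernel update described in the passage quoted above: the kernels selected as common are set to a single value (here, their average; a representative kernel could be used instead), and each shared kernel then receives the same update (here, the mean of the per-kernel updates ΔW). The function names and the choice of the mean are assumptions for illustration, not Takayuki's code.

import numpy as np

def merge_shared_kernels(kernels, shared_indices):
    """Set every kernel in `shared_indices` to a single shared value,
    here the average of those kernels (a representative kernel could be used instead)."""
    shared_value = np.mean([kernels[i] for i in shared_indices], axis=0)
    for i in shared_indices:
        kernels[i] = shared_value.copy()
    return kernels

def shared_update(kernels, grads, shared_indices, lr=0.01):
    """Update so that shared kernels keep identical weights: apply the mean of
    their individual gradients to each of them."""
    mean_grad = np.mean([grads[i] for i in shared_indices], axis=0)
    for i in shared_indices:
        kernels[i] -= lr * mean_grad
    return kernels

# illustrative use: kernels 0 and 2 are treated as a shared kernel and stay identical
kernels = [np.random.randn(3, 3) for _ in range(4)]
grads = [np.random.randn(3, 3) for _ in range(4)]
kernels = merge_shared_kernels(kernels, shared_indices=[0, 2])
kernels = shared_update(kernels, grads, shared_indices=[0, 2])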
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Takayuki's machine learning method into Mallya's invention, as they are related to the same field of endeavor of model training and learning. The motivation to combine these references, as proposed above, is at least that Takayuki's identification of shared filters based on weight similarity would provide more shared filters in Mallya's system. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention that providing more shared filters when training the ML model would help improve prediction accuracy and reduce memory usage.
In regard to claim 3, Mallya and Takayuki disclose The machine learning device according to claim 2,
Mallya discloses wherein the continual learning unit trains initialized weights of filters other than the shared filter in response to a further task in continual learning. (2. Related work, 3. Approach, 4. Experiments and Results, pages 2-4, Fig. 1: training the initialized weights of filters other than the fixed filters for learning task II after learning task I, or for learning task III after learning tasks I and II; see Fig. 1)
In regard to claim 4, Mallya discloses A machine learning method comprising:
Mallya discloses training weights of a plurality of filters used to detect a feature of a task in response to an input task in continual learning; of a plurality of filters that have learned one task, (2. Related work, 3. Approach, 4. Experiments and Results, pages 2-4, Fig. 1: training the weights of filters in learning for an input task, where the filters have learned task I; see Fig. 1)
locking the weights of a predetermined proportion of the filters to prevent the predetermined proportion of the filters from being used to learn a further task and initializing the weights of other filters to use the other filters to learn a further task; (2. Related work, 3. Approach, 4. Experiments and Results, pages 2-4, Fig. 1: the weights of a plurality of filters are learned for task I; the weights of a prescribed proportion of the filters are fixed, i.e., retained for task I, kept fixed for the remainder of the method, and not eligible for pruning; and the weights of the remaining filters are initialized so that those filters can be used to learn task II after learning task I. It would be obvious to a person skilled in the art that the method is performed by a device provided with a memory unit)
and initializing the weights of other filters to use the other filters to learn a further task. (2. Related work, 3. Approach, 4. Experiments and Results, pages 2-4, Fig. 1: initializing the weights of filters other than the fixed filters for learning task II after learning task I, or for learning task III after learning tasks I and II; see Fig. 1)
But Mallya fails to explicitly disclose “and comparing the weights of a plurality of filters that have learned two or more tasks, leaving one of overlap filters having a similarity in weight equal to or higher than a predetermined threshold value,”
Takayuki discloses comparing the weights of a plurality of filters that have learned two or more tasks, (“FIG. 9C is a flowchart showing an outline of processing executed in each functional block of the NN learning device 50 of the present embodiment. In the flowchart of FIG. 9C, in the NN setting step S210, the structure of the neural network and the number of parameters that the NN setting unit 501 learns are determined. Here, it is not necessary to determine the number of partial kernels and the number of filters to be shared as in the first embodiment. The set neural network structure and the number of parameters are transmitted to the parameter initial value setting unit 503.
Next, in the parameter initial value setting step S220, the parameter initial value setting unit 503 determines initial values of the parameters of the neural network set in the NN setting step S210. A random value may be sufficient and a user may determine. The set parameter initial value is transmitted to the parameter optimization unit 504.
Next, in the parameter optimization step S230, the parameter optimization unit 504 optimizes the network parameters using the learning data and GT (correct answer value) held in the learning data holding unit 507. As in the first embodiment, Back Propagation may be used as the learning algorithm. Here, the kernel is not shared. The optimized network structure and parameters are transmitted to the kernel selector 502.
Next, in the kernel selection step S240, the kernel selection unit 502 shares a kernel or a filter in the network optimized in the parameter optimization step S230. There are the following two common methods. In the first method, as shown in FIG. 11, a filter or partial kernel having a high similarity is paired. FIG. 11 shows an example in which the filters of the partial kernel 2001 and the partial kernel 2009 are paired, and an example in which the partial kernel 2006 and the partial kernel 2012, the partial kernels 2015 and 2016, and the partial kernels 2022 and 2023 are paired. The similarity may be obtained by using a correlation of weight coefficient matrices, a correlation of output values when the Convolution kernel is convolved, a correlation of output values after nonlinear processing such as relu performed thereafter, and the like.” Thus, the degree of similarity among filters is determined from a correlation among weighting factor matrices, identifying a plurality of filters with a high degree of similarity during training.)
leaving one of overlap filters having a similarity in weight equal to or higher than a predetermined threshold value, (“ Next, in the kernel selection step S120, the kernel selection unit 502 selects a partial kernel to be shared according to the number M of partial kernels set in the NN setting step S110. For example, in FIG. 10, the partial kernel 2001, the partial kernel 2010, and the partial kernel 2015 are shared. Common means that the weighting factors of the kernels used for convolution are the same value, and memory saving is achieved by using exactly the same partial kernels. A weighting coefficient learning method (optimization method) in the case of common use will be described later. Decide which partial kernel to assign to each of the M partial kernels. That is, all partial kernels are represented by M partial kernels. Partial kernels may be selected across layers (eg, selection of partial kernels 2001 and 2010) or within a layer (eg, selection of partial kernels 2015 and 2016). Then, the selected kernel information is transmitted to the parameter initial value setting unit 503.
Next, in the parameter initial value setting step S130, the parameter initial value setting unit 503 determines the initial values of the parameters of the network set in the NN setting step S110 and the kernel selection step S120. At this time, the parameters of the partial kernels shared in the kernel selection step S120 are set to have the same initial value. The initial value setting method may be a random value, or may be determined by a user by a predetermined method. The set parameter structure and its initial value are transmitted to the parameter optimization unit 504.
Next, in the parameter optimization step S140, the parameter optimization unit 504 optimizes the parameters in the network using the learning data and GT (correct answer value) held in the learning data holding unit 507. Back propagation may be used to optimize the parameters. In the present embodiment, the weights of the kernels selected in the kernel selection step S120 need to take the same value. For this reason, the update value ΔW for each weight W is calculated by normal Back Propagation, and then updated so that each weight W of the shared kernel becomes the same value. For example, it may be updated using the sum or average of ΔW with respect to the weight W in the common kernel, the maximum value, the median value, and the like. The parameters and network structures optimized after updating the parameters for the number of learning times (number of epochs) set in advance are transmitted to and held in the NN parameter holding unit 506.” and “FIG. 9C is a flowchart showing an outline of processing executed in each functional block of the NN learning device 50 of the present embodiment. In the flowchart of FIG. 9C, in the NN setting step S210, the structure of the neural network and the number of parameters that the NN setting unit 501 learns are determined. Here, it is not necessary to determine the number of partial kernels and the number of filters to be shared as in the first embodiment. The set neural network structure and the number of parameters are transmitted to the parameter initial value setting unit 503.
Next, in the parameter initial value setting step S220, the parameter initial value setting unit 503 determines initial values of the parameters of the neural network set in the NN setting step S210. A random value may be sufficient and a user may determine. The set parameter initial value is transmitted to the parameter optimization unit 504.
Next, in the parameter optimization step S230, the parameter optimization unit 504 optimizes the network parameters using the learning data and GT (correct answer value) held in the learning data holding unit 507. As in the first embodiment, Back Propagation may be used as the learning algorithm. Here, the kernel is not shared. The optimized network structure and parameters are transmitted to the kernel selector 502.
Next, in the kernel selection step S240, the kernel selection unit 502 shares a kernel or a filter in the network optimized in the parameter optimization step S230. There are the following two common methods. In the first method, as shown in FIG. 11, a filter or partial kernel having a high similarity is paired. FIG. 11 shows an example in which the filters of the partial kernel 2001 and the partial kernel 2009 are paired, and an example in which the partial kernel 2006 and the partial kernel 2012, the partial kernels 2015 and 2016, and the partial kernels 2022 and 2023 are paired. The similarity may be obtained by using a correlation of weight coefficient matrices, a correlation of output values when the Convolution kernel is convolved, a correlation of output values after nonlinear processing such as relu performed thereafter, and the like.” Thus, Takayuki identifies one of the common filters as a shared filter, where the degree of similarity among filters is determined from a correlation among weighting factor matrices, filters with a high degree of similarity are set as the shared filter for learning, and the weights of filters other than the shared filter are initialized, since all weights of filters other than the shared filter are initialized to the weight value of the shared filter or can be determined by a user.)
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to incorporate Takayuki's machine learning method into Mallya's invention, as they are related to the same field of endeavor of model training and learning. The motivation to combine these references, as proposed above, is at least that Takayuki's identification of shared filters based on weight similarity would provide more shared filters in Mallya's system. Therefore, it would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention that providing more shared filters when training the ML model would help improve prediction accuracy and reduce memory usage.
In regard to claim 5, claim 5 is a medium claim corresponding to the method claim 4 above and, therefore, is rejected for the same reasons set forth in the rejections of claim 4.
Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure.
U.S. Patent Documents:
US 12061988 B1, issued 2024-08-13, Sather et al., “Decomposition Of Ternary Weight Tensors”: Some embodiments provide a method for training parameters of a network. The method receives a network with layers of nodes. Each node of a set of the layers computes an output value based on a set of input values and a set of trained weight values. A first layer of the network includes a first number of filters. The method replaces the first layer with a second layer having a second number of filters that is less than the first number and a third layer, following the second layer, having the first number of filters. Each weight value in the filters of the second and third layers is restricted to a set of allowed quantized weight values. A total number of weight values in the filters of the second and third layers is less than a total number of weight values in the filters of the first layer…. See abstract.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XUYANG XIA whose telephone number is (571)270-3045. The examiner can normally be reached Monday-Friday 8am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Welch can be reached at 571-272-7212. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/XUYANG XIA/Primary Examiner, Art Unit 2143