Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner’s Note
The Examiner encourages Applicant to schedule an interview to discuss issues related to, for example, the rejections noted below under 35 U.S.C. §§ 101 and 103.
To facilitate clear and definite claim interpretation by the Examiner, Applicant is strongly requested to identify, in the Remarks, the supporting paragraph(s) for each limitation of any amended or new claim(s).
For clarification, claim 13 may be amended (e.g., to recite “non-transitory one or more computer readable storage media”) based on paragraph 17 of the Specification: “A computer readable storage medium or media, as used herein, is not to be construed as being transitory signals per se”.
Priority
Acknowledgment is made of the applicant’s claim to the filing date of the present application, filed on 02/07/2022.
Response to Arguments
Applicant's arguments filed on 03/03/2026 have been fully considered but they are not persuasive.
In Remarks, pp. 7-13, Applicant contends:
The Office Action, on page 6, alleges that claim 1's limitations related to "pruning", "linearizing", and "optimizing" fall within the mental process or mathematical concept groupings of abstract ideas. However, Applicant respectfully disagrees for the reasons set forth below.
…
These features define a specific technological process for controlling a manufacturing environment using an improved neural network model that result in sub-optimal production levels. Accordingly, independent claim 1 is not directed to a judicial exception under Step 2A, Prong One.
…
Consistent with this conclusion, and as stated in MPEP § 2106.04(d)(1), claims are not abstract when they are "directed to an improvement in computer capabilities." When evaluated as a whole, claim 1 is directed to a specific improvement in a technical field, namely the training of neural networks, including deep neural networks, rather than to a generalized or abstract idea.
…
Further, the claimed method produces a tangible real-world effect. The final step requires "changing, by the computing device, operation inputs in the manufacturing environment to match the predicted inputs," which implies that the results of the neural network processing are used to modify operational parameters of a physical manufacturing system.
Thus, the claim is not directed to generating information for display or analysis alone. Instead, the claimed method uses the modified neural network to actively control and adjust a technological process. The integration of sensor data processing, structural model modification, optimization, and control of operation inputs constitutes a specific application in the field of industrial automation.
Examiner’s response:
The examiner understands the applicant’s assertion.
However, it appears that each processing step merely applies the abstract idea to a general field of endeavor using the additional elements. In addition, the asserted improvements to technology or a technical field are not necessarily reflected in the claims. Thus, the claim does not integrate the judicial exception into a practical application, and the claim does not amount to significantly more than the judicial exception.
The examiner understands the applicant’s assertions “The Office Action, on page 6, alleges that claim 1's limitations related to "pruning", "linearizing", and "optimizing" fall within the mental process or mathematical concept groupings of abstract ideas. However, Applicant respectfully disagrees for the reasons set forth below” and “These features define a specific technological process for controlling a manufacturing environment using an improved neural network model that result in sub-optimal production levels. Accordingly, independent claim 1 is not directed to a judicial exception under Step 2A, Prong One.”
However, note that each limitation describes its functionality at a very high level, without details. In other words, each limitation states what is done, but not how it is done. For example, the pruning step has been amended, but, under its broadest reasonable interpretation, it still covers performance of the limitation in the mind. That is, nothing in the claim element precludes the step from practically being performed in the mind. For example, the limitation in the context of this claim still encompasses the user mentally thinking with a physical aid (e.g., pencil and paper). Providing details of how each step is performed may help overcome the current rejections.
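For illustration only, and not as a characterization of Applicant’s disclosed method, details of the sort contemplated above might resemble the following sketch of redundancy-based pruning, in which the near-equality test, the tolerance, and the weight-merging rule are all hypothetical:

import numpy as np

def prune_duplicate_neurons(W_in, W_out, tol=1e-6):
    # W_in:  (n_neurons, n_inputs)  incoming weights of a layer
    # W_out: (n_outputs, n_neurons) outgoing weights of the same layer
    # Compare the information in each neuron (its incoming weight vector)
    # to the neurons already kept; a neuron whose incoming weights are
    # near-identical to a kept neuron's carries the same information and
    # is removed, with its outgoing weights folded into the duplicate.
    keep, merged = [], W_out.copy()
    for i in range(W_in.shape[0]):
        dup = next((j for j in keep
                    if np.allclose(W_in[i], W_in[j], atol=tol)), None)
        if dup is None:
            keep.append(i)
        else:
            merged[:, dup] += merged[:, i]
    return W_in[keep], merged[:, keep]

Reciting concrete steps of this character (what is compared, under what criterion, and how connection weights are handled upon removal) is the kind of “how” detail that the current claim language does not provide.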
In addition, it is not clear how the features define a specific technological process for controlling a manufacturing environment using a neural network model “that result in sub-optimal production levels.” The last limitation recites changing operation inputs, but it is not clear how the predicted inputs are used to control the manufacturing environment, using the neural network model, in a way that addresses the sub-optimal production levels. Note that “operation inputs” and “manufacturing environment” may be interpreted broadly under the broadest reasonable interpretation (BRI) as well.
The examiner also understands the applicant’s assertions “directed to an improvement in computer capabilities” and “Further, the claimed method produces a tangible real-world effect. The final step requires "changing, by the computing device, operation inputs in the manufacturing environment to match the predicted inputs," which implies that the results of the neural network processing are used to modify operational parameters of a physical manufacturing system” and “The integration of sensor data processing, structural model modification, optimization, and control of operation inputs constitutes a specific application in the field of industrial automation”.
However, the last limitation is recited at a very high level and does not provide details of how the prediction results are used to change the operation inputs. As the Applicant pointed out, paragraph 15 states “Aspects of the invention provide an improvement in the field of manufacturing environments by providing a technical solution to the problem of dynamic manufacturing environments with changing states that result in sub-optimal production levels.” However, the claim does not recite “states” and does not say how the states are used to control the production levels in the manufacturing environment. In other words, it is not clear how the prediction results are used to control the production levels in the manufacturing environment by modifying operational parameters of a physical manufacturing system in the field of industrial automation.
The limitations still do not clearly show, e.g., improvements in computer technology or improvements to other technical fields. Rather, the improvements asserted in the Remarks merely improve the abstract ideas of the independent claims themselves. The independent claims do not clearly show how the inventive concept of the claims enables the asserted improvements or how the two are tied together. The Applicant may need to amend the claims to show how the claim language and the asserted improvements are tied together.
To establish a valid improvement to a technology, MPEP 2106.04(d)(1) requires that the specification explain the improvement and that the claim reflect the disclosed improvement. Furthermore, the improvement must not be merely a consequence of the abstract idea; an improvement in the abstract idea itself is not an improvement to technology. See MPEP 2106.05(a).
For at least these reasons, Applicant's arguments are not convincing.
The Examiner encourages Applicant to schedule an interview to discuss issues related to, for example, the rejections noted below under 35 U.S.C. § 101.
Applicant’s arguments regarding 35 U.S.C. § 103 with respect to the independent claims have been considered but are moot because the arguments are directed to amended limitation(s) that has/have not been previously examined.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-8, 11-18, and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claim 1
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:
The limitations of
“…, comprising:
…;
…;
…;
pruning, …, the deep learning network, wherein pruning the deep learning network comprises removing redundant neurons and redundant connection weights by comparing information in a neuron to another neuron and removing neurons having same information;
predicting, …, an output of the pruned deep learning network from the inputs of the manufacturing environment;
…;
…; and
changing, …, operation inputs in the manufacturing environment to match the calculated predicted inputs”, as drafted, under their broadest reasonable interpretation, cover performance of the limitations in the mind. That is, nothing in the claim elements precludes the steps from practically being performed in the mind. For example, the limitations in the context of this claim encompass the user mentally thinking with a physical aid (e.g., pencil and paper). Note that “operation inputs” and “manufacturing environment” may be interpreted broadly under the broadest reasonable interpretation (BRI).
The limitation(s) of
“linearizing, …, the pruned deep learning network;
optimizing, …, an output of the linearized pruned deep learning network to calculate predicted inputs for the manufacturing environment;”, as drafted, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations. That is, nothing in the claim element precludes the step from practically being performed based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations, but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
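For illustration of the mathematical character identified above (a generic sketch only, not Applicant’s disclosed formulation; the linear surrogate y = Ax + b, the target output y*, and the input bounds are all assumed), optimizing an output of a linearized network to calculate predicted inputs can be cast as a bound-constrained least-squares problem over the inputs:

\[
\hat{x} \;=\; \arg\min_{x^{L} \le x \le x^{U}} \; \bigl\lVert y^{*} - (A x + b) \bigr\rVert_2^{2},
\]

where \(\hat{x}\) plays the role of the “calculated predicted inputs.”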
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
The claim recites additional elements that are mere instructions to implement an abstract idea on a computer, or that merely use a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). In particular, the claim recites additional elements (“by a computing device”, “by the computing device”, “by the computing device and using the pruned deep learning network”) – using a device and a model to process data. The device and the model in each step are recited at a high level of generality (i.e., as a generic computer performing a generic computer function of processing data) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In particular, the claim recites an additional element (“receiving, by a computing device, data from sensors in a manufacturing environment”) – the act of receiving data, which adds an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of receiving data is recited at a high level of generality (i.e., as a generic act of receiving data) such that it amounts to no more than a mere act to apply the exception. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In particular, the claim recites an additional element (“mapping, by the computing device, the data into a deep learning network”) – the act of providing (i.e., inputting) data, which adds an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of inputting data is recited at a high level of generality (i.e., as a generic act of inputting data) such that it amounts to no more than a mere act to apply the exception. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In particular, the claim recites an additional element (“learning, by the computing device, correlations between inputs and outputs of the manufacturing environment using the data”). The additional element is recited at such a high level, without any details as to how a model is trained, that it amounts to only the idea of a solution or outcome: it fails to recite details of how a solution to a problem is accomplished and, therefore, represents no more than mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
As discussed above, with respect to integration of the abstract idea into a practical application, the additional elements of using a generic computer component to perform each step amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible. MPEP 2106.05(f).
As discussed above, the claim recites the additional element(s) of receiving data at a high-level of generality and is adding an insignificant extra-solution activity – see MPEP 2106.05(g). However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Receiving or transmitting data over a network” or “Storing and retrieving information in memory”. Accordingly, this additional element does not provide an inventive concept and significantly more than the abstract idea. Thus, the claim is not patent eligible.
As discussed above, the claim recites the additional element(s) of inputting data at a high-level of generality and is adding an insignificant extra-solution activity – see MPEP 2106.05(g) – “Mere Data Gathering”. However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Receiving or transmitting data over a network” or “Storing and retrieving information in memory”. Accordingly, this additional element does not provide an inventive concept and significantly more than the abstract idea. Thus, the claim is not patent eligible.
The additional element regarding training is likewise recited at such a high level, without any details as to how a model is trained, that it amounts to only the idea of a solution or outcome: it fails to recite details of how a solution to a problem is accomplished and, therefore, represents no more than mere instructions to apply the judicial exception on a computer (see MPEP 2106.05(f)). Accordingly, this additional element does not amount to significantly more than the abstract idea. Thus, the claim is not patent eligible.
Regarding claim 2
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1: The claim recites the abstract idea identified above regarding claim 1.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites an additional element (“wherein the sensors are based on supervisory control and data acquisition (SCADA) architecture”). This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not amount to significantly more than the abstract idea. See MPEP 2106.05(h).
Regarding claim 3
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1: The claim recites the abstract idea identified above regarding claim 1.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites an additional element (“wherein the sensors are based on data acquisition (DAQ) architecture”). This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not amount to significantly more than the abstract idea. See MPEP 2106.05(h).
Regarding claim 4
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1: The claim recites the abstract idea identified above regarding claim 1.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites an additional element (“wherein the deep learning network is a recurrent neural network (RNN) network”). This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not amount to significantly more than the abstract idea. See MPEP 2106.05(h).
Regarding claim 5
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1: The claim recites the abstract idea identified above regarding claim 1.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites an additional element (“wherein the RNN network is a long-short term memory (LSTM) network”). This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not amount to significantly more than the abstract idea. See MPEP 2106.05(h).
Regarding claim 6
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:
The limitations of
“further comprising linearizing the deep learning network by replacing a rectified linear unit (ReLU) activation function with a set of equivalent linear equations to the deep learning network in response to the deep learning network being a RNN network”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations. That is, nothing in the claim element precludes the step from practically being performed based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations, but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
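For context, the mathematical character of this limitation can be illustrated by one conventional way of replacing a ReLU with an equivalent set of linear constraints, the big-M mixed-integer encoding sketched below (a generic illustration only, not necessarily Applicant’s disclosed technique; the bounds L and U are assumed):

\[
y=\max(0,x),\;\; x\in[L,U],\;\; L<0<U
\quad\Longleftrightarrow\quad
y\ge x,\;\; y\ge 0,\;\; y\le x - L(1-z),\;\; y\le Uz,\;\; z\in\{0,1\}.
\]

When z = 1 these constraints force y = x, and when z = 0 they force y = 0, reproducing the ReLU exactly over the bounded domain through purely linear relationships.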
Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim does not recite additional elements. Thus, the claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, the claim is not patent eligible.
Regarding claim 7
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:
The limitations of
“further comprising:
linearizing the deep learning network by replacing a tanh activation function with a piecewise linear function (PLU) activation function; and
reformulating the PLU activation function into a set of equivalent linear equations to the deep learning network in response to the deep learning network being a LSTM network”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations. That is, nothing in the claim element precludes the step from practically being performed based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations, but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
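For context, a common piecewise linear surrogate of this kind (a generic illustration only, not necessarily Applicant’s disclosed PLU) is the hard-tanh,

\[
\mathrm{PLU}(x)=\max(-1,\min(1,x))=
\begin{cases}
-1, & x<-1,\\
x, & -1\le x\le 1,\\
1, & x>1,
\end{cases}
\]

each segment of which is linear, so the function can be reformulated into a set of equivalent linear equations gated by binary variables, in the same manner as a ReLU encoding.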
Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim does not recite additional elements. Thus, the claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, the claim is not patent eligible.
Regarding claim 8
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1:
The limitations of
“further comprising linearizing the deep learning network by replacing a bilinear term in the deep learning network by a McCormick envelope”, as drafted, are a process that, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations. That is, nothing in the claim element precludes the step from practically being performed based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation based on mathematical relationships and/or mathematical formulas or equations and/or mathematical calculations, but for the recitation of generic computer components, then it falls within the “Mathematical concepts” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
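For context, the McCormick envelope named in the claim is a standard relaxation that replaces a bilinear term w = xy, with x ∈ [x^L, x^U] and y ∈ [y^L, y^U], by four linear inequalities (a generic illustration of the technique, not necessarily Applicant’s disclosed formulation):

\[
\begin{aligned}
w &\ge x^{L}y + x\,y^{L} - x^{L}y^{L}, &\qquad w &\ge x^{U}y + x\,y^{U} - x^{U}y^{U},\\
w &\le x^{U}y + x\,y^{L} - x^{U}y^{L}, &\qquad w &\le x^{L}y + x\,y^{U} - x^{L}y^{U}.
\end{aligned}
\]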
Step 2A Prong 2: This judicial exception is not integrated into a practical application. In particular, the claim does not recite additional elements. Thus, the claim is directed to an abstract idea.
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. Thus, the claim is not patent eligible.
Regarding claim 11
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1: The claim recites the abstract idea identified above regarding claim 1.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites an additional element (“wherein the manufacturing environment is a dynamic manufacturing environment”). This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not amount to significantly more than the abstract idea. See MPEP 2106.05(h).
Regarding claim 12
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a method; therefore, it falls into the statutory category of processes.
Step 2A Prong 1: The claim recites the abstract idea identified above regarding claim 1.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites an additional element (“wherein the computing device includes software provided as a service in a cloud environment”). This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not integrate the abstract idea into a practical application. See MPEP 2106.05(h).
Step 2B: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
This is a recitation of a particular type or source of model/data to be used in performing the abstract idea. Limiting the abstract idea to a particular type or source of model/data is an attempt to limit the abstract idea to a particular field of use or technological environment, which does not amount to significantly more than the abstract idea. See MPEP 2106.05(h).
Regarding claim 13
The claim recites “A computer program product comprising one or more computer readable storage media having program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to:” to perform precisely the method of Claim 1. As performance of an abstract idea on generic computer components (see MPEP 2106.05(f)) and “Storing and retrieving information in memory” (see MPEP 2106.05(g) on Insignificant Extra-Solution Activity, and MPEP 2106.05(d) on Well-Understood, Routine, Conventional Activity) cannot integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself, the claim is rejected for reasons set forth in the rejection of Claim 1.
Regarding claim 14
The claim is rejected for the reasons set forth in the rejection of Claim 4 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.
Regarding claim 15
The claim is rejected for the reasons set forth in the rejection of Claim 5 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.
Regarding claim 16
The claim is rejected for the reasons set forth in the rejection of Claim 2 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.
Regarding claim 17
The claim recites “A system comprising: a processor, a computer readable memory, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to:” to perform precisely the method of Claim 1. As performance of an abstract idea on generic computer components (see MPEP 2106.05(f)) and “Storing and retrieving information in memory” (see MPEP 2106.05(g) on Insignificant Extra-Solution Activity, and MPEP 2106.05(d) on Well-Understood, Routine, Conventional Activity) cannot integrate the abstract idea into a practical application nor provide significantly more than the abstract idea itself, the claim is rejected for reasons set forth in the rejection of Claim 1.
Regarding claim 18
The claim is rejected for the reasons set forth in the rejection of Claim 5 under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without integrating the judicial exception into a practical application or providing significantly more than the judicial exception.
Regarding claim 20
The claim is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Step 1: The claim recites a system; therefore, it falls into the statutory category of a machine.
Step 2A Prong 1: The claim recites the abstract idea identified above regarding claim 17.
Step 2A Prong 2: This judicial exception is not integrated into a practical application.
In particular, the claim recites an additional element (“inputting the predicted inputs into components of the dynamic manufacturing environment”) – the act of providing (i.e., inputting) data, which adds an insignificant extra-solution activity to the judicial exception – see MPEP 2106.05(g). The act of inputting data is recited at a high level of generality (i.e., as a generic act of inputting data) such that it amounts to no more than a mere act to apply the exception. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
As discussed above, the claim recites the additional element(s) of inputting data at a high-level of generality and is adding an insignificant extra-solution activity – see MPEP 2106.05(g) – “Mere Data Gathering”. However, the addition of insignificant extra-solution activity does not amount to an inventive concept, particularly when the activity is well-understood, routine, and conventional. See MPEP 2106.05(d)(II) – “Receiving or transmitting data over a network” or “Storing and retrieving information in memory”. Accordingly, this additional element does not provide an inventive concept and significantly more than the abstract idea. Thus, the claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 4-5, 11, and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (Online cement clinker quality monitoring: A soft sensor model based on multivariate time series analysis and CNN) in view of Cox et al. (Parameter Compression of Recurrent Neural Networks and Degradation of Short-term Memory), further in view of Han et al. (Learning both Weights and Connections for Efficient Neural Networks), and further in view of Ma et al. (US20210201155A1).
Regarding claim 1
Zhao teaches
A method, comprising:
receiving, by a computing device, data from sensors in a manufacturing environment;
(Zhao [fig(s) 1-2] [sec(s) 3] “In the production process automation control system, a large number of sensors are used to measure process variables in the production, such as pressure, temperature, quality, voltage and current, etc. And the sampling interval of these on-line sensors is generally 5 s. Specially, the raw material quality parameters are sampled and tested manually every two hours. And the raw quality test sample is hybrid multiplex sample, which is obtained by taking samples from the production line at equal intervals within 2 h and then mixed. When the latest three-rate values are obtained by manual test, the quality parameters of the raw material being processed at that time and the follow-on raw material to be processed are considered to be the same as the latest measured values. In this way, the information of the raw material quality parameters will be as timely and accurate as possible in the f-CaO content monitoring. Besides, f-CaO content is manually sampled and tested with the interval of 1 h.” [sec(s) Abs] “Compared with traditional CNN, support vector machines (SVM) and long-short term memory networks (LSTM), the results demonstrate that the MVTS–CNN model has higher accuracy, better generalization ability and superior robustness.” [sec(s) 5.2] “The short training time of 20,000 iterations with i7-8700 processor indicates that the structure of the MVTS–CNN model is simple and the computational complexity is moderate.”;)
mapping, by the computing device, the data into a deep learning network;
(Zhao [fig(s) 1-2] [table(s) 8-9] [sec(s) 3] “In the production process automation control system, a large number of sensors are used to measure process variables in the production, such as pressure, temperature, quality, voltage and current, etc. And the sampling interval of these on-line sensors is generally 5 s. Specially, the raw material quality parameters are sampled and tested manually every two hours. And the raw quality test sample is hybrid multiplex sample, which is obtained by taking samples from the production line at equal intervals within 2 h and then mixed. When the latest three-rate values are obtained by manual test, the quality parameters of the raw material being processed at that time and the follow-on raw material to be processed are considered to be the same as the latest measured values. In this way, the information of the raw material quality parameters will be as timely and accurate as possible in the f-CaO content monitoring. Besides, f-CaO content is manually sampled and tested with the interval of 1 h.” [sec(s) Abs] “Compared with traditional CNN, support vector machines (SVM) and long-short term memory networks (LSTM), the results demonstrate that the MVTS–CNN model has higher accuracy, better generalization ability and superior robustness.” [sec(s) 5.3] “Therefore, the predicted results of the MVTS–CNN model are compared with that of SVM and LSTM to evaluate its performance. Since the multivariate time series analysis method can be considered as data processing, and in order to avoid the influence of different input data on the model comparison, MVTS-SVM and MVTS-LSTM are designed and involved as competing models.”;)
learning, by the computing device, correlations between inputs and outputs of the manufacturing environment using the data;
(Zhao [fig(s) 1-2] [table(s) 8-9] [sec(s) 3] “In the production process automation control system, a large number of sensors are used to measure process variables in the production, such as pressure, temperature, quality, voltage and current, etc.” [sec(s) Abs] “Compared with traditional CNN, support vector machines (SVM) and long-short term memory networks (LSTM), the results demonstrate that the MVTS–CNN model has higher accuracy, better generalization ability and superior robustness.” [sec(s) 5.2] “As an important parameter in supervised learning and deep learning, learning rate determines whether and when the objective function can converge to the global minimum. … As is shown in Fig. 8a and Table 7, the trained model has good fitting effect on the training sets, after the 20,000 times of training iterations. And the local enlarged figure of training results (Fig. 8b) shows that the MVTS–CNN model can extract appropriate features for different f-CaO content in the model training process. The short training time of 20,000 iterations with i7-8700 processor indicates that the structure of the MVTS–CNN model is simple and the computational complexity is moderate.” [sec(s) 5.3] “Therefore, the predicted results of the MVTS–CNN model are compared with that of SVM and LSTM to evaluate its performance. Since the multivariate time series analysis method can be considered as data processing, and in order to avoid the influence of different input data on the model comparison, MVTS-SVM and MVTS-LSTM are designed and involved as competing models.”;
Examiner notes that paragraph 97 of the Instant Specification describes “the prediction module 430 learns the correlations by inputting the data from the sensors 460 into the RNN network shown in Ex. (2) and receiving an output from the RNN network. In embodiments, over time, the prediction module 430 learns which inputs correlate to which outputs.”)
(Note: Hereinafter, if a limitation has bold brackets (i.e. [·]) around claim languages, the bracketed claim languages indicate that they have not been taught yet by the current prior art reference but they will be taught by another prior art reference afterwards.)
predicting, by the computing device and using the [pruned] deep learning network, an output of the [pruned] deep learning network from the inputs of the manufacturing environment;
(Zhao [fig(s) 1-2] [table(s) 8-9] [sec(s) 3] “In the production process automation control system, a large number of sensors are used to measure process variables in the production, such as pressure, temperature, quality, voltage and current, etc.” [sec(s) Abs] “Compared with traditional CNN, support vector machines (SVM) and long-short term memory networks (LSTM), the results demonstrate that the MVTS–CNN model has higher accuracy, better generalization ability and superior robustness.” [sec(s) 5.2] “As an important parameter in supervised learning and deep learning, learning rate determines whether and when the objective function can converge to the global minimum. … As is shown in Fig. 8a and Table 7, the trained model has good fitting effect on the training sets, after the 20,000 times of training iterations. And the local enlarged figure of training results (Fig. 8b) shows that the MVTS–CNN model can extract appropriate features for different f-CaO content in the model training process. The short training time of 20,000 iterations with i7-8700 processor indicates that the structure of the MVTS–CNN model is simple and the computational complexity is moderate.” [sec(s) 5.3] “Therefore, the predicted results of the MVTS–CNN model are compared with that of SVM and LSTM to evaluate its performance. Since the multivariate time series analysis method can be considered as data processing, and in order to avoid the influence of different input data on the model comparison, MVTS-SVM and MVTS-LSTM are designed and involved as competing models.”;)
optimizing, by the computing device, an output of the [linearized pruned] deep learning network to calculate predicted inputs for the manufacturing environment; and
(Zhao [fig(s) 3] “Determining action time distribution period” [table(s) 8-9] [sec(s) 1] “This paper proposed a soft sensor model based on multivariate time series analysis and convolutional neural network (MVTS– CNN) for the online f-CaO content monitoring.” [sec(s) 4] “Taking the time series within the active duration distribution range as input data can increase the model’s applicability in different production conditions. In order to determine the active duration distribution range of each variable, a multivariate time series analysis method founded on the time delay range and the longest active duration is proposed.” [sec(s) 4.1] “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method. And in the third part, the processed time series X˙(k) = [X˙1(k), X˙2(k), . . . , X˙11(k), X˙12(k)] with different time length will be compressed into a uniform series length by different mean filters and formed into a new time series matrix XR as the input of CNN.” See also [sec(s) 4.2] [sec(s) 5.2] “As an important parameter in supervised learning and deep learning, learning rate determines whether and when the objective function can converge to the global minimum. … As is shown in Fig. 8a and Table 7, the trained model has good fitting effect on the training sets, after the 20,000 times of training iterations. And the local enlarged figure of training results (Fig. 8b) shows that the MVTS–CNN model can extract appropriate features for different f-CaO content in the model training process.”; e.g., “training” read(s) on “optimizing”. In addition, e.g., “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method” read(s) on “calculate predicted inputs”.)
changing, by the computing device, operation inputs in the manufacturing environment to match the calculated predicted inputs.
(Zhao [fig(s) 3] “Determining action time distribution period” [table(s) 8-9] [sec(s) 1] “This paper proposed a soft sensor model based on multivariate time series analysis and convolutional neural network (MVTS– CNN) for the online f-CaO content monitoring.” [sec(s) 4] “Taking the time series within the active duration distribution range as input data can increase the model’s applicability in different production conditions. In order to determine the active duration distribution range of each variable, a multivariate time series analysis method founded on the time delay range and the longest active duration is proposed.” [sec(s) 4.1] “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method. And in the third part, the processed time series X˙(k) = [X˙1(k), X˙2(k), . . . , X˙11(k), X˙12(k)] with different time length will be compressed into a uniform series length by different mean filters and formed into a new time series matrix XR as the input of CNN.” See also [sec(s) 4.2] [sec(s) 5.2] “As an important parameter in supervised learning and deep learning, learning rate determines whether and when the objective function can converge to the global minimum. … As is shown in Fig. 8a and Table 7, the trained model has good fitting effect on the training sets, after the 20,000 times of training iterations. And the local enlarged figure of training results (Fig. 8b) shows that the MVTS–CNN model can extract appropriate features for different f-CaO content in the model training process.”; e.g., “determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method” along with training for the cement clinker production read(s) on “changing … operation inputs”.)
However, Zhao does not appear to explicitly teach:
pruning, by the computing device, the deep learning network, wherein pruning the deep learning network comprises removing redundant neurons and redundant connection weights by comparing information in a neuron to another neuron and removing neurons having same information;
predicting, by the computing device and using the [pruned] deep learning network, an output of the [pruned] deep learning network from the inputs of the manufacturing environment;
linearizing, by the computing device, the pruned deep learning network;
optimizing, by the computing device, an output of the [linearized pruned] deep learning network to calculate predicted inputs for the manufacturing environment; and
Cox teaches
pruning, by the computing device, the deep learning network, [wherein pruning the deep learning network comprises removing redundant neurons and redundant connection weights by comparing information in a neuron to another neuron and removing neurons having same information];
(Cox [sec(s) I] “In this paper, we show that recurrent neural networks, including those using a memory cell based architecture, such as MGRU, achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and language modeling sequence prediction tasks. In addition, we provide a more fundamental understanding of how complexity reduction, viewed as a general perturbation or corruption, is impacted by temporal dependency. Therefore, we devise a perturbation model of the effect of a general compression method, such as singular value decomposition (SVD) rank reduction, on the short-term memory performance of recurrent networks. This model is tested on a noiseless memorization task to elucidate the conditions over which scaling of short-term memory performance agrees. In this way, it is shown how the achievable compression is dependent on the degree of temporal coherence present in the task and data.” [sec(s) II] “An effective form of complexity reduction, which has been demonstrated on feed-forward and convolutional neural networks, is rank reduction via singular value decomposition on the network parameters. For RNNs, the forward and recurrent matrix of weights can be individually decomposed into their singular values and orthonormal bases, Σ and U, V, respectively. By eliminating the smallest singular values, in order from least to greatest, an optimal reduced rank representation, Q and V, is found, as in Eqn. (3). This compressed representation has only R∙(M+N) parameters, where R is the rank and M and N are the original dimensions of a particular weight matrix.” [sec(s) IV] “In the first experiment, we train a recurrent language model to predict the next word in a sequence by minimizing the cross-entropy error over the full vocabulary, as described in [19].”; Note that Zhao teaches “computing device” and “deep learning network”.)
(Note: Hereinafter, if a limitation has one or more bold underlines, the one or more underlined claim languages indicate that they are taught by the current prior art reference, while the one or more non-underlined claim languages indicate that they have been taught already by one or more previous art references.)
predicting, by the computing device and using the pruned deep learning network, an output of the pruned deep learning network from the inputs of the manufacturing environment;
(Cox [sec(s) I] “In this paper, we show that recurrent neural networks, including those using a memory cell based architecture, such as MGRU, achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and language modeling sequence prediction tasks. In addition, we provide a more fundamental understanding of how complexity reduction, viewed as a general perturbation or corruption, is impacted by temporal dependency. Therefore, we devise a perturbation model of the effect of a general compression method, such as singular value decomposition (SVD) rank reduction, on the short-term memory performance of recurrent networks. This model is tested on a noiseless memorization task to elucidate the conditions over which scaling of short-term memory performance agrees. In this way, it is shown how the achievable compression is dependent on the degree of temporal coherence present in the task and data.” [sec(s) II] “An effective form of complexity reduction, which has been demonstrated on feed-forward and convolutional neural networks, is rank reduction via singular value decomposition on the network parameters. For RNNs, the forward and recurrent matrix of weights can be individually decomposed into their singular values and orthonormal bases, Σ and U, V, respectively. By eliminating the smallest singular values, in order from least to greatest, an optimal reduced rank representation, Q and V, is found, as in Eqn. (3). This compressed representation has only R∙(M+N) parameters, where R is the rank and M and N are the original dimensions of a particular weight matrix.” [sec(s) IV] “In the first experiment, we train a recurrent language model to predict the next word in a sequence by minimizing the cross-entropy error over the full vocabulary, as described in [19].”;)
linearizing, by the computing device, the pruned deep learning network;
(Cox [sec(s) I] “In this paper, we show that recurrent neural networks, including those using a memory cell based architecture, such as MGRU, achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and language modeling sequence prediction tasks. In addition, we provide a more fundamental understanding of how complexity reduction, viewed as a general perturbation or corruption, is impacted by temporal dependency. Therefore, we devise a perturbation model of the effect of a general compression method, such as singular value decomposition (SVD) rank reduction, on the short-term memory performance of recurrent networks. This model is tested on a noiseless memorization task to elucidate the conditions over which scaling of short-term memory performance agrees. In this way, it is shown how the achievable compression is dependent on the degree of temporal coherence present in the task and data.” [sec(s) II] “An effective form of complexity reduction, which has been demonstrated on feed-forward and convolutional neural networks, is rank reduction via singular value decomposition on the network parameters. For RNNs, the forward and recurrent matrix of weights can be individually decomposed into their singular values and orthonormal bases, Σ and U, V, respectively. By eliminating the smallest singular values, in order from least to greatest, an optimal reduced rank representation, Q and V, is found, as in Eqn. (3). This compressed representation has only R∙(M+N) parameters, where R is the rank and M and N are the original dimensions of a particular weight matrix.” [sec(s) III] “In this case, it is reasonable to linearize the activation function (for the purposes of our analysis), within some regime, as in Eqn. (5). Furthermore, we can simplify the effect of an arbitrary compression scheme, such as SVD rank reduction, as a perturbation δ on the original weight matrix,
[equation rendered as images in the original; not reproduced]
(5)” [sec(s) IV] “In the first experiment, we train a recurrent language model to predict the next word in a sequence by minimizing the cross-entropy error over the full vocabulary, as described in [19].”; Note that Zhao teaches “computing device”)
optimizing, by the computing device, an output of the linearized pruned deep learning network to calculate predicted inputs for the manufacturing environment; and
(Cox [sec(s) I] “In this paper, we show that recurrent neural networks, including those using a memory cell based architecture, such as MGRU, achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and language modeling sequence prediction tasks. In addition, we provide a more fundamental understanding of how complexity reduction, viewed as a general perturbation or corruption, is impacted by temporal dependency. Therefore, we devise a perturbation model of the effect of a general compression method, such as singular value decomposition (SVD) rank reduction, on the short-term memory performance of recurrent networks. This model is tested on a noiseless memorization task to elucidate the conditions over which scaling of short-term memory performance agrees. In this way, it is shown how the achievable compression is dependent on the degree of temporal coherence present in the task and data.” [sec(s) II] “An effective form of complexity reduction, which has been demonstrated on feed-forward and convolutional neural networks, is rank reduction via singular value decomposition on the network parameters. For RNNs, the forward and recurrent matrix of weights can be individually decomposed into their singular values and orthonormal bases, ∑ and U, V, respectively. By eliminating the smallest singular values, in order from least to greatest, an optimal reduced rank representation, Q[Symbol font/0x25] and V[Symbol font/0x25] is found, as in Eqn. (3). This compressed representation has only R∙(M+N) parameters, where R is the rank and M and N are the original dimensions of a particular weight matrix.” [sec(s) III] “In this case, it is reasonable to linearize the activation function (for the purposes of our analysis), within some regime, as in Eqn. (5). Furthermore, we can simplify the effect of an arbitrary compression scheme, such as SVD rank reduction, as a perturbation δ on the original weight matrix,
[equation images media_image1.png and media_image2.png omitted]
(5)” [sec(s) IV] “In the first experiment, we train a recurrent language model to predict the next word in a sequence by minimizing the cross-entropy error over the full vocabulary, as described in [19].”;)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Zhao with the linearized pruned network of Cox.
One of ordinary skill in the art would have been motivated to combine in order to achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and prediction tasks.
(Cox [sec(s) Abs] “We show that considerable rank reduction is possible when compressing recurrent weights, even without fine tuning. Furthermore, we propose a perturbation model for the effect of general perturbations, such as a compression, on the recurrent parameters of RNNs. The model is tested against a noiseless memorization experiment that elucidates the short-term memory performance. In this way, we demonstrate that the effect of compression of recurrent parameters is dependent on the degree of temporal coherence present in the data and task. This work can guide on-the-fly RNN compression for novel environments or tasks, and provides insight for applying RNN compression in low-power devices, such as hearing aids.” [sec(s) I] “In this paper, we show that recurrent neural networks, including those using a memory cell based architecture, such as MGRU, achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and language modeling sequence prediction tasks. In addition, we provide a more fundamental understanding of how complexity reduction, viewed as a general perturbation or corruption, is impacted by temporal dependency.”)
However, the combination of Zhao, Cox does not appear to explicitly teach:
pruning, by the computing device, the deep learning network, [wherein pruning the deep learning network comprises removing redundant neurons and redundant connection weights by comparing information in a neuron to another neuron and removing neurons having same information];
Han teaches
pruning, by the computing device, the deep learning network, wherein pruning the deep learning network comprises removing redundant neurons and redundant connection weights by comparing information in a neuron [to] another neuron and removing neurons having same information;
(Han [fig(s) 3] “pruning synapses” and “pruning neurons” [fig(s) 3] “pruning synapses” and “pruning neurons” [sec(s) 1] “To achieve this goal, we present a method to prune network connections in a manner that preserves the original accuracy. After an initial training phase, we remove all connections whose weight is lower than a threshold. This pruning converts a dense, fully-connected layer to a sparse layer. This first phase learns the topology of the networks — learning which connections are important and removing the unimportant connections. We then retrain the sparse network so the remaining connections can compensate for the connections that have been removed. The phases of pruning and retraining may be repeated iteratively to further reduce network complexity. In effect, this training process learns the network connectivity in addition to the weights - much as in the mammalian brain [8][9], where synapses are created in the first few months of a child’s development, followed by gradual pruning of little-used connections, falling to typical adult values.” [sec(s) 3] “Our pruning method employs a three-step process, as illustrated in Figure 2, which begins by learning the connectivity via normal network training. Unlike conventional training, however, we are not learning the final values of the weights, but rather we are learning which connections are important. The second step is to prune the low-weight connections. All connections with weights below a threshold are removed from the network — converting a dense network into a sparse network, as shown in Figure 3. The final step retrains the network to learn the final weights for the remaining sparse connections. This step is critical. If the pruned network is used without retraining, accuracy is significantly impacted.” [sec(s) 3.5] “After pruning connections, neurons with zero input connections or zero output connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron.” [sec(s) 4] “We carried out the experiments on Nvidia TitanX and GTX980 GPUs.”;)
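Han's three-step procedure (train, remove connections below a weight threshold, retrain), together with the dead-neuron removal of sec. 3.5, can be sketched as follows; the threshold value and layer sizes are illustrative assumptions:

import numpy as np

def prune_connections(W, threshold):
    # Han sec. 3: zero every connection whose |weight| is below the
    # threshold, converting a dense layer into a sparse layer.
    mask = np.abs(W) >= threshold
    return W * mask, mask

def prune_dead_neurons(W_in, W_out):
    # Han sec. 3.5: drop hidden neurons left with zero input or zero
    # output connections. W_in: (hidden, in), W_out: (out, hidden).
    alive = (np.abs(W_in).sum(axis=1) > 0) & (np.abs(W_out).sum(axis=0) > 0)
    return W_in[alive], W_out[:, alive], alive

rng = np.random.default_rng(1)
W1 = rng.standard_normal((32, 16))
W2 = rng.standard_normal((8, 32))
W1s, _ = prune_connections(W1, 1.5)
W2s, _ = prune_connections(W2, 1.5)
W1p, W2p, alive = prune_dead_neurons(W1s, W2s)
print(int(alive.sum()), "of 32 hidden neurons kept")
# Per Han, retraining the surviving sparse weights would follow.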
Examiner notes that paragraph 89 of the Instant Specification describes “In embodiments, the optimization module 480 determines the redundant neurons by comparing the information in a neuron to another neuron in the RNN network. In embodiments, if the neurons have the same information, i.e., a same input, and same weighted connections, the optimization module 480 determines that the neurons are redundant and removes one of the redundant neurons, along with the corresponding redundant connections”)
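The spec passage quoted above (comparing the information in one neuron against another and removing exact duplicates along with their connections) could be sketched roughly as below; the tolerance parameter is a hypothetical detail added for illustration:

import numpy as np

def remove_duplicate_neurons(W_in, W_out, tol=1e-8):
    # Drop hidden neurons whose incoming weight rows duplicate an
    # earlier neuron's (same inputs, same weighted connections),
    # along with their outgoing connections (cf. spec par. 89).
    keep = []
    for i in range(W_in.shape[0]):
        if not any(np.allclose(W_in[i], W_in[j], atol=tol) for j in keep):
            keep.append(i)
    return W_in[keep], W_out[:, keep]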
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Zhao, Cox with the deep learning network pruning of Han.
One of ordinary skill in the art would have been motivated to combine in order to improve the energy efficiency and storage of neural networks without affecting accuracy by finding the right connections, leading to smaller memory capacity and bandwidth requirements for real-time image processing and making the networks easier to deploy on mobile systems.
(Han [sec(s) 6] “We have presented a method to improve the energy efficiency and storage of neural networks without affecting accuracy by finding the right connections. Our method, motivated in part by how learning works in the mammalian brain, operates by learning which connections are important, pruning the unimportant connections, and then retraining the remaining sparse network. We highlight our experiments on AlexNet and VGGNet on ImageNet, showing that both fully connected layer and convolutional layer can be pruned, reducing the number of connections by 9× to 13× without loss of accuracy. This leads to smaller memory capacity and bandwidth requirements for real-time image processing, making it easier to be deployed on mobile systems.”)
However, the combination of Zhao, Cox, Han does not appear to explicitly teach:
pruning the deep learning network comprises removing redundant neurons and redundant connection weights by comparing information in a neuron [to] another neuron;
Ma teaches
pruning the deep learning network comprises removing redundant neurons and redundant connection weights by comparing information in a neuron to another neuron;
(Ma [par(s) 28-31] “calculating the relativity using the average value method,
[equation image media_image3.png omitted]
where γ_n^l ∈ [0, 1] represents the relativity; comparing the relativity γ_n^l between various neurons in the lth layer, deleting the neuron with the minimum relativity, updating the number n_l of the neurons in the lth layer of the dynamic neural network and the weight matrix Θ(l);”;)
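Since Ma's relativity formula appears in the record only as an image, the sketch below substitutes a generic grey-relational grade as a stand-in measure; the averaging reference and ρ value are assumptions, but the surrounding logic (compare relativities within a layer, delete the minimum, update the weight matrices) follows the quoted text:

import numpy as np

def grey_relational_grade(x, ref, rho=0.5):
    # Stand-in relativity between a neuron's activation series x and a
    # reference series ref; Ma's exact formula is not reproduced here.
    d = np.abs(x - ref)
    coeff = (d.min() + rho * d.max()) / (d + rho * d.max() + 1e-12)
    return coeff.mean()

def delete_least_relative_neuron(acts, W_in, W_out):
    # acts: (T, n_l) activations of layer l over T samples. Deletes the
    # neuron with minimum relativity and updates both weight matrices.
    ref = acts.mean(axis=1)  # "average value method" (assumed reading)
    gamma = np.array([grey_relational_grade(acts[:, n], ref)
                      for n in range(acts.shape[1])])
    worst = int(gamma.argmin())
    keep = np.arange(acts.shape[1]) != worst
    return W_in[keep], W_out[:, keep], worst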
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Zhao, Cox, Han with the comparison between neurons of Ma.
One of ordinary skill in the art would have been motivated to combine in order to make the network structure simpler and to increase the operating speed and control accuracy by dynamically adjusting the structure in the training process of the neural network.
(Ma [par(s) 37] “The present invention has the advantageous effects that: a dynamic neural network is trained through the grey relation analysis method-based network structure adjustment algorithm designed by the present invention, and an intelligent controller of the dynamic neural network-based variable cycle engine is constructed. The problem of coupling between nonlinear multiple variables caused by the increase of control variables of the variable cycle engine and the problem that the traditional control method relies too much on model accuracy are effectively solved. Meanwhile, the structure is dynamically adjusted in the training process of the neural network, so that the network structure is simpler, and the operating speed and control accuracy are increased.”)
Regarding claim 4
The combination of Zhao, Cox, Han, Ma teaches claim 1.
Zhao further teaches
wherein the deep learning network is a recurrent neural network (RNN) network.
(Zhao [table(s) 8-9] [sec(s) 5] “Long short term memory network (LSTM) [39] has the ability of mining long-distance time series data, and it has been widely used in speech recognition [40], machine translation [41], time series forecasting [42], load prediction [43] and other fields. Therefore, the predicted results of the MVTS– CNN model are compared with that of SVM and LSTM to evaluate its performance. Since the multivariate time series analysis method can be considered as data processing, and in order to avoid the influence of different input data on the model comparison, MVTS-SVM and MVTS-LSTM are designed and involved as competing models. The applicability of multivariate time series analysis method in other modeling methods can also be further tested in this way. To ensure the objectivity and authenticity of the experimental results, we manually optimized the parameters and structure of each model in the comparative experiment before the comparative experiments.”;)
Regarding claim 5
The combination of Zhao, Cox, Han, Ma teaches claim 4.
Zhao further teaches
wherein the RNN network is a long-short term memory (LSTM) network.
(Zhao [table(s) 8-9] [sec(s) 5] “Long short term memory network (LSTM) [39] has the ability of mining long-distance time series data, and it has been widely used in speech recognition [40], machine translation [41], time series forecasting [42], load prediction [43] and other fields. Therefore, the predicted results of the MVTS– CNN model are compared with that of SVM and LSTM to evaluate its performance. Since the multivariate time series analysis method can be considered as data processing, and in order to avoid the influence of different input data on the model comparison, MVTS-SVM and MVTS-LSTM are designed and involved as competing models. The applicability of multivariate time series analysis method in other modeling methods can also be further tested in this way. To ensure the objectivity and authenticity of the experimental results, we manually optimized the parameters and structure of each model in the comparative experiment before the comparative experiments.”;)
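For reference, the standard LSTM cell update underlying both the claim language and Zhao's comparison model is reproduced below as a minimal sketch; the (input, forget, cell, output) gate ordering is one common convention:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, Wx, Wh, b):
    # One standard LSTM step. Wx: (4H, D), Wh: (4H, H), b: (4H,).
    z = Wx @ x + Wh @ h + b
    H = h.shape[0]
    i = sigmoid(z[:H])          # input gate
    f = sigmoid(z[H:2*H])       # forget gate
    g = np.tanh(z[2*H:3*H])     # candidate cell state
    o = sigmoid(z[3*H:])        # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new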
Regarding claim 11
The combination of Zhao, Cox, Han, Ma teaches claim 1.
Zhao further teaches
wherein the manufacturing environment is a dynamic manufacturing environment.
(Zhao [fig(s) 1-2] [sec(s) 3] “In the production process automation control system, a large number of sensors are used to measure process variables in the production, such as pressure, temperature, quality, voltage and current, etc. And the sampling interval of these on-line sensors is generally 5 s. Specially, the raw material quality parameters are sampled and tested manually every two hours. And the raw quality test sample is hybrid multiplex sample, which is obtained by taking samples from the production line at equal intervals within 2 h and then mixed. When the latest three-rate values are obtained by manual test, the quality parameters of the raw material being processed at that time and the follow-on raw material to be processed are considered to be the same as the latest measured values. In this way, the information of the raw material quality parameters will be as timely and accurate as possible in the f-CaO content monitoring. Besides, f-CaO content is manually sampled and tested with the interval of 1 h.” [sec(s) Abs] “Compared with traditional CNN, support vector machines (SVM) and long-short term memory networks (LSTM), the results demonstrate that the MVTS–CNN model has higher accuracy, better generalization ability and superior robustness.” [sec(s) 5.2] “The short training time of 20,000 iterations with i7-8700 processor indicates that the structure of the MVTS–CNN model is simple and the computational complexity is moderate.”;)
Regarding claim 13
Zhao teaches
A computer program product comprising one or more computer readable storage media having program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to:
(Zhao [fig(s) 1-2] [sec(s) 5.2] “The short training time of 20,000 iterations with i7-8700 processor indicates that the structure of the MVTS–CNN model is simple and the computational complexity is moderate.”;)
receive data from sensors in a dynamic manufacturing environment;
(Zhao [fig(s) 1-2] [sec(s) 3] “In the production process automation control system, a large number of sensors are used to measure process variables in the production, such as pressure, temperature, quality, voltage and current, etc. And the sampling interval of these on-line sensors is generally 5 s. Specially, the raw material quality parameters are sampled and tested manually every two hours. And the raw quality test sample is hybrid multiplex sample, which is obtained by taking samples from the production line at equal intervals within 2 h and then mixed. When the latest three-rate values are obtained by manual test, the quality parameters of the raw material being processed at that time and the follow-on raw material to be processed are considered to be the same as the latest measured values. In this way, the information of the raw material quality parameters will be as timely and accurate as possible in the f-CaO content monitoring. Besides, f-CaO content is manually sampled and tested with the interval of 1 h.” [sec(s) Abs] “Compared with traditional CNN, support vector machines (SVM) and long-short term memory networks (LSTM), the results demonstrate that the MVTS–CNN model has higher accuracy, better generalization ability and superior robustness.” [sec(s) 5.2] “The short training time of 20,000 iterations with i7-8700 processor indicates that the structure of the MVTS–CNN model is simple and the computational complexity is moderate.”;)
map the data into a deep learning network;
(Zhao [fig(s) 1-2] [table(s) 8-9] [sec(s) 3] “In the production process automation control system, a large number of sensors are used to measure process variables in the production, such as pressure, temperature, quality, voltage and current, etc. And the sampling interval of these on-line sensors is generally 5 s. Specially, the raw material quality parameters are sampled and tested manually every two hours. And the raw quality test sample is hybrid multiplex sample, which is obtained by taking samples from the production line at equal intervals within 2 h and then mixed. When the latest three-rate values are obtained by manual test, the quality parameters of the raw material being processed at that time and the follow-on raw material to be processed are considered to be the same as the latest measured values. In this way, the information of the raw material quality parameters will be as timely and accurate as possible in the f-CaO content monitoring. Besides, f-CaO content is manually sampled and tested with the interval of 1 h.” [sec(s) Abs] “Compared with traditional CNN, support vector machines (SVM) and long-short term memory networks (LSTM), the results demonstrate that the MVTS–CNN model has higher accuracy, better generalization ability and superior robustness.” [sec(s) 5.3] “Therefore, the predicted results of the MVTS–CNN model are compared with that of SVM and LSTM to evaluate its performance. Since the multivariate time series analysis method can be considered as data processing, and in order to avoid the influence of different input data on the model comparison, MVTS-SVM and MVTS-LSTM are designed and involved as competing models.”;)
learn correlations between inputs and outputs of the dynamic manufacturing environment using the data;
(Zhao [fig(s) 1-2] [table(s) 8-9] [sec(s) 3] “In the production process automation control system, a large number of sensors are used to measure process variables in the production, such as pressure, temperature, quality, voltage and current, etc.” [sec(s) Abs] “Compared with traditional CNN, support vector machines (SVM) and long-short term memory networks (LSTM), the results demonstrate that the MVTS–CNN model has higher accuracy, better generalization ability and superior robustness.” [sec(s) 5.2] “As an important parameter in supervised learning and deep learning, learning rate determines whether and when the objective function can converge to the global minimum. … As is shown in Fig. 8a and Table 7, the trained model has good fitting effect on the training sets, after the 20,000 times of training iterations. And the local enlarged figure of training results (Fig. 8b) shows that the MVTS–CNN model can extract appropriate features for different f-CaO content in the model training process. The short training time of 20,000 iterations with i7-8700 processor indicates that the structure of the MVTS–CNN model is simple and the computational complexity is moderate.” [sec(s) 5.3] “Therefore, the predicted results of the MVTS–CNN model are compared with that of SVM and LSTM to evaluate its performance. Since the multivariate time series analysis method can be considered as data processing, and in order to avoid the influence of different input data on the model comparison, MVTS-SVM and MVTS-LSTM are designed and involved as competing models.”;)
predict inputs for the dynamic manufacturing environment using the [pruned] deep learning network;
(Zhao [fig(s) 3] “Determining action time distribution period” [table(s) 8-9] [sec(s) 1] “This paper proposed a soft sensor model based on multivariate time series analysis and convolutional neural network (MVTS– CNN) for the online f-CaO content monitoring.” [sec(s) 4] “Taking the time series within the active duration distribution range as input data can increase the model’s applicability in different production conditions. In order to determine the active duration distribution range of each variable, a multivariate time series analysis method founded on the time delay range and the longest active duration is proposed.” [sec(s) 4.1] “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method. And in the third part, the processed time series X˙(k) = [X˙1(k), X˙2(k), . . . , X˙11(k), X˙12(k)] with different time length will be compressed into a uniform series length by different mean filters and formed into a new time series matrix XR as the input of CNN.” See also [sec(s) 4.2] [sec(s) 5.2] “As an important parameter in supervised learning and deep learning, learning rate determines whether and when the objective function can converge to the global minimum. … As is shown in Fig. 8a and Table 7, the trained model has good fitting effect on the training sets, after the 20,000 times of training iterations. And the local enlarged figure of training results (Fig. 8b) shows that the MVTS–CNN model can extract appropriate features for different f-CaO content in the model training process.”; e.g., “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method” read(s) on “predict inputs”.)
optimize a predicted output from the [linearized pruned] deep learning network to calculate predicted inputs for the manufacturing environment; and
(Zhao [fig(s) 3] “Determining action time distribution period” [table(s) 8-9] [sec(s) 1] “This paper proposed a soft sensor model based on multivariate time series analysis and convolutional neural network (MVTS– CNN) for the online f-CaO content monitoring.” [sec(s) 4] “Taking the time series within the active duration distribution range as input data can increase the model’s applicability in different production conditions. In order to determine the active duration distribution range of each variable, a multivariate time series analysis method founded on the time delay range and the longest active duration is proposed.” [sec(s) 4.1] “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method. And in the third part, the processed time series X˙(k) = [X˙1(k), X˙2(k), . . . , X˙11(k), X˙12(k)] with different time length will be compressed into a uniform series length by different mean filters and formed into a new time series matrix XR as the input of CNN.” See also [sec(s) 4.2] [sec(s) 5.2] “As an important parameter in supervised learning and deep learning, learning rate determines whether and when the objective function can converge to the global minimum. … As is shown in Fig. 8a and Table 7, the trained model has good fitting effect on the training sets, after the 20,000 times of training iterations. And the local enlarged figure of training results (Fig. 8b) shows that the MVTS–CNN model can extract appropriate features for different f-CaO content in the model training process.”; e.g., “training” read(s) on “optimize”. In addition, e.g., “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method” read(s) on “calculate predicted inputs”.)
change operation inputs in the dynamic manufacturing environment to match the calculated predicted inputs.
(Zhao [fig(s) 3] “Determining action time distribution period” [table(s) 8-9] [sec(s) 1] “This paper proposed a soft sensor model based on multivariate time series analysis and convolutional neural network (MVTS– CNN) for the online f-CaO content monitoring.” [sec(s) 4] “Taking the time series within the active duration distribution range as input data can increase the model’s applicability in different production conditions. In order to determine the active duration distribution range of each variable, a multivariate time series analysis method founded on the time delay range and the longest active duration is proposed.” [sec(s) 4.1] “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method. And in the third part, the processed time series X˙(k) = [X˙1(k), X˙2(k), . . . , X˙11(k), X˙12(k)] with different time length will be compressed into a uniform series length by different mean filters and formed into a new time series matrix XR as the input of CNN.” See also [sec(s) 4.2] [sec(s) 5.2] “As an important parameter in supervised learning and deep learning, learning rate determines whether and when the objective function can converge to the global minimum. … As is shown in Fig. 8a and Table 7, the trained model has good fitting effect on the training sets, after the 20,000 times of training iterations. And the local enlarged figure of training results (Fig. 8b) shows that the MVTS–CNN model can extract appropriate features for different f-CaO content in the model training process.”; e.g., “determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method” along with training for the cement clinker production read(s) on “change operation inputs”.)
However, Zhao does not appear to explicitly teach:
prune the deep learning network, wherein to prune the deep learning network, the program instructions are executable to remove redundant neurons and redundant connection weights by comparing information in a neuron to another neuron and removing neurons having same information;
predict inputs for the dynamic manufacturing environment using the [pruned] deep learning network;
linearize the pruned deep learning network;
optimize a predicted output from the [linearized pruned] deep learning network to calculate predicted inputs for the dynamic manufacturing environment; and
Cox teaches
prune the deep learning network, wherein to prune the deep learning network, the program instructions are executable to [remove redundant neurons and redundant connection weights by comparing information in a neuron to another neuron and removing neurons having same information];
(Cox [sec(s) I] “In this paper, we show that recurrent neural networks, including those using a memory cell based architecture, such as MGRU, achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and language modeling sequence prediction tasks. In addition, we provide a more fundamental understanding of how complexity reduction, viewed as a general perturbation or corruption, is impacted by temporal dependency. Therefore, we devise a perturbation model of the effect of a general compression method, such as singular value decomposition (SVD) rank reduction, on the short-term memory performance of recurrent networks. This model is tested on a noiseless memorization task to elucidate the conditions over which scaling of short-term memory performance agrees. In this way, it is shown how the achievable compression is dependent on the degree of temporal coherence present in the task and data.” [sec(s) II] “An effective form of complexity reduction, which has been demonstrated on feed-forward and convolutional neural networks, is rank reduction via singular value decomposition on the network parameters. For RNNs, the forward and recurrent matrix of weights can be individually decomposed into their singular values and orthonormal bases, ∑ and U, V, respectively. By eliminating the smallest singular values, in order from least to greatest, an optimal reduced rank representation, Q[Symbol font/0x25] and V[Symbol font/0x25] is found, as in Eqn. (3). This compressed representation has only R∙(M+N) parameters, where R is the rank and M and N are the original dimensions of a particular weight matrix.”; Note that Zhao teaches “deep learning network”.)
predict inputs for the dynamic manufacturing environment using the pruned deep learning network;
(Cox [sec(s) I] “In this paper, we show that recurrent neural networks, including those using a memory cell based architecture, such as MGRU, achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and language modeling sequence prediction tasks. In addition, we provide a more fundamental understanding of how complexity reduction, viewed as a general perturbation or corruption, is impacted by temporal dependency. Therefore, we devise a perturbation model of the effect of a general compression method, such as singular value decomposition (SVD) rank reduction, on the short-term memory performance of recurrent networks. This model is tested on a noiseless memorization task to elucidate the conditions over which scaling of short-term memory performance agrees. In this way, it is shown how the achievable compression is dependent on the degree of temporal coherence present in the task and data.” [sec(s) II] “An effective form of complexity reduction, which has been demonstrated on feed-forward and convolutional neural networks, is rank reduction via singular value decomposition on the network parameters. For RNNs, the forward and recurrent matrix of weights can be individually decomposed into their singular values and orthonormal bases, ∑ and U, V, respectively. By eliminating the smallest singular values, in order from least to greatest, an optimal reduced rank representation, Q[Symbol font/0x25] and V[Symbol font/0x25] is found, as in Eqn. (3). This compressed representation has only R∙(M+N) parameters, where R is the rank and M and N are the original dimensions of a particular weight matrix.”; Note that Zhao teaches “deep learning network”.)
linearize the pruned deep learning network;
(Cox [sec(s) I] “In this paper, we show that recurrent neural networks, including those using a memory cell based architecture, such as MGRU, achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and language modeling sequence prediction tasks. In addition, we provide a more fundamental understanding of how complexity reduction, viewed as a general perturbation or corruption, is impacted by temporal dependency. Therefore, we devise a perturbation model of the effect of a general compression method, such as singular value decomposition (SVD) rank reduction, on the short-term memory performance of recurrent networks. This model is tested on a noiseless memorization task to elucidate the conditions over which scaling of short-term memory performance agrees. In this way, it is shown how the achievable compression is dependent on the degree of temporal coherence present in the task and data.” [sec(s) II] “An effective form of complexity reduction, which has been demonstrated on feed-forward and convolutional neural networks, is rank reduction via singular value decomposition on the network parameters. For RNNs, the forward and recurrent matrix of weights can be individually decomposed into their singular values and orthonormal bases, ∑ and U, V, respectively. By eliminating the smallest singular values, in order from least to greatest, an optimal reduced rank representation, Q[Symbol font/0x25] and V[Symbol font/0x25] is found, as in Eqn. (3). This compressed representation has only R∙(M+N) parameters, where R is the rank and M and N are the original dimensions of a particular weight matrix.” [sec(s) III] “In this case, it is reasonable to linearize the activation function (for the purposes of our analysis), within some regime, as in Eqn. (5). Furthermore, we can simplify the effect of an arbitrary compression scheme, such as SVD rank reduction, as a perturbation δ on the original weight matrix,
[equation images media_image1.png and media_image2.png omitted]
(5)”; Note that Zhao teaches “deep learning network”.)
optimize a predicted output from the linearized pruned deep learning network to calculate predicted inputs for the dynamic manufacturing environment; and
(Cox [sec(s) I] “In this paper, we show that recurrent neural networks, including those using a memory cell based architecture, such as MGRU, achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and language modeling sequence prediction tasks. In addition, we provide a more fundamental understanding of how complexity reduction, viewed as a general perturbation or corruption, is impacted by temporal dependency. Therefore, we devise a perturbation model of the effect of a general compression method, such as singular value decomposition (SVD) rank reduction, on the short-term memory performance of recurrent networks. This model is tested on a noiseless memorization task to elucidate the conditions over which scaling of short-term memory performance agrees. In this way, it is shown how the achievable compression is dependent on the degree of temporal coherence present in the task and data.” [sec(s) II] “An effective form of complexity reduction, which has been demonstrated on feed-forward and convolutional neural networks, is rank reduction via singular value decomposition on the network parameters. For RNNs, the forward and recurrent matrix of weights can be individually decomposed into their singular values and orthonormal bases, ∑ and U, V, respectively. By eliminating the smallest singular values, in order from least to greatest, an optimal reduced rank representation, Q[Symbol font/0x25] and V[Symbol font/0x25] is found, as in Eqn. (3). This compressed representation has only R∙(M+N) parameters, where R is the rank and M and N are the original dimensions of a particular weight matrix.” [sec(s) III] “In this case, it is reasonable to linearize the activation function (for the purposes of our analysis), within some regime, as in Eqn. (5). Furthermore, we can simplify the effect of an arbitrary compression scheme, such as SVD rank reduction, as a perturbation δ on the original weight matrix,
[equation images media_image1.png and media_image2.png omitted]
(5)”; Note that Zhao teaches “deep learning network”.)
Zhao is combinable with Cox for the same rationale as set forth above with respect to claim 1.
However, the combination of Zhao, Cox does not appear to explicitly teach:
prune the deep learning network, wherein to prune the deep learning network, the program instructions are executable to [remove redundant neurons and redundant connection weights by comparing information in a neuron to another neuron and removing neurons having same information];
Han teaches
prune the deep learning network, wherein to prune the deep learning network, the program instructions are executable to remove redundant neurons and redundant connection weights by comparing information in a neuron [to] another neuron and removing neurons having same information;
(Han [fig(s) 3] “pruning synapses” and “pruning neurons” [fig(s) 3] “pruning synapses” and “pruning neurons” [sec(s) 1] “To achieve this goal, we present a method to prune network connections in a manner that preserves the original accuracy. After an initial training phase, we remove all connections whose weight is lower than a threshold. This pruning converts a dense, fully-connected layer to a sparse layer. This first phase learns the topology of the networks — learning which connections are important and removing the unimportant connections. We then retrain the sparse network so the remaining connections can compensate for the connections that have been removed. The phases of pruning and retraining may be repeated iteratively to further reduce network complexity. In effect, this training process learns the network connectivity in addition to the weights - much as in the mammalian brain [8][9], where synapses are created in the first few months of a child’s development, followed by gradual pruning of little-used connections, falling to typical adult values.” [sec(s) 3] “Our pruning method employs a three-step process, as illustrated in Figure 2, which begins by learning the connectivity via normal network training. Unlike conventional training, however, we are not learning the final values of the weights, but rather we are learning which connections are important. The second step is to prune the low-weight connections. All connections with weights below a threshold are removed from the network — converting a dense network into a sparse network, as shown in Figure 3. The final step retrains the network to learn the final weights for the remaining sparse connections. This step is critical. If the pruned network is used without retraining, accuracy is significantly impacted.” [sec(s) 3.5] “After pruning connections, neurons with zero input connections or zero output connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron.” [sec(s) 4] “We carried out the experiments on Nvidia TitanX and GTX980 GPUs.”;)
Examiner notes that paragraph 89 of the Instant Specification describes “In embodiments, the optimization module 480 determines the redundant neurons by comparing the information in a neuron to another neuron in the RNN network. In embodiments, if the neurons have the same information, i.e., a same input, and same weighted connections, the optimization module 480 determines that the neurons are redundant and removes one of the redundant neurons, along with the corresponding redundant connections”)
The combination of Zhao, Cox is combinable with Han for the same rationale as set forth above with respect to claim 1.
However, the combination of Zhao, Cox, Han does not appear to explicitly teach:
the program instructions are executable to remove redundant neurons and redundant connection weights by comparing information in a neuron [to] another neuron;
Ma teaches
the program instructions are executable to remove redundant neurons and redundant connection weights by comparing information in a neuron to another neuron;
(Ma [par(s) 28-31] “calculating the relativity using the average value method,
[equation image media_image3.png omitted]
where γ_n^l ∈ [0, 1] represents the relativity; comparing the relativity γ_n^l between various neurons in the lth layer, deleting the neuron with the minimum relativity, updating the number n_l of the neurons in the lth layer of the dynamic neural network and the weight matrix Θ(l);”;)
The combination of Zhao, Cox, Han is combinable with Ma for the same rationale as set forth above with respect to claim 1.
Regarding claim 14
The claim is a computer readable storage media claim corresponding to the method claim 4, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 15
The claim is a computer readable storage media claim corresponding to the method claim 5, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Claim(s) 2-3, 12, 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (Online cement clinker quality monitoring: A soft sensor model based on multivariate time series analysis and CNN) in view of Cox et al. (Parameter Compression of Recurrent Neural Networks and Degradation of Short-term Memory) in view of Han et al. (Learning both Weights and Connections for Efficient Neural Networks) in view of Ma et al. (US20210201155A1) in view of Grossman et al. (US 20230164156 A1)
Regarding claim 2
The combination of Zhao, Cox, Han, Ma teaches claim 1.
However, the combination of Zhao, Cox, Han, Ma does not appear to explicitly teach:
wherein the sensors are based on supervisory control and data acquisition (SCADA) architecture.
Grossman teaches
wherein the sensors are based on supervisory control and data acquisition (SCADA) architecture.
(Grossman [par(s) 3] “supervisory control and data acquisition (SCADA) is a control system architecture comprising computers, networked data communications, and graphical user interfaces (GUI) for high-level process supervisory management.” [par(s) 44] “FIG. 5 shows an example of an abnormal event detection system according to aspects of the present disclosure. An embodiment of the disclosure includes network traffic data for abnormal event detection modeling. In some cases, the data may be obtained from one or more networks sensors 502 for modeling module 503. For example, the one or more network sensors 502 may include SCADA network sensor 302, wind farm network sensor 301, external network sensor 303, etc.). Additionally, one or more time series data (e.g., such as power sensor 401 data, energy market data 402, weather data 403, etc.) are applied to the modeling module 503. The modeling module 503 can build machine learning and AI models that use data from network sensors 502 (e.g., wind farm network data from wind farm network sensor 301, SCADA network data from SCADA network sensor 302) and time series data 501 (e.g., which may include external third party time series data, such as weather data, energy market data, cyber security data, etc.). In some cases, the modeling module 503 may combine, integrate, and fuse the information to create fused models.” [par(s) 90] “A SCADA system performs a supervisory operation over multiple other proprietary devices. For example, SCADA may provide computerized control over functional levels in a manufacturing operation or physical or mechanical system.”;)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Zhao, Cox, Han, Ma with the data acquisition architecture of Grossman.
One of ordinary skill in the art would have been motivated to combine in order to provide efficient mitigation techniques that can identify and process large amounts of data to detect abnormal events related to failures and attacks.
(Grossman [par(s) 5] “In some cases, wind farms may be controlled by systems such as SCADA control systems. However, in some aspects, wind farms may be vulnerable to component failures, network reconnaissance, network exploitation, cyberattacks, etc. There is a need in the art for more efficient wind farm mitigation techniques that can identify and process large amounts of data to detect abnormal events related to failures and attacks (e.g., in order to protect individual wind turbines, wind farms, and associated power grids when such abnormal events are detected).”)
Regarding claim 3
The combination of Zhao, Cox, Han, Ma teaches claim 1.
However, the combination of Zhao, Cox, Han, Ma does not appear to explicitly teach:
wherein the sensors are based on data acquisition (DAQ) architecture.
Grossman teaches
wherein the sensors are based on data acquisition (DAQ) architecture.
(Grossman [par(s) 3] “supervisory control and data acquisition (SCADA) is a control system architecture comprising computers, networked data communications, and graphical user interfaces (GUI) for high-level process supervisory management.” [par(s) 44] “FIG. 5 shows an example of an abnormal event detection system according to aspects of the present disclosure. An embodiment of the disclosure includes network traffic data for abnormal event detection modeling. In some cases, the data may be obtained from one or more networks sensors 502 for modeling module 503. For example, the one or more network sensors 502 may include SCADA network sensor 302, wind farm network sensor 301, external network sensor 303, etc.). Additionally, one or more time series data (e.g., such as power sensor 401 data, energy market data 402, weather data 403, etc.) are applied to the modeling module 503. The modeling module 503 can build machine learning and AI models that use data from network sensors 502 (e.g., wind farm network data from wind farm network sensor 301, SCADA network data from SCADA network sensor 302) and time series data 501 (e.g., which may include external third party time series data, such as weather data, energy market data, cyber security data, etc.). In some cases, the modeling module 503 may combine, integrate, and fuse the information to create fused models.” [par(s) 90] “A SCADA system performs a supervisory operation over multiple other proprietary devices. For example, SCADA may provide computerized control over functional levels in a manufacturing operation or physical or mechanical system.”;)
The combination of Zhao, Cox, Han, Ma is combinable with Grossman for the same rationale as set forth above with respect to claim 2.
Regarding claim 12
The combination of Zhao, Cox, Han, Ma teaches claim 1.
However, the combination of Zhao, Cox, Han, Ma does not appear to explicitly teach:
wherein the computing device includes software provided as a service in a cloud environment.
Grossman teaches
wherein the computing device includes software provided as a service in a cloud environment.
(Grossman [par(s) 63-98] “A processor 820 is an intelligent hardware device, (e.g., a general-purpose processing component, a digital signal processor (DSP), a central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a programmable logic device, a discrete gate or transistor logic component, a discrete hardware component, or any combination thereof). … In some examples, abnormal event detection system 800 may include, or be coupled to, a cloud. A cloud is a computer network configured to provide on-demand availability of computer system resources, such as data storage and computing power. In some examples, the cloud provides resources without active management by the user. The term cloud is sometimes used to describe data centers available to many users over the Internet. Some large cloud networks have functions distributed over multiple locations from central servers. A server is designated an edge server if it has a direct or close connection to a user. In some cases, a cloud is limited to a single organization. In other examples, the cloud is available to many organizations. In one example, a cloud includes a multi-layer communications network comprising multiple edge routers and core routers. In another example, a cloud is based on a local collection of switches in a single physical location. … Supervisory computers may be considered the core of the SCADA system. For example, the computers are used to gather data on the process and send control commands to field connected devices. Supervisory computers refer to the computer and software responsible for communicating with field connection controllers.”;)
The combination of Zhao, Cox, Han, Ma is combinable with Grossman for the same rationale as set forth above with respect to claim 2.
Regarding claim 16
The claim is a computer readable storage media claim corresponding to the method claim 2, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Claim(s) 6 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (Online cement clinker quality monitoring: A soft sensor model based on multivariate time series analysis and CNN) in view of Cox et al. (Parameter Compression of Recurrent Neural Networks and Degradation of Short-term Memory) in view of Han et al. (Learning both Weights and Connections for Efficient Neural Networks) in view of Ma et al. (US20210201155A1) in view of Nakagawa et al. (Deep Recurrent Factor Model: Interpretable Non-Linear and Time-Varying Multi-Factor Model) in view of Serra et al. (Bounding and Counting Linear Regions of Deep Neural Networks)
Regarding claim 6
The combination of Zhao, Cox, Han, Ma teaches claim 1.
However, the combination of Zhao, Cox, Han, Ma does not appear to explicitly teach:
linearizing the deep learning network by replacing a rectified linear unit (ReLU) activation function with a set of equivalent linear equations to the deep learning network in response to the deep learning network being a RNN network.
Nakagawa teaches
linearizing the deep learning network [by replacing a rectified linear unit (ReLU) activation function with a set of equivalent linear equations] to the deep learning network in response to the deep learning network being a RNN network.
(Nakagawa [sec(s) Introduction] “Although LSTM performs well in many sequential prediction problems, it has significant disadvantages, such as a lack of transparency and limitations in the interpretability of the prediction. As it is difficult for institutional investors to use black-box-type machine learning technique such as LSTM in actual investment practice because they need to be accountable to their customers, we present the application of layer-wise relevance propagation (LRP) (Bach et al. 2015) to linearize the proposed LSTM model. We consider this LSTM+LRP model a deep recurrent factor model. We can model non-linear and time-varying factors as a return model and comprehend which factor contributes to the prediction as a risk model.”;)
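Nakagawa's linearization rests on layer-wise relevance propagation; the basic epsilon rule for a single dense layer, on which LSTM-specific LRP rules build, can be sketched as follows (a generic LRP sketch, not Nakagawa's full gate-aware formulation):

import numpy as np

def lrp_epsilon(a, W, b, R_out, eps=1e-6):
    # Epsilon-rule LRP for one dense layer z = W a + b: redistribute
    # the output relevance R_out onto the layer inputs a.
    z = W @ a + b
    s = R_out / (z + eps * np.where(z >= 0, 1.0, -1.0))  # stabilized
    return a * (W.T @ s)   # input relevances (approximately conserved)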
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Zhao, Cox, Han, Ma with the RNN network linearization of Nakagawa.
One of ordinary skill in the art would have been motivated to combine in order to provide better predictive capability than the traditional linear model and fully-connected deep learning methods, and to capture nonlinear and time-varying relationships with factors in an interpretable way.
(Nakagawa [sec(s) Abs] “Finally, we perform an empirical analysis of the Japanese stock market and show that our recurrent model has better predictive capability than the traditional linear model and fully-connected deep learning methods” [sec(s) Conclusions] “We proposed a deep recurrent factor model that is nonlinear and time-varying multi-factor model implemented with LSTM+LRP. The empirical analysis of the Japanese stock market shows the LRP approximation is effective in terms of return and risk models. Our model can capture nonlinear and time-varying relationship with factors and stock return in an interpretable way.”)
However, the combination of Zhao, Cox, Han, Ma, Nakagawa does not appear to explicitly teach:
linearizing the deep learning network [by replacing a rectified linear unit (ReLU) activation function with a set of equivalent linear equations] to the deep learning network in response to the deep learning network being a RNN network.
Serra teaches
linearizing the deep learning network by replacing a rectified linear unit (ReLU) activation function with a set of equivalent linear equations to the deep learning network in response to the deep learning network being a RNN network.
(Serra [fig(s) 1] “This paper shows improved bounds on the ”number of linear regions” (typically used to study the expressiveness of DNNs) of PWL functions modeled by DNNs that use rectified linear activation functions, and a method for exact counting of the number of such regions in trained networks. We compare upper bounds from the first and latest results in the literature (Montufar et al., 2014; Montufar, 2017) with our main result (See Theorem 1). Using the proposed exact counting algorithm, we show the actual number of linear regions in 10 rectifier networks for MNIST digit recognition task with each configuration of two hidden layers totaling 22 neurons, reporting average and min-max range.” [sec(s) Abs] “We investigate the complexity of deep neural networks (DNN) that represent piecewise linear (PWL) functions. In particular, we study the number of linear regions, i.e. pieces, that a PWL function represented by a DNN can attain, both theoretically and empirically. We present (i) tighter upper and lower bounds for the maximum number of linear regions on rectifier networks, which are exact for inputs of dimension one; (ii) a first upper bound for multi-layer maxout networks; and (iii) a first method to perform exact enumeration or counting of the number of regions by modeling the DNN with a mixed-integer linear formulation.” [sec(s) 1] “Two important considerations that are part of most successful architectures are greater depth and the use of PWL activation functions such as rectified linear units (ReLUs). This large gap between practice and theory has driven researchers toward mathematical modeling of the expressive power of DNNs” [sec(s) 6] “We perform an experiment to count linear regions of small-sized networks with ReLU activation units on the MNIST benchmark dataset (LeCun et al., 1998). In this experiment, we train rectifier networks with two hidden layers summing up to 22 neurons. We train 10 networks for each configuration for 20 epochs or training steps, and we count all linear regions within 0 ≤ x ≤ 1. The counting code is written in C++ (gcc 4.8.4) using CPLEX Studio 12.8 as a solver and ran in Ubuntu 14.04.4 on a machine with 40 Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz processors and 132 GB of RAM. The runtimes for counting different configuration can be found in Appendix L.”;)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Zhao, Cox, Han, Ma, Nakagawa with the ReLU for linear equations of Serra.
One of ordinary skill in the art would have been motivated to combine in order to show improved bounds on the “number of linear regions” of PWL functions modeled by DNNs that use rectified linear activation functions, by achieving tighter upper and lower bounds on the maximal number of linear regions of the PWL function corresponding to a DNN that employs ReLUs.
(Serra [sec(s) 1] “This paper directly improves on the results of Montufar et al. (Pascanu et al., 2014; Montufar et al., 2014; Montufar, 2017), Raghu et al. (2017), and Arora et al. (2018). Fig. 1 highlights the main contributions, and the following list summarizes all the contributions: • We achieve tighter upper and lower bounds on the maximal number of linear regions of the PWL function corresponding to a DNN that employs ReLUs as shown in Fig. 1. As a special case, we present the exact maximal number of regions when the input dimension is one. We additionally provide the first upper bound for multi-layer maxout networks. (See Sections 3 and 4).” [fig(s) 1] “This paper shows improved bounds on the ”number of linear regions” (typically used to study the expressiveness of DNNs) of PWL functions modeled by DNNs that use rectified linear activation functions, and a method for exact counting of the number of such regions in trained networks. We compare upper bounds from the first and latest results in the literature (Montufar et al., 2014; Montufar, 2017) with our main result (See Theorem 1). Using the proposed exact counting algorithm, we show the actual number of linear regions in 10 rectifier networks for MNIST digit recognition task with each configuration of two hidden layers totaling 22 neurons, reporting average and min-max range.”)
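Serra's counting machinery rests on encoding each ReLU y = max(0, x) as a set of linear constraints plus one binary variable. Under known bounds L <= x <= U (with L <= 0 <= U), the standard big-M encoding is shown below; this is a generic sketch of the technique, not Serra's exact mixed-integer formulation:

def relu_big_m_constraints(x, y, z, L, U):
    # Linear constraints equivalent to y = max(0, x), with binary
    # indicator z: z = 1 forces y = x (active), z = 0 forces y = 0.
    # Rendered as readable strings for illustration.
    return [
        f"{y} >= {x}",
        f"{y} >= 0",
        f"{y} <= {x} - ({L})*(1 - {z})",  # slack when the unit is inactive
        f"{y} <= ({U})*{z}",              # forces y = 0 when z = 0
        f"{z} in {{0, 1}}",
    ]

print("\n".join(relu_big_m_constraints("x3", "y3", "z3", -5.0, 5.0)))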
Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (Online cement clinker quality monitoring: A soft sensor model based on multivariate time series analysis and CNN) in view of Cox et al. (Parameter Compression of Recurrent Neural Networks and Degradation of Short-term Memory) in view of Han et al. (Learning both Weights and Connections for Efficient Neural Networks) in view of Ma et al. (US20210201155A1) in view of Sildir et al. (A Mixed-Integer linear programming based training and feature selection method for artificial neural networks using piece-wise linear approximations) in view of Zhou et al. (LSTM-based Energy Management for Electric Vehicle Charging in Commercial-building Prosumers)
Regarding claim 7
The combination of Zhao, Cox, Han, Ma teaches claim 1.
However, the combination of Zhao, Cox, Han, Ma does not appear to explicitly teach:
linearizing the deep learning network by replacing a tanh activation function with a piecewise linear function (PLU) activation function; and
reformulating the PLU activation function into a set of equivalent linear equations to the deep learning network in response to the deep learning network being a LSTM network.
Sildir teaches
linearizing the deep learning network by replacing a tanh activation function with a piecewise linear function (PLU) activation function; and
(Sildir [fig(s) 1-2] [sec(s) 2] “We use hyperbolic tangent activation function at the hidden layer and identity activation function at the input and output layers while input and output layer activation functions are not shown explicitly in the above formulation. … Fig. 2.1 illustrates ways of approximating the hyperbolic tangent function for different number of piece-wise linear segments (D’Ambrosio et al., 2010). More detailed review on the required pieces for different functions can be found in (Frenzen et al., 2010). Please note that the proposed PWL formulation is generic in nature and can be implemented to different activation functions e.g. sigmoid or RelU. The piece-wise linear approximation for the hyperbolic tangent function, tanh(x), can be implemented using five segments as follows:
[equation image: media_image4.png (five-segment piecewise-linear approximation of tanh(x))]
” [sec(s) 1] “Piece-wise linear formulations require the addition of breakpoints for each piece-wise linear segment. Afterwards, binary and auxiliary variables are introduced into the formulation in order to enforce the selection of the proper segments. This way, proper calculations of the convex (linear) combinations among the adjacent segments can be realized during the optimization iteration. Accordingly, the original non-convex NLP problem is approximated into a mixed integer linear programming problem (MILP), whose optimal solution, if exists, is unique due to its convexity. … On the other hand, this issue can be tackled by employing piece-wise linear (PWL) approximation for the proposed pruning and input selection algorithms. As a consequence, resulting MILP problem can be decomposed and/or parallelized using state-of-the-art mixed integer programming solvers, e.g. CPLEX, dealing with the overall complexity of the feature selection problem in a better and reliable sense. This paper proposes a simultaneous global training and feature selection algorithm for feed-forward ANNs. The non-convex activation functions in the hidden layer and the objective function for the training problem are approximated through piecewise linear functions to ensure convexity in the hidden layer, while computing the best features to be selected during training.” [sec(s) 3.2] “Finally, please note that the size of the network, e.g. number of neurons and hidden layers, can be increased”;)
reformulating the PLU activation function into a set of equivalent linear equations to the deep learning network in response to the deep learning network being a [LSTM] network.
(Sildir [fig(s) 1-2] [sec(s) 2] “We use hyperbolic tangent activation function at the hidden layer and identity activation function at the input and output layers while input and output layer activation functions are not shown explicitly in the above formulation. … Fig. 2.1 illustrates ways of approximating the hyperbolic tangent function for different number of piece-wise linear segments (D’Ambrosio et al., 2010). More detailed review on the required pieces for different functions can be found in (Frenzen et al., 2010). Please note that the proposed PWL formulation is generic in nature and can be implemented to different activation functions e.g. sigmoid or RelU. The piece-wise linear approximation for the hyperbolic tangent function, tanh(x), can be implemented using five segments as follows:
[equation image: media_image4.png (five-segment piecewise-linear approximation of tanh(x))]
After assigning the output layer activation function, f1, to be equal to identity, the training and feature selection problem using five piece-wise linear segments can be formulated as follows:
[equation image: media_image5.png (MILP training and feature-selection formulation)]
where v is the objective function being the sum of squared errors; q is the input for the approximated hyperbolic tangent function, f’2; λk are the positive auxiliary variables and βm are the binary variables associated with the breakpoints where m stands for the linear segments of the approximation. For more detailed information about formulating the piece-wise linear approximations for non-convex optimization problems, the reader is referred to (Vielma, 2015). … (2.7) … (2.8)” [sec(s) 1] “Piece-wise linear formulations require the addition of breakpoints for each piece-wise linear segment. Afterwards, binary and auxiliary variables are introduced into the formulation in order to enforce the selection of the proper segments. This way, proper calculations of the convex (linear) combinations among the adjacent segments can be realized during the optimization iteration. Accordingly, the original non-convex NLP problem is approximated into a mixed integer linear programming problem (MILP), whose optimal solution, if exists, is unique due to its convexity. … On the other hand, this issue can be tackled by employing piece-wise linear (PWL) approximation for the proposed pruning and input selection algorithms. As a consequence, resulting MILP problem can be decomposed and/or parallelized using state-of-the-art mixed integer programming solvers, e.g. CPLEX, dealing with the overall complexity of the feature selection problem in a better and reliable sense. This paper proposes a simultaneous global training and feature selection algorithm for feed-forward ANNs. The non-convex activation functions in the hidden layer and the objective function for the training problem are approximated through piecewise linear functions to ensure convexity in the hidden layer, while computing the best features to be selected during training.” [sec(s) 3.2] “Finally, please note that the size of the network, e.g. number of neurons and hidden layers, can be increased”;)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Zhao, Cox, Han, Ma with the piecewise linear (PLU) activation function of Sildir.
One of ordinary skill in the art would have been motivated to combine in order to achieve efficient approximations with only a few breakpoints, and significant feature-space reduction bringing about notable improvement in test accuracy.
(Sildir [sec(s) Abs] “Results show that efficient approximations are obtained through the usage of the method with only a few number of breakpoints. Significant feature space reduction is observed bringing about notable improvement in test accuracy.”)
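As a minimal illustration of the five-segment approximation quoted above, the sketch below builds a piecewise-linear surrogate of tanh(x). The breakpoint locations are assumed for illustration and are not the values used by Sildir.

# Five-segment piecewise-linear surrogate of tanh(x). Breakpoints are
# illustrative assumptions; segment endpoints lie on the true curve, and the
# function is treated as saturated outside the outermost breakpoints.
import numpy as np

xs = np.array([-3.0, -1.5, -0.5, 0.5, 1.5, 3.0])  # 6 breakpoints -> 5 segments
ys = np.tanh(xs)

def tanh_pwl(x):
    return np.interp(np.clip(x, xs[0], xs[-1]), xs, ys)

grid = np.linspace(-4.0, 4.0, 2001)
print(f"max |tanh - PWL| on [-4, 4]: {np.max(np.abs(np.tanh(grid) - tanh_pwl(grid))):.4f}")

In the MILP formulation quoted above, each segment m is associated with a binary variable βm and positive convex-combination weights λk over adjacent breakpoints, so that segment selection becomes a set of linear constraints.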
However, the combination of Zhao, Cox, Han, Ma, Sildir does not appear to explicitly teach:
reformulating the PLU activation function into a set of equivalent linear equations to the deep learning network in response to the deep learning network being a [LSTM] network.
Zhou teaches
reformulating the PLU activation function into a set of equivalent linear equations to the deep learning network in response to the deep learning network being a LSTM network.
(Zhou [fig(s) 1-3] [sec(s) I] “Therefore, the BEMS optimization problem is non-convex and is hard to solve. Many methods have been proposed to solve the problem, including mixed-integer linear programming (MILP)” [sec(s) III MILP-BASED BEMS MODEL] “In this section, we formulate a daily BEMS optimization model based on conventional MILP as (2) - (16) to provide the training dataset for the proposed machine learning method.” [sec(s) IV. LSTM RNN-BASED BEMS MODEL] “To achieve a fast and optimal schedule for the BEMS, we use a system structure in which the training and execution of the LSTM network are separated.” [sec(s) V] “We can also observe from Fig. 8 that the popular rolling-horizon optimization method, namely the MILP with incomplete information, achieves a higher electricity cost than the LSTM method. In fact, the solution using the MILP with incomplete information depends largely on the prediction of the future state of the system such as the PV output prediction. The accuracy of the prediction can barely reach 100% (90% in this study), leading to a higher electricity cost. In contrast, the LSTM solution does not depend on the prediction of any system states. The training process enables the LSTM model to learn the mapping relationship between the inputs and the optimal output from the historical data obtained by the MILP with complete information.”;)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Zhao, Cox, Han, Ma, Sildir with the MILP-based LSTM training of Zhou.
One of ordinary skill in the art would have been motivated to combine in order not only to release the prediction and calculation pressures, but also to achieve better results than the commonly used method in commercial-building prosumer energy management.
(Zhou [sec(s) VI] “Furthermore, the added preliminary data processing step can significantly help improve the accuracy of the network output, and the additional filtering has enhanced the load leveling effect. In general, the proposed LSTM-based algorithm can not only release the prediction and calculation pressures, but also achieve better results than the commonly used method in commercial building prosumer energy management.”)
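For illustration of the training-data pipeline described in the Zhou excerpts, the following sketch generates optimal schedules offline with a solver and fits an LSTM to the input-to-optimal-output mapping. A toy linear program (SciPy's linprog) stands in for the paper's MILP, and the horizon, demand, and network sizes are assumptions.

# Sketch: offline optimization produces (inputs -> optimal schedule) pairs;
# an LSTM is then trained to imitate them so that online execution needs no
# forecast. A toy LP replaces the MILP of the reference for brevity.
import numpy as np
import torch
import torch.nn as nn
from scipy.optimize import linprog

rng = np.random.default_rng(0)
T = 24  # assumed hourly horizon

def optimal_schedule(prices):
    # Buy p_t in [0, 1] each hour at the given price to meet a fixed total
    # demand at minimum cost (a stand-in for the reference's constraints).
    res = linprog(c=prices, A_eq=np.ones((1, T)), b_eq=[12.0],
                  bounds=[(0.0, 1.0)] * T, method="highs")
    return res.x

X = rng.uniform(0.1, 1.0, size=(256, T))        # price scenarios (inputs)
Y = np.stack([optimal_schedule(p) for p in X])  # solver-optimal labels

class Scheduler(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):                        # x: (batch, T, 1)
        h, _ = self.lstm(x)
        return self.head(h).squeeze(-1)          # (batch, T)

model = Scheduler()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
xb = torch.tensor(X, dtype=torch.float32).unsqueeze(-1)
yb = torch.tensor(Y, dtype=torch.float32)
for _ in range(200):                             # imitation training loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(xb), yb)
    loss.backward()
    opt.step()
print(f"imitation MSE: {loss.item():.4f}")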
Claim(s) 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (Online cement clinker quality monitoring: A soft sensor model based on multivariate time series analysis and CNN) in view of Cox et al. (Parameter Compression of Recurrent Neural Networks and Degradation of Short-term Memory) in view of Han et al. (Learning both Weights and Connections for Efficient Neural Networks) in view of Ma et al. (US20210201155A1) in view of Patil et al. (A Mixed Integer Programming Approach to Training Dense Neural Networks).
Regarding claim 8
The combination of Zhao, Cox, Han, Ma teaches claim 1.
However, the combination of Zhao, Cox, Han, Ma does not appear to explicitly teach:
linearizing the deep learning network by replacing a bilinear term in the deep learning network by the McCormick envelope.
Patil teaches
linearizing the deep learning network by replacing a bilinear term in the deep learning network by a McCormick envelope.
(Patil [sec(s) 3] “ANNs with ReLU activated hidden layer units are another class of models commonly used in practice. ReLU is commonly paired with soft-max loss functions as an activation function for training deep supervised ANNs [13]. ReLU activations are especially beneficial in negating the vanishing gradient problem since the activation function has a gradient of 1 at all activated units [12]. Although fully MIP training directly addresses the vanishing gradient problem by eliminating the need for back-propagation entirely, this issue can still arise if the MIP model is used as pre-training initialization for an SGD approach to replace randomized initialization. … Similarly to the binary case the two main reformulation challenges that arise from these constraints are the piece-wise definition of the activation and the presence of bi-linear terms. Unlike the binary case however, the bi-linear terms involve the multiplication of two real valued decision variables and not a binary variable with a continuous variable. Since the resulting formulation would be challenging for commercial solvers to solve effectively we propose a relaxation formulation for the the ReLU activation case. In particular, we utilize piece-wise McCormick relaxations to reformulate the problem as a linear MIP. … The values for αLp and αUp are adapted from the partition bounds first defined by [8]. With the ReLU activated neurons now defined by Constraints (22) through (34), the rest of model is identical to that defined in Proposition 2. The complete MIP formulation for our problem can be found in the appendix.” [sec(s) 4] “We note that although not explicitly stated, the same reformulation techniques can also be applied to the output layer of the ANN. Our formulation relies piece-wise McCormick relaxations to reformulate the training problem as a linear MIP.”; For more details about mixed-integer linear programming (MILP) with piecewise McCormick envelopes, please refer to this paper’s reference [8] which is Castro et al. (Tightening piecewise mccormick relaxations for bilinear problems) (e.g., [sec(s) 3] “McCormick (1976) envelopes can be used to provide different types of relaxations to problems of type (P). In the standard approach, a linear programming (LP) relaxation is derived by replacing each bilinear term involving variables xi and xj with a new variable wij = xixj and adding four sets of constraints. A tighter mixed-integer linear programming (MILP) relaxation can be constructed by partitioning the domain of one of the variables (xj) of the bilinear term into n disjoint regions, with new binary variables being added to the formulation to select the optimal partition for xj.” [sec(s) 7] “This paper has shown that the quality of the relaxation resulting from piecewise McCormick envelopes for bilinear terms can be improved significantly once partition-dependent lower and upper bounds are defined for all bilinear variables.”);)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Zhao, Cox, Han, Ma with the McCormick envelope of Patil.
One of ordinary skill in the art would have been motivated to combine in order to achieve strong predictive accuracy with more parsimonious architectures compared to traditional models that require deep and neuron-dense architectures.
(Patil [sec(s) 6] “We showed that our greedy layer-wise, binary activated MIP outperforms traditional training models in two experiments. In both experiments, our model meets the testing accuracy threshold with the fewest number of layers and the fewer number of units in each layer. For architectures constrained by the number of units, we also see that our binary activated MIP is able to compete with its greedy counterpart. In essence, our models can achieve strong predictive accuracy with more parsimonious architectures compared to traditional models that require deep and neuron-dense architectures.”)
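For context, the standard McCormick envelope referenced in Castro replaces a bilinear term w = x·y, with x ∈ [x^L, x^U] and y ∈ [y^L, y^U], by four linear constraints (the textbook form, stated here for illustration):

w ≥ x^L·y + x·y^L − x^L·y^L
w ≥ x^U·y + x·y^U − x^U·y^U
w ≤ x^U·y + x·y^L − x^U·y^L
w ≤ x^L·y + x·y^U − x^L·y^U

The piecewise variant quoted above partitions the domain of one of the variables into n disjoint regions and adds binary variables to select the active partition, which tightens the relaxation.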
Claim(s) 17-18, 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (Online cement clinker quality monitoring: A soft sensor model based on multivariate time series analysis and CNN) in view of Cox et al. (Parameter Compression of Recurrent Neural Networks and Degradation of Short-term Memory) in view of Dai et al. (Grow and Prune Compact, Fast, and Accurate LSTMs) in view of Ma et al. (US20210201155A1).
Regarding claim 17
Zhao teaches
A system comprising:
a processor, a computer readable memory, one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable to:
(Zhao [fig(s) 1-2] [sec(s) 5.2] “The short training time of 20,000 iterations with i7-8700 processor indicates that the structure of the MVTS–CNN model is simple and the computational complexity is moderate.”;)
receive data from sensors in a dynamic manufacturing environment;
(Zhao [fig(s) 1-2] [sec(s) 3] “In the production process automation control system, a large number of sensors are used to measure process variables in the production, such as pressure, temperature, quality, voltage and current, etc. And the sampling interval of these on-line sensors is generally 5 s. Specially, the raw material quality parameters are sampled and tested manually every two hours. And the raw quality test sample is hybrid multiplex sample, which is obtained by taking samples from the production line at equal intervals within 2 h and then mixed. When the latest three-rate values are obtained by manual test, the quality parameters of the raw material being processed at that time and the follow-on raw material to be processed are considered to be the same as the latest measured values. In this way, the information of the raw material quality parameters will be as timely and accurate as possible in the f-CaO content monitoring. Besides, f-CaO content is manually sampled and tested with the interval of 1 h.” [sec(s) Abs] “Compared with traditional CNN, support vector machines (SVM) and long-short term memory networks (LSTM), the results demonstrate that the MVTS–CNN model has higher accuracy, better generalization ability and superior robustness.” [sec(s) 5.2] “The short training time of 20,000 iterations with i7-8700 processor indicates that the structure of the MVTS–CNN model is simple and the computational complexity is moderate.”;)
map the data into a recurrent neural network (RNN) network;
(Zhao [fig(s) 1-2] [table(s) 8-9] [sec(s) 3] “In the production process automation control system, a large number of sensors are used to measure process variables in the production, such as pressure, temperature, quality, voltage and current, etc. And the sampling interval of these on-line sensors is generally 5 s. Specially, the raw material quality parameters are sampled and tested manually every two hours. And the raw quality test sample is hybrid multiplex sample, which is obtained by taking samples from the production line at equal intervals within 2 h and then mixed. When the latest three-rate values are obtained by manual test, the quality parameters of the raw material being processed at that time and the follow-on raw material to be processed are considered to be the same as the latest measured values. In this way, the information of the raw material quality parameters will be as timely and accurate as possible in the f-CaO content monitoring. Besides, f-CaO content is manually sampled and tested with the interval of 1 h.” [sec(s) Abs] “Compared with traditional CNN, support vector machines (SVM) and long-short term memory networks (LSTM), the results demonstrate that the MVTS–CNN model has higher accuracy, better generalization ability and superior robustness.” [sec(s) 5.3] “Therefore, the predicted results of the MVTS–CNN model are compared with that of SVM and LSTM to evaluate its performance. Since the multivariate time series analysis method can be considered as data processing, and in order to avoid the influence of different input data on the model comparison, MVTS-SVM and MVTS-LSTM are designed and involved as competing models.”;)
learn correlations between inputs and outputs of the dynamic manufacturing environment using the RNN network;
(Zhao [fig(s) 1-2] [table(s) 8-9] [sec(s) 3] “In the production process automation control system, a large number of sensors are used to measure process variables in the production, such as pressure, temperature, quality, voltage and current, etc.” [sec(s) Abs] “Compared with traditional CNN, support vector machines (SVM) and long-short term memory networks (LSTM), the results demonstrate that the MVTS–CNN model has higher accuracy, better generalization ability and superior robustness.” [sec(s) 5.2] “As an important parameter in supervised learning and deep learning, learning rate determines whether and when the objective function can converge to the global minimum. … As is shown in Fig. 8a and Table 7, the trained model has good fitting effect on the training sets, after the 20,000 times of training iterations. And the local enlarged figure of training results (Fig. 8b) shows that the MVTS–CNN model can extract appropriate features for different f-CaO content in the model training process. The short training time of 20,000 iterations with i7-8700 processor indicates that the structure of the MVTS–CNN model is simple and the computational complexity is moderate.” [sec(s) 5.3] “Therefore, the predicted results of the MVTS–CNN model are compared with that of SVM and LSTM to evaluate its performance. Since the multivariate time series analysis method can be considered as data processing, and in order to avoid the influence of different input data on the model comparison, MVTS-SVM and MVTS-LSTM are designed and involved as competing models.”;)
predict inputs for the dynamic manufacturing environment using the [pruned] RNN network;
(Zhao [fig(s) 3] “Determining action time distribution period” [table(s) 8-9] [sec(s) 1] “This paper proposed a soft sensor model based on multivariate time series analysis and convolutional neural network (MVTS– CNN) for the online f-CaO content monitoring.” [sec(s) 4] “Taking the time series within the active duration distribution range as input data can increase the model’s applicability in different production conditions. In order to determine the active duration distribution range of each variable, a multivariate time series analysis method founded on the time delay range and the longest active duration is proposed.” [sec(s) 4.1] “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method. And in the third part, the processed time series X˙(k) = [X˙1(k), X˙2(k), . . . , X˙11(k), X˙12(k)] with different time length will be compressed into a uniform series length by different mean filters and formed into a new time series matrix XR as the input of CNN.” See also [sec(s) 4.2] [sec(s) 5.2] “As an important parameter in supervised learning and deep learning, learning rate determines whether and when the objective function can converge to the global minimum. … As is shown in Fig. 8a and Table 7, the trained model has good fitting effect on the training sets, after the 20,000 times of training iterations. And the local enlarged figure of training results (Fig. 8b) shows that the MVTS–CNN model can extract appropriate features for different f-CaO content in the model training process.” [sec(s) 5.3] “Therefore, the predicted results of the MVTS–CNN model are compared with that of SVM and LSTM to evaluate its performance. Since the multivariate time series analysis method can be considered as data processing, and in order to avoid the influence of different input data on the model comparison, MVTS-SVM and MVTS-LSTM are designed and involved as competing models.”; e.g., “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method” read(s) on “predict inputs”.)
optimize a predicted output from the [linearized pruned] RNN network to calculate predicted inputs; and
(Zhao [fig(s) 3] “Determining action time distribution period” [table(s) 8-9] [sec(s) 1] “This paper proposed a soft sensor model based on multivariate time series analysis and convolutional neural network (MVTS– CNN) for the online f-CaO content monitoring.” [sec(s) 4] “Taking the time series within the active duration distribution range as input data can increase the model’s applicability in different production conditions. In order to determine the active duration distribution range of each variable, a multivariate time series analysis method founded on the time delay range and the longest active duration is proposed.” [sec(s) 4.1] “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method. And in the third part, the processed time series X˙(k) = [X˙1(k), X˙2(k), . . . , X˙11(k), X˙12(k)] with different time length will be compressed into a uniform series length by different mean filters and formed into a new time series matrix XR as the input of CNN.” See also [sec(s) 4.2] [sec(s) 5.2] “As an important parameter in supervised learning and deep learning, learning rate determines whether and when the objective function can converge to the global minimum. … As is shown in Fig. 8a and Table 7, the trained model has good fitting effect on the training sets, after the 20,000 times of training iterations. And the local enlarged figure of training results (Fig. 8b) shows that the MVTS–CNN model can extract appropriate features for different f-CaO content in the model training process.” [sec(s) 5.3] “Therefore, the predicted results of the MVTS–CNN model are compared with that of SVM and LSTM to evaluate its performance. Since the multivariate time series analysis method can be considered as data processing, and in order to avoid the influence of different input data on the model comparison, MVTS-SVM and MVTS-LSTM are designed and involved as competing models.”; e.g., “training” read(s) on “optimize”. In addition, e.g., “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method” read(s) on “calculate predicted inputs”.)
change operation inputs in the dynamic manufacturing environment to match the calculated predicted inputs.
(Zhao [fig(s) 3] “Determining action time distribution period” [table(s) 8-9] [sec(s) 1] “This paper proposed a soft sensor model based on multivariate time series analysis and convolutional neural network (MVTS– CNN) for the online f-CaO content monitoring.” [sec(s) 4] “Taking the time series within the active duration distribution range as input data can increase the model’s applicability in different production conditions. In order to determine the active duration distribution range of each variable, a multivariate time series analysis method founded on the time delay range and the longest active duration is proposed.” [sec(s) 4.1] “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method. And in the third part, the processed time series X˙(k) = [X˙1(k), X˙2(k), . . . , X˙11(k), X˙12(k)] with different time length will be compressed into a uniform series length by different mean filters and formed into a new time series matrix XR as the input of CNN.” See also [sec(s) 4.2] [sec(s) 5.2] “As an important parameter in supervised learning and deep learning, learning rate determines whether and when the objective function can converge to the global minimum. … As is shown in Fig. 8a and Table 7, the trained model has good fitting effect on the training sets, after the 20,000 times of training iterations. And the local enlarged figure of training results (Fig. 8b) shows that the MVTS–CNN model can extract appropriate features for different f-CaO content in the model training process.”; e.g., “determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method” along with training for the cement clinker production read(s) on “change operation inputs”.)
However, Zhao does not appear to explicitly teach:
prune the RNN network, wherein to prune the RNN network, the program instructions are executable to remove redundant neurons and redundant connection weights by comparing information in a neuron to another neuron and removing neurons having same information;
predict inputs for the dynamic manufacturing environment using the [pruned] RNN network;
linearize the pruned RNN network;
optimize a predicted output from the [linearized pruned] RNN network to calculate predicted inputs; and
Cox teaches
prune the RNN network, wherein to prune the RNN network, the program instructions are executable to [remove redundant neurons and redundant connection weights by comparing information in a neuron to another neuron and removing neurons having same information];
(Cox [sec(s) I] “In this paper, we show that recurrent neural networks, including those using a memory cell based architecture, such as MGRU, achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and language modeling sequence prediction tasks. In addition, we provide a more fundamental understanding of how complexity reduction, viewed as a general perturbation or corruption, is impacted by temporal dependency. Therefore, we devise a perturbation model of the effect of a general compression method, such as singular value decomposition (SVD) rank reduction, on the short-term memory performance of recurrent networks. This model is tested on a noiseless memorization task to elucidate the conditions over which scaling of short-term memory performance agrees. In this way, it is shown how the achievable compression is dependent on the degree of temporal coherence present in the task and data.” [sec(s) II] “An effective form of complexity reduction, which has been demonstrated on feed-forward and convolutional neural networks, is rank reduction via singular value decomposition on the network parameters. For RNNs, the forward and recurrent matrix of weights can be individually decomposed into their singular values and orthonormal bases, ∑ and U, V, respectively. By eliminating the smallest singular values, in order from least to greatest, an optimal reduced rank representation, Q[Symbol font/0x25] and V[Symbol font/0x25] is found, as in Eqn. (3). This compressed representation has only R∙(M+N) parameters, where R is the rank and M and N are the original dimensions of a particular weight matrix.”;)
predict inputs for the dynamic manufacturing environment using the pruned RNN network;
(Cox [sec(s) I] “In this paper, we show that recurrent neural networks, including those using a memory cell based architecture, such as MGRU, achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and language modeling sequence prediction tasks. In addition, we provide a more fundamental understanding of how complexity reduction, viewed as a general perturbation or corruption, is impacted by temporal dependency. Therefore, we devise a perturbation model of the effect of a general compression method, such as singular value decomposition (SVD) rank reduction, on the short-term memory performance of recurrent networks. This model is tested on a noiseless memorization task to elucidate the conditions over which scaling of short-term memory performance agrees. In this way, it is shown how the achievable compression is dependent on the degree of temporal coherence present in the task and data.” [sec(s) II] “An effective form of complexity reduction, which has been demonstrated on feed-forward and convolutional neural networks, is rank reduction via singular value decomposition on the network parameters. For RNNs, the forward and recurrent matrix of weights can be individually decomposed into their singular values and orthonormal bases, ∑ and U, V, respectively. By eliminating the smallest singular values, in order from least to greatest, an optimal reduced rank representation, Q[Symbol font/0x25] and V[Symbol font/0x25] is found, as in Eqn. (3). This compressed representation has only R∙(M+N) parameters, where R is the rank and M and N are the original dimensions of a particular weight matrix.”;)
linearize the pruned RNN network;
(Cox [sec(s) I] “In this paper, we show that recurrent neural networks, including those using a memory cell based architecture, such as MGRU, achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and language modeling sequence prediction tasks. In addition, we provide a more fundamental understanding of how complexity reduction, viewed as a general perturbation or corruption, is impacted by temporal dependency. Therefore, we devise a perturbation model of the effect of a general compression method, such as singular value decomposition (SVD) rank reduction, on the short-term memory performance of recurrent networks. This model is tested on a noiseless memorization task to elucidate the conditions over which scaling of short-term memory performance agrees. In this way, it is shown how the achievable compression is dependent on the degree of temporal coherence present in the task and data.” [sec(s) II] “An effective form of complexity reduction, which has been demonstrated on feed-forward and convolutional neural networks, is rank reduction via singular value decomposition on the network parameters. For RNNs, the forward and recurrent matrix of weights can be individually decomposed into their singular values and orthonormal bases, ∑ and U, V, respectively. By eliminating the smallest singular values, in order from least to greatest, an optimal reduced rank representation, Q[Symbol font/0x25] and V[Symbol font/0x25] is found, as in Eqn. (3). This compressed representation has only R∙(M+N) parameters, where R is the rank and M and N are the original dimensions of a particular weight matrix.” [sec(s) III] “In this case, it is reasonable to linearize the activation function (for the purposes of our analysis), within some regime, as in Eqn. (5). Furthermore, we can simplify the effect of an arbitrary compression scheme, such as SVD rank reduction, as a perturbation δ on the original weight matrix,
[equation images: media_image1.png and media_image2.png (Eqn. (5), linearized activation with perturbation δ)]”;)
optimize a predicted output from the linearized pruned RNN network to calculate predicted inputs; and
(Cox [sec(s) I] “In this paper, we show that recurrent neural networks, including those using a memory cell based architecture, such as MGRU, achieve significant complexity reduction of the feedforward and recurrent connection weights, for both classification and language modeling sequence prediction tasks. In addition, we provide a more fundamental understanding of how complexity reduction, viewed as a general perturbation or corruption, is impacted by temporal dependency. Therefore, we devise a perturbation model of the effect of a general compression method, such as singular value decomposition (SVD) rank reduction, on the short-term memory performance of recurrent networks. This model is tested on a noiseless memorization task to elucidate the conditions over which scaling of short-term memory performance agrees. In this way, it is shown how the achievable compression is dependent on the degree of temporal coherence present in the task and data.” [sec(s) II] “An effective form of complexity reduction, which has been demonstrated on feed-forward and convolutional neural networks, is rank reduction via singular value decomposition on the network parameters. For RNNs, the forward and recurrent matrix of weights can be individually decomposed into their singular values and orthonormal bases, ∑ and U, V, respectively. By eliminating the smallest singular values, in order from least to greatest, an optimal reduced rank representation, Q[Symbol font/0x25] and V[Symbol font/0x25] is found, as in Eqn. (3). This compressed representation has only R∙(M+N) parameters, where R is the rank and M and N are the original dimensions of a particular weight matrix.” [sec(s) III] “In this case, it is reasonable to linearize the activation function (for the purposes of our analysis), within some regime, as in Eqn. (5). Furthermore, we can simplify the effect of an arbitrary compression scheme, such as SVD rank reduction, as a perturbation δ on the original weight matrix,
[equation images: media_image1.png and media_image2.png (Eqn. (5), linearized activation with perturbation δ)]”;)
Zhao is combinable with Cox for the same rationale as set forth above with respect to claim 1.
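As a minimal illustration of the SVD rank reduction described in the Cox excerpts, the sketch below truncates the smallest singular values of an assumed weight matrix, yielding the compressed representation with R·(M+N) parameters noted above.

# SVD rank reduction: keep the R largest singular values of a weight matrix
# so that M*N parameters are replaced by R*(M+N), per the Cox excerpt.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))           # e.g., a recurrent weight matrix (M x N)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
R = 16                                   # retained rank (illustrative choice)

A = U[:, :R] * s[:R]                     # M x R factor
B = Vt[:R, :]                            # R x N factor
W_compressed = A @ B                     # rank-R approximation of W

rel_err = np.linalg.norm(W - W_compressed) / np.linalg.norm(W)
print(f"parameters: {W.size} -> {A.size + B.size}, relative error: {rel_err:.3f}")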
However, the combination of Zhao, Cox does not appear to explicitly teach:
prune the RNN network, wherein to prune the RNN network, the program instructions are executable to [remove redundant neurons and redundant connection weights by comparing information in a neuron to another neuron and removing neurons having same information];
Dai teaches
prune the RNN network, wherein to prune the RNN network, the program instructions are executable to remove redundant neurons and redundant connection weights by comparing information in a neuron [to] another neuron and removing neurons having same information;
(Dai [fig(s) 2] [sec(s) Abs] “We employ grow-and-prune (GP) training to iteratively adjust the hidden layers through gradient-based growth and magnitude-based pruning of connections.” [sec(s) 3 Grow-and-Prune Training] “3.2 Conventional training based on back propagation on fully-connected NNs yields over-parameterized models. Han et al. have successfully implemented pruning to drastically reduce the size of large CNNs and LSTMs [10,24]. The pruning phase is complemented with a brain-inspired growth phase for large CNNs in [9]. The network growth phase allows a CNN to grow neurons, connections, and feature maps, as necessary, during training. Thus, it enables automated search in the architecture space. It has been shown that a sequential combination of growth and pruning can yield additional compression on CNNs relative to pruning-only methods (e.g., 1.7× for AlexNet and 2.0× for VGG-16 on top of the pruning-only methods) [9]. In this work, we extend GP training to LSTMs. We illustrate the GP training flow in Fig. 2. It starts from a randomly initialized sparse seed architecture. The seed architecture contains a very limited fraction of connections to facilitate initial gradient back-propagation. The remaining connections in the matrices are dormant and masked to zero. The flow ensures that all neurons in the network are connected. During training, it first grows connections based on the gradient information. Then, it prunes away redundant connections for compactness, based on their magnitudes. Finally, GP training rests at an accurate, yet compact, inference model. We explain the details of each phase next.”; e.g., “Final architecture” of fig 2 read(s) on “remove redundant neurons”.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Zhao, Cox with the RNN network pruning of Dai.
One of ordinary skill in the art would have been motivated to combine in order to achieve comparable or even improved accuracy with fewer external stacked layers relative to a conventional LSTM, leading to higher compactness, and to reduce overfitting for better generalization.
(Dai [sec(s) 3] “1. Strengthened control: Hidden layers in DNN gates enhance gate control through multilevel abstraction. This makes an H-LSTM more capable and intelligent, and alleviates its reliance on external stacking. Consequently, an H-LSTM can achieve comparable or even improved accuracy with fewer external stacked layers relative to a conventional LSTM, leading to higher compactness. 2. Easy regularization: The conventional approach only uses dropout in the input/output layers and recurrent connections in the LSTMs. In our case, it becomes possible to apply dropout even to all control gates within an LSTM cell. This reduces overfitting and leads to better generalization.”)
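For illustration of the magnitude-based pruning phase quoted from Dai, the sketch below masks the smallest-magnitude connections of an assumed weight matrix to zero; the 80% pruning ratio is an illustrative assumption, and the gradient-based growth phase is omitted.

# Magnitude-based pruning: connections whose weights fall below a threshold
# are masked to zero and treated as dormant, per the Dai excerpt.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 32))             # a hidden-layer weight matrix

threshold = np.quantile(np.abs(W), 0.80)  # prune the smallest 80% (illustrative)
mask = np.abs(W) >= threshold
W_pruned = W * mask

print(f"remaining connections: {mask.sum()} of {mask.size}")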
However, the combination of Zhao, Cox, Dai does not appear to explicitly teach:
the program instructions are executable to remove redundant neurons and redundant connection weights by comparing information in a neuron [to] another neuron;
Ma teaches
the program instructions are executable to remove redundant neurons and redundant connection weights by comparing information in a neuron to another neuron;
(Ma [par(s) 28-31] “calculating the relativity using the average value method,
[equation image: media_image3.png (relativity γ_n^l computed by the average value method)]
where γ_n^l ∈ [0, 1] represents the relativity; comparing the relativity γ_n^l between various neurons in the lth layer, deleting the neuron with the minimum relativity, updating the number n_l of the neurons in the lth layer of the dynamic neural network and the weight matrix Θ(l);”;)
The combination of Zhao, Cox, Dai is combinable with Ma for the same rationale as set forth above with respect to claim 1.
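By way of illustration of the neuron-deletion mechanics quoted from Ma, the sketch below computes a per-neuron relativity score in [0, 1] and deletes the minimum. Because the reference's average-value formula is reproduced only as an image above, the score used here (normalized mean absolute activation) is an assumed stand-in, not Ma's formula.

# Hedged sketch of the quoted mechanics: compare relativities within layer l,
# delete the neuron with the minimum relativity, then update the layer width
# n_l and the outgoing weight matrix. The relativity score itself is assumed.
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(500, 10))               # activations: samples x neurons
W_next = rng.normal(size=(10, 4))            # weights from layer l to layer l+1

mean_abs = np.abs(H).mean(axis=0)
relativity = mean_abs / mean_abs.max()       # gamma_n^l in [0, 1] (assumed form)

drop = int(np.argmin(relativity))            # neuron with minimum relativity
H = np.delete(H, drop, axis=1)               # update n_l
W_next = np.delete(W_next, drop, axis=0)     # update the weight matrix
print(f"deleted neuron {drop}; new layer width: {H.shape[1]}")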
Regarding claim 18
The claim is a system claim corresponding to the method claim 5, and is directed to largely the same subject matter. Thus, it is rejected for the same reasons as given in the rejections of the method claim.
Regarding claim 20
The combination of Zhao, Cox, Dai, Ma teaches claim 17.
Zhao further teaches
wherein the changing the operation inputs includes inputting the predicted inputs into components of the dynamic manufacturing environment.
(Zhao [fig(s) 3] “Determining action time distribution period” [table(s) 8-9] [sec(s) 1] “This paper proposed a soft sensor model based on multivariate time series analysis and convolutional neural network (MVTS– CNN) for the online f-CaO content monitoring.” [sec(s) 4] “Taking the time series within the active duration distribution range as input data can increase the model’s applicability in different production conditions. In order to determine the active duration distribution range of each variable, a multivariate time series analysis method founded on the time delay range and the longest active duration is proposed.” [sec(s) 4.1] “The second part determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method. And in the third part, the processed time series X˙(k) = [X˙1(k), X˙2(k), . . . , X˙11(k), X˙12(k)] with different time length will be compressed into a uniform series length by different mean filters and formed into a new time series matrix XR as the input of CNN.” See also [sec(s) 4.2] [sec(s) 5.2] “As an important parameter in supervised learning and deep learning, learning rate determines whether and when the objective function can converge to the global minimum. … As is shown in Fig. 8a and Table 7, the trained model has good fitting effect on the training sets, after the 20,000 times of training iterations. And the local enlarged figure of training results (Fig. 8b) shows that the MVTS–CNN model can extract appropriate features for different f-CaO content in the model training process.”; e.g., “determines the active duration distribution range of each input variable and process the time series X˙(k) by the multivariate time series analysis method” along with training for the cement clinker production read(s) on “changing the operation inputs”.)
Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Nicolae et al. (PLU: The Piecewise Linear Unit Activation Function) teaches a piecewise linear unit (PLU) activation as an alternative to tanh.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409. The examiner can normally be reached Mon - Thu 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/S.K./Examiner, Art Unit 2129 3/27/2026
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129