Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The abstract of the disclosure is objected to because it exceeds 150 words. A corrected abstract of the disclosure is required and must be presented on a separate sheet, apart from any other text. See MPEP § 608.01(b).
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
101 Subject Matter Eligibility Analysis
Step 1: Claims 1-20 fall within the four statutory categories (a process, machine, manufacture, or composition of matter). Claims 1-7 recite a process and claims 8-20 recite a machine.
With respect to claim 1:
Step 2A Prong 1: The claim recites an abstract idea enumerated in the 2019 PEG:
generating a first set of embeddings, the first set of embeddings including (a) history measurement embeddings based on history measurement data, the history measurement data relating to attributes of one or more other parts that traversed the plurality of stations before the given part; (b) history part identifier embeddings based on one or more history part identifiers of the one or more other parts, and (c) history station identifier embeddings based on history station identifiers corresponding to the history measurement data; (This is an abstract idea of a "Mental Process." The "generating" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind. The specification describes the embeddings as vectors to be input into an encoder/decoder, and generating a vector could be done manually by an individual.)
generating a second set of embeddings, the second set of embeddings including (a) measurement embeddings based on observed measurement data, the observed measurement data relating to attributes of the given part as obtained by one or more sensors at each station of a station subsequence of the station sequence; (b) part identifier embeddings based on a part identifier of the given part, and (c) station identifier embeddings based on station identifiers that corresponding to the observed measurement data; (This is an abstract idea of a "Mental Process." The "generating" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind. The specification describes the embeddings as vectors to be input into an encoder/decoder, and generating a vector could be done manually by an individual.)
generating a history embedding sequence by concatenating the first set of embeddings; (This is an abstract idea of a "Mental Process." The "generating" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind; the concatenating could be done manually by an individual.)
generating an input embedding sequence by concatenating the second set of embeddings; (This is an abstract idea of a "Mental Process." The "generating" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind; the concatenating could be done manually by an individual.)
generating, …, intermediate history features based on the history embedding sequence; and (This is an abstract idea of a "Mental Process." The "generating" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind; the generating could be done manually by an individual.)
generating, …, predicted measurement data based on the intermediate history features and the input embedding sequence, (This is an abstract idea of a "Mental Process." The "generating" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind; the generating could be done manually by an individual.)
Step 2A Prong 2: The judicial exception is not integrated into a practical application
Additional elements:
establishing a station sequence that includes a plurality of stations that a given part traverses; (this limitation amounts to adding insignificant extra-solution activity to the judicial exception).
via an encoding network (This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)
via a decoding network (This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)
wherein the predicted measurement data includes next measurement data of the given part at a next station, the next station being after the station subsequence in the station sequence. (this limitation amounts to adding insignificant extra-solution activity to the judicial exception).
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
The additional elements “establishing…” and “wherein…” add insignificant extra-solution activity to the judicial exception and cannot provide an inventive concept. Storing and retrieving information in memory is a well-understood, routine, and conventional computer function (MPEP 2106.05(d)(II)(iv)).
The additional elements “via an encoding network” and “via a decoding network” are recited at a generic level and represent generic computer components used to apply the abstract idea. Mere instructions to apply an exception cannot provide an inventive concept (MPEP 2106.05(f)).
When considered in combination, these additional elements represent insignificant extra-solution activity and mere instructions to apply an exception, which do not provide an inventive concept.
Therefore, claim 1 is ineligible.
With respect to claim 2:
Step 2A Prong 1: Claim 2, which incorporates the rejection of claim 1, does not recite an additional abstract idea.
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
the history measurement data is based on multimodal sensor data; (this limitation amounts to adding insignificant extra-solution activity to the judicial exception).
the observed measurement data is based on multimodal sensor data; and (this limitation amounts to adding insignificant extra-solution activity to the judicial exception).
the predicted measurement data is based on multimodal sensor data. (this limitation amounts to adding insignificant extra-solution activity to the judicial exception).
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
The additional elements add insignificant extra-solution activity to the judicial exception and cannot provide an inventive concept. Storing and retrieving information in memory is a well-understood, routine, and conventional computer function (MPEP 2106.05(d)(II)(iv)).
Therefore, claim 2 is ineligible.
With respect to claim 3:
Step 2A Prong 1: Claim 3, which incorporates the rejection of claim 1, does not recite an additional abstract idea.
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
a transformer model comprises the encoding network and the decoding network. (This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
The additional element is recited at a generic level and represents a generic computer component used to apply the abstract idea. Mere instructions to apply an exception cannot provide an inventive concept (MPEP 2106.05(f)).
Therefore, claim 3 is ineligible.
With respect to claim 4:
Step 2A Prong 1: claim 4, which incorporates the rejection of claim 3, recites an additional abstract idea:
generating loss data by evaluating a loss function based on ground-truth measurement data and the predicted measurement data; and (this is an abstract idea of a “mathematical concept”. The recited “loss function” represents a mathematical function that would fall under the “mathematical concepts” grouping.)
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
updating parameters of the transformer model based on the loss data, wherein the ground-truth measurement data including next observed measurement data of the given part at the next station. (This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
The additional element is recited at a generic level and represents a generic computer component used to apply the abstract idea. Mere instructions to apply an exception cannot provide an inventive concept (MPEP 2106.05(f)).
Therefore, claim 4 is ineligible.
With respect to claim 5:
Step 2A Prong 1: Claim 5, which incorporates the rejection of claim 3, does not recite an additional abstract idea.
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
applying a query, a key, and a value to the decoding network, (This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)
the query is computed based on the input embedding sequence, (this limitation amounts to adding insignificant extra-solution activity to the judicial exception).
the key is computed based on the intermediate history features, and (this limitation amounts to adding insignificant extra-solution activity to the judicial exception).
the value is computed based on the intermediate history features. (this limitation amounts to adding insignificant extra-solution activity to the judicial exception).
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
The additional element “applying a query…” is recited at a generic level and represents a generic computer component used to apply the abstract idea. Mere instructions to apply an exception cannot provide an inventive concept (MPEP 2106.05(f)).
The additional elements “the query…”, “the key…”, and “the value…” add insignificant extra-solution activity to the judicial exception and cannot provide an inventive concept. Storing and retrieving information in memory is a well-understood, routine, and conventional computer function (MPEP 2106.05(d)(II)(iv)).
When considered in combination, these additional elements represent insignificant extra-solution activity and mere instructions to apply an exception, which do not provide an inventive concept.
Therefore, claim 5 is ineligible.
With respect to claim 6:
Step 2A Prong 1: claim 6, which incorporates the rejection of claim 1, recites an additional abstract idea:
combining the history embedding sequence with positional embedding to generate intermediate embedding data; and (This is an abstract idea of a "Mental Process." The "combining" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind; the combination could be done manually by an individual.)
generating the intermediate history features by applying one or more self-attention networks to the intermediate embedding data, (This is an abstract idea of a "Mental Process." The "generating" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind; the generating could be done manually by an individual.)
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
the positional embedding relates to ordering and positional dependency of the history embedding sequence, and (this limitation amounts to adding insignificant extra-solution activity to the judicial exception).
the one or more self-attention networks encode the intermediate embedding data to generate the intermediate history features. (This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
The additional element “the positional embedding…” adds insignificant extra-solution activity to the judicial exception and cannot provide an inventive concept. Storing and retrieving information in memory is a well-understood, routine, and conventional computer function (MPEP 2106.05(d)(II)(iv)).
The additional element “the one or more…” is recited at a generic level and represents a generic computer component used to apply the abstract idea. Mere instructions to apply an exception cannot provide an inventive concept (MPEP 2106.05(f)).
When considered in combination, these additional elements represent insignificant extra-solution activity and mere instructions to apply an exception, which do not provide an inventive concept.
Therefore, claim 6 is ineligible.
With respect to claim 7:
Step 2A Prong 1: claim 7, which incorporates the rejection of claim 1, recites an additional abstract idea:
combining the input embedding sequence with positional embedding to generate predicted embedding data; and (This is an abstract idea of a "Mental Process." The "combining" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind; the combination could be done manually by an individual.)
generating the predicted measurement data by applying one or more cross-attention networks to the predicted embedding data alongside the history features, (This is an abstract idea of a "Mental Process." The "generating" step, under its broadest reasonable interpretation, covers concepts that can be practically performed in the human mind; the generating could be done manually by an individual.)
Step 2A Prong 2: The judicial exception is not integrated into a practical application.
the positional embedding relates to ordering and positional dependency of the input embedding sequence, and (this limitation amounts to adding insignificant extra-solution activity to the judicial exception).
the one or more cross-attention networks decode the predicted embedding data alongside the history features to generate the predicted measurement data. (This amounts to no more than mere instructions to “apply” the exception using a generic computer component.)
Step 2B: the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception
The additional element “the positional embedding…” adds insignificant extra-solution activity to the judicial exception and cannot provide an inventive concept. Storing and retrieving information in memory is a well-understood, routine, and conventional computer function (MPEP 2106.05(d)(II)(iv)).
The additional element “the one or more…” is recited at a generic level and represents a generic computer component used to apply the abstract idea. Mere instructions to apply an exception cannot provide an inventive concept (MPEP 2106.05(f)).
When considered in combination, these additional elements represent insignificant extra-solution activity and mere instructions to apply an exception, which do not provide an inventive concept.
Therefore, claim 7 is ineligible.
With respect to claim 8:
The claim recites limitations similar to those of claim 1. Accordingly, the subject matter eligibility analysis applied to claim 1 above is equally applicable to claim 8. Therefore, claim 8 is ineligible.
With respect to claim 9:
The claim recites limitations similar to those of claim 2. Accordingly, the subject matter eligibility analysis applied to claim 2 above is equally applicable to claim 9. Therefore, claim 9 is ineligible.
With respect to claim 10:
The claim recites limitations similar to those of claim 3. Accordingly, the subject matter eligibility analysis applied to claim 3 above is equally applicable to claim 10. Therefore, claim 10 is ineligible.
With respect to claim 11:
The claim recites limitations similar to those of claim 4. Accordingly, the subject matter eligibility analysis applied to claim 4 above is equally applicable to claim 11. Therefore, claim 11 is ineligible.
With respect to claim 12:
The claim recites limitations similar to those of claim 5. Accordingly, the subject matter eligibility analysis applied to claim 5 above is equally applicable to claim 12. Therefore, claim 12 is ineligible.
With respect to claim 13:
The claim recites limitations similar to those of claim 6. Accordingly, the subject matter eligibility analysis applied to claim 6 above is equally applicable to claim 13. Therefore, claim 13 is ineligible.
With respect to claim 14:
The claim recites limitations similar to those of claim 7. Accordingly, the subject matter eligibility analysis applied to claim 7 above is equally applicable to claim 14. Therefore, claim 14 is ineligible.
With respect to claim 15:
The claim recites limitations similar to those of claim 1. Accordingly, the subject matter eligibility analysis applied to claim 1 above is equally applicable to claim 15. Therefore, claim 15 is ineligible.
With respect to claim 16:
The claim recites limitations similar to those of claim 3. Accordingly, the subject matter eligibility analysis applied to claim 3 above is equally applicable to claim 16. Therefore, claim 16 is ineligible.
With respect to claim 17:
The claim recites limitations similar to those of claim 4. Accordingly, the subject matter eligibility analysis applied to claim 4 above is equally applicable to claim 17. Therefore, claim 17 is ineligible.
With respect to claim 18:
The claim recites limitations similar to those of claim 5. Accordingly, the subject matter eligibility analysis applied to claim 5 above is equally applicable to claim 18. Therefore, claim 18 is ineligible.
With respect to claim 19:
The claim recites limitations similar to those of claim 6. Accordingly, the subject matter eligibility analysis applied to claim 6 above is equally applicable to claim 19. Therefore, claim 19 is ineligible.
With respect to claim 20:
The claim recites limitations similar to those of claim 7. Accordingly, the subject matter eligibility analysis applied to claim 7 above is equally applicable to claim 20. Therefore, claim 20 is ineligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhiqiang (NPL: ‘Novel Transformer Based on Gated Convolutional Neural Network for Dynamic Soft Sensor Modeling of Industrial Processes’) in view of Murai (US 2022/0326699 A1).
Regarding claim 1, Zhiqiang teaches:
A computer-implemented method for predictive measurement monitoring, the method comprising: (Introduction “As a result, a series of soft sensor modeling methods have been developed, which are playing an increasingly important role in improving the production efficiency and the product quality, and providing guidance for the optimal control of the reaction process”).
generating a first set of embeddings, the first set of embeddings including (a) history measurement embeddings based on history measurement data, the history measurement data relating to attributes of one or more other parts that traversed the plurality of stations before the given part; (b) history part identifier embeddings based on one or more history part identifiers of the one or more other parts, and (c) history station identifier embeddings based on history station identifiers corresponding to the history measurement data; (Fig. 1 shows the historical data (on the left) going into the embedding layer. In Section III. Proposed Methodology “First, the gated CNN layer encodes the input data. The gated signal and the feature signal are generated by two types of convolution kernels, which are then multiplied positionwise to obtain the final CNN embedding”)
generating a second set of embeddings, the second set of embeddings including (a) measurement embeddings based on observed measurement data, the observed measurement data relating to attributes of the given part as obtained by one or more sensors at each station of a station subsequence of the station sequence; (b) part identifier embeddings based on a part identifier of the given part, and (c) station identifier embeddings based on station identifiers that corresponding to the observed measurement data; (Fig. 1 shows the observed data (X) going into the embedding layer. In Section III. Proposed Methodology “First, the gated CNN layer encodes the input data. The gated signal and the feature signal are generated by two types of convolution kernels, which are then multiplied positionwise to obtain the final CNN embedding”)
generating a history embedding sequence by concatenating the first set of embeddings; (Fig. 1 shows the embeddings being combined)
generating an input embedding sequence by concatenating the second set of embeddings; (Fig. 1 shows the embeddings being combined)
generating, via an encoding network, intermediate history features based on the history embedding sequence; and (Section III. Proposed Methodology (A Transformer) “The transformer adopts the encoder–decoder architecture, and its encoder and decoder are both stacked by multiple independent feature extractors”)
generating, via a decoding network, predicted measurement data based on the intermediate history features and the input embedding sequence, wherein the predicted measurement data includes next measurement data of the given part at a next station, the next station being after the station subsequence in the station sequence. (Introduction “A dynamic soft-sensing model is proposed to make full use of the historical information and the current observable state to predict key quality indicators at present or at a certain time in the future” and in Section C. Highway Network and Output Layer “Since the self-attention mechanism is able to capture the relationship between any two moments, the output hidden state at the last time step of the transformer [which contains a decoder] is kept and combined with the output of the highway network to obtain the prediction result through the fully connected linear layer”).
Zhiqiang does not teach:
establishing a station sequence that includes a plurality of stations that a given part traverses;
However, Murai does:
establishing a station sequence that includes a plurality of stations that a given part traverses; ([0020] “A semiconductor device fabrication process is used to manufacture stand-alone semiconductor devices and integrated circuit chips, for example. The fabrication process includes a sequence of automated steps that gradually form electronic circuits on a semiconductor wafer.”)
Zhiqiang and Murai are considered analogous art to the claimed invention because they are in the same field of endeavor, namely sensor data prediction. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the method and process of Zhiqiang with the manufacturing process of Murai. One would have been motivated to do so in order to predict sensor data in a manufacturing setting.
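For illustration only, the embedding-concatenation steps recited in claim 1 (forming a history or input embedding sequence from measurement, part identifier, and station identifier embeddings) can be sketched as follows. This sketch is not code from either reference; the function name `concat_embeddings` and the example dimensions are chosen here purely for illustration.

```python
import numpy as np

def concat_embeddings(measurement_emb, part_id_emb, station_id_emb):
    """Concatenate the three embedding types along the feature axis,
    yielding one combined embedding row per station visit."""
    return np.concatenate([measurement_emb, part_id_emb, station_id_emb], axis=-1)

# A toy history embedding sequence: 2 station visits, each with a 3-dim
# measurement embedding, 1-dim part-id embedding, 2-dim station-id embedding.
history_seq = concat_embeddings(np.ones((2, 3)), np.zeros((2, 1)), np.full((2, 2), 2.0))
```

The result is a single sequence whose rows carry all three embedding types side by side, which is then suitable as input to an encoder.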
Regarding claim 2, Zhiqiang teaches:
the history measurement data is based on multimodal sensor data; (Section II. Problem Formulation “The dynamic soft sensor modeling problem based on time series data can be defined by giving a series of industrial process data as follows: {X_{t-w+1}, …, X_t; Y_{t-w+1}, …, Y_{t-1}}, where X_t ∈ R^m represents the auxiliary variables observed by the DCS in the real time, m is the dimension of auxiliary variables, Y_t ∈ R is the dominant variable to be predicted, and w is the window size of the historical observation data. The dominant variable Y_{t+h} at a certain time in the future is predicted, where h is the desirable horizon ahead the current time step. In other words, to predict Y_{t+h}, the variables in (1) are available. When h = 0, it means that we predict the dominant variable at the current time step.”)
the observed measurement data is based on multimodal sensor data; and (Section II. Problem Formulation “The dynamic soft sensor modeling problem based on time series data can be defined by giving a series of industrial process data as follows: {X_{t-w+1}, …, X_t; Y_{t-w+1}, …, Y_{t-1}}, where X_t ∈ R^m represents the auxiliary variables observed by the DCS in the real time, m is the dimension of auxiliary variables, Y_t ∈ R is the dominant variable to be predicted, and w is the window size of the historical observation data. The dominant variable Y_{t+h} at a certain time in the future is predicted, where h is the desirable horizon ahead the current time step. In other words, to predict Y_{t+h}, the variables in (1) are available. When h = 0, it means that we predict the dominant variable at the current time step.”)
the predicted measurement data is based on multimodal sensor data. (Section II. Problem Formulation “The dynamic soft sensor modeling problem based on time series data can be defined by giving a series of industrial process data as follows: {X_{t-w+1}, …, X_t; Y_{t-w+1}, …, Y_{t-1}}, where X_t ∈ R^m represents the auxiliary variables observed by the DCS in the real time, m is the dimension of auxiliary variables, Y_t ∈ R is the dominant variable to be predicted, and w is the window size of the historical observation data. The dominant variable Y_{t+h} at a certain time in the future is predicted, where h is the desirable horizon ahead the current time step. In other words, to predict Y_{t+h}, the variables in (1) are available. When h = 0, it means that we predict the dominant variable at the current time step.”)
Regarding claim 3, Zhiqiang teaches:
a transformer model comprises the encoding network and the decoding network. (Section III. Proposed Methodology (A. Transformer) “The transformer adopts the encoder–decoder architecture, and its encoder and decoder are both stacked by multiple independent feature extractors”)
Regarding claim 4, Zhiqiang teaches:
generating loss data by evaluating a loss function based on ground-truth measurement data and the predicted measurement data; and (Section III. Proposed Methodology (D. Objective Function and Optimization Strategy) “In soft sensor modeling field, the loss function with the mean square error (MSE) is usually used as the objective function.”)
updating parameters of the transformer model based on the loss data, wherein the ground-truth measurement data including next observed measurement data of the given part at the next station. (Section III. Proposed Methodology (D. Objective Function and Optimization Strategy) “The stochastic gradient descent (SGD) algorithm with a momentum term is utilized to optimize the parameters of the model. The momentum term can help the model accelerate the convergence and avoid falling into local optimum.”)
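The MSE objective and momentum-based SGD update quoted above from Zhiqiang can be sketched as follows. This is an illustrative sketch only, not code from the reference; the names `mse_loss` and `sgd_momentum_step` and the default hyperparameters are chosen here for illustration.

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error between ground-truth and predicted measurements."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

def sgd_momentum_step(param, grad, velocity, lr=0.1, momentum=0.9):
    """One SGD update with a momentum term: the velocity accumulates past
    gradients, which can accelerate convergence and help avoid local optima."""
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity
```

In training, the loss evaluated on ground-truth versus predicted measurements produces the gradient that drives each parameter update.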
Regarding claim 5, Zhiqiang teaches:
applying a query, a key, and a value to the decoding network, (Section III. Proposed Methodology (A. Transformer) “In the self-attention layer with multiple attention heads, query, key, and value are mapped to multiple heads through linear transformation, respectively”).
the query is computed based on the input embedding sequence, (Section III. Proposed Methodology (A. Transformer) describes deriving the query)
the key is computed based on the intermediate history features, and (Section III. Proposed Methodology (A. Transformer) describes deriving the key)
the value is computed based on the intermediate history features. (Section III. Proposed Methodology (A. Transformer) describes deriving the value)
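The query/key/value mapping recited in claim 5 (query computed from the input embedding sequence; key and value computed from the intermediate history features) corresponds to a standard encoder-decoder (cross-)attention computation, which can be sketched as follows. This is an illustrative sketch, not code from the reference; the function name `cross_attention` is chosen here.

```python
import numpy as np

def cross_attention(query, key, value):
    """Scaled dot-product attention: each query row attends over the key
    rows, and the softmax weights form a weighted sum of the value rows."""
    d_k = query.shape[-1]
    scores = query @ key.T / np.sqrt(d_k)               # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ value                               # weighted sum of values
```

When all keys are identical the softmax weights are uniform, so the output is simply the mean of the value rows; in general the weights concentrate on the history features most similar to each query.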
Regarding claim 6, Zhiqiang teaches:
combining the history embedding sequence with positional embedding to generate intermediate embedding data; and (Section III. Proposed Methodology “the CNN embedding and the position embedding are combined as the input of the transformer layers”)
generating the intermediate history features by applying one or more self-attention networks to the intermediate embedding data, (Section III. Proposed Methodology “Since the self-attention mechanism is able to capture the relationship between any two moments, the output hidden state at the last time step of the transformer is kept and combined with the output of the highway network to obtain the prediction result through the fully connected linear layer.”)
the positional embedding relates to ordering and positional dependency of the history embedding sequence, and (Section III. Proposed Methodology “Since transformer contains no recurrence, it will lose timing information. Therefore, the position embedding is introduced to make full use of the order of the time series data”)
the one or more self-attention networks encode the intermediate embedding data to generate the intermediate history features. (Section III. Proposed Methodology “Since the self-attention mechanism is able to capture the relationship between any two moments, the output hidden state at the last time step of the transformer is kept and combined with the output of the highway network to obtain the prediction result through the fully connected linear layer.”)
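The positional embedding that Zhiqiang introduces to preserve the ordering of the time series is, in the standard transformer formulation, a sinusoidal encoding added elementwise to the embedding sequence. A minimal sketch (illustrative only, not code from the reference; the function names are chosen here):

```python
import numpy as np

def positional_embedding(seq_len, d_model):
    """Sinusoidal positional embedding: even feature indices use sine,
    odd indices cosine, at wavelengths that vary with the feature index."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def add_positional_embedding(embedding_seq):
    """Combine an embedding sequence with its positional embedding."""
    seq_len, d_model = embedding_seq.shape
    return embedding_seq + positional_embedding(seq_len, d_model)
```

Because the encoding depends only on position, a recurrence-free model that receives the combined input can still distinguish the order of the sequence elements.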
Regarding claim 7, Zhiqiang teaches:
combining the input embedding sequence with positional embedding to generate predicted embedding data; and (Section III. Proposed Methodology “the CNN embedding and the position embedding are combined as the input of the transformer layers”)
generating the predicted measurement data by applying one or more cross-attention networks to the predicted embedding data alongside the history features, (Section III. Proposed Methodology “Since the self-attention mechanism is able to capture the relationship between any two moments, the output hidden state at the last time step of the transformer is kept and combined with the output of the highway network to obtain the prediction result through the fully connected linear layer.”)
the positional embedding relates to ordering and positional dependency of the input embedding sequence, and (Section III. Proposed Methodology “Since transformer contains no recurrence, it will lose timing information. Therefore, the position embedding is introduced to make full use of the order of the time series data”)
the one or more cross-attention networks decode the predicted embedding data alongside the history features to generate the predicted measurement data. (Section III. Proposed Methodology “Since the self-attention mechanism is able to capture the relationship between any two moments, the output hidden state at the last time step of the transformer is kept and combined with the output of the highway network to obtain the prediction result through the fully connected linear layer.”)
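The decoder-side operation mapped above (queries drawn from the predicted embedding data attending to the encoder-side history features) may be sketched as follows. This is an illustrative numpy sketch under assumed shapes; the identifiers are hypothetical and not taken from Zhiqiang:

```python
import numpy as np

def cross_attention(queries, memory):
    """Cross-attention: the decoder-side sequence (queries) attends to
    the encoder-side history features (keys/values = memory)."""
    d = queries.shape[-1]
    scores = queries @ memory.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ memory

predicted_embedding = np.random.randn(4, 8)   # input sequence + positional embedding
history_features = np.random.randn(6, 8)      # encoder output
# Decode the predicted embedding data alongside the history features:
predicted_measurements = cross_attention(predicted_embedding, history_features)
```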
Regarding claim 8, Zhiqiang teaches:
generating a first set of embeddings, the first set of embeddings including (a) history measurement embeddings based on history measurement data, the history measurement data relating to attributes of one or more other parts that traversed the plurality of stations before the given part; (b) history part identifier embeddings based on one or more history part identifiers of the one or more other parts, and (c) history station identifier embeddings based on history station identifiers corresponding to the history measurement data; (Fig. 1 shows the historical data (on the left) going into the embedding layer. In Section III. Proposed Methodology “First, the gated CNN layer encodes the input data. The gated signal and the feature signal are generated by two types of convolution kernels, which are then multiplied positionwise to obtain the final CNN embedding”)
generating a second set of embeddings, the second set of embeddings including (a) measurement embeddings based on observed measurement data, the observed measurement data relating to attributes of the given part as obtained by one or more sensors at each station of a station subsequence of the station sequence; (b) part identifier embeddings based on a part identifier of the given part, and (c) station identifier embeddings based on station identifiers that correspond to the observed measurement data; (Fig. 1 shows the observed data (X) going into the embedding layer. In Section III. Proposed Methodology “First, the gated CNN layer encodes the input data. The gated signal and the feature signal are generated by two types of convolution kernels, which are then multiplied positionwise to obtain the final CNN embedding”)
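The gated CNN embedding quoted above (a feature signal and a gate signal produced by two convolution kernels, multiplied position-wise) may be sketched as follows. This is an illustrative numpy sketch; the kernel values and names are hypothetical:

```python
import numpy as np

def gated_cnn_embedding(x, w_feat, w_gate):
    """Gated CNN embedding per the cited passage: one kernel produces a
    feature signal, another a gate signal; the two are multiplied
    position-wise to form the final embedding."""
    feat = np.stack([np.convolve(c, w_feat, mode="same") for c in x.T], axis=1)
    gate = np.stack([np.convolve(c, w_gate, mode="same") for c in x.T], axis=1)
    sigmoid = 1.0 / (1.0 + np.exp(-gate))   # gate signal squashed to (0, 1)
    return feat * sigmoid                   # position-wise product

x = np.random.randn(10, 3)                  # (time steps, input channels)
emb = gated_cnn_embedding(x, np.array([0.25, 0.5, 0.25]),
                             np.array([0.25, 0.5, 0.25]))
```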
generating a history embedding sequence by concatenating the first set of embeddings; (Fig. 1 shows the embeddings being combined)
generating an input embedding sequence by concatenating the second set of embeddings; (Fig. 1 shows the embeddings being combined)
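The concatenation steps mapped above may be sketched as follows. This numpy sketch assumes concatenation along the feature axis, which is one reasonable reading of Fig. 1; the per-source embedding values are placeholders:

```python
import numpy as np

d = 4
# Illustrative per-source embeddings for five history records:
measurement_emb = np.random.randn(5, d)   # from history measurement data
part_id_emb     = np.random.randn(5, d)   # from history part identifiers
station_id_emb  = np.random.randn(5, d)   # from history station identifiers

# Concatenate the first set of embeddings into one history sequence:
history_embedding_sequence = np.concatenate(
    [measurement_emb, part_id_emb, station_id_emb], axis=-1)
```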
generating, via an encoding network, intermediate history features based on the history embedding sequence; and (Section III. Proposed Methodology (A Transformer) “The transformer adopts the encoder–decoder architecture, and its encoder and decoder are both stacked by multiple independent feature extractors”)
generating, via a decoding network, predicted measurement data based on the intermediate history features and the input embedding sequence, wherein the predicted measurement data includes next measurement data of the given part at a next station, the next station being after the station subsequence in the station sequence. (Introduction “A dynamic soft-sensing model is proposed to make full use of the historical information and the current observable state to predict key quality indicators at present or at a certain time in the future” and in Section C. Highway Network and Output Layer “Since the self-attention mechanism is able to capture the relationship between any two moments, the output hidden state at the last time step of the transformer [which contains a decoder] is kept and combined with the output of the highway network to obtain the prediction result through the fully connected linear layer”).
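The encoder-decoder flow mapped to claim 8 (encode the history sequence, decode against the input sequence, keep the last time step, and pass it through a fully connected linear layer) may be sketched end-to-end as follows. This is an illustrative numpy sketch under assumed dimensions; all names are hypothetical:

```python
import numpy as np

def attend(q, kv):
    """Scaled dot-product attention; with q == kv this is the encoder's
    self-attention, otherwise the decoder's encoder-decoder attention."""
    s = q @ kv.T / np.sqrt(q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv

rng = np.random.default_rng(0)
history_seq = rng.standard_normal((6, 8))   # history embedding sequence
input_seq = rng.standard_normal((3, 8))     # input embedding sequence
w_out = rng.standard_normal((8, 2))         # fully connected linear layer

features = attend(history_seq, history_seq)   # encoding network
decoded = attend(input_seq, features)         # decoding network
next_measurement = decoded[-1] @ w_out        # last time step -> linear layer
```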
Zhiqiang does not teach:
A system comprising: a processor; and a memory in data communication with the processor, the memory having computer readable data including instructions stored thereon that, when executed by the processor, cause the processor to perform a method for predictive measurement monitoring, the method including:
establishing a station sequence that includes a plurality of stations that a given part traverses;
However, Murai teaches:
A system comprising: a processor; and a memory in data communication with the processor, the memory having computer readable data including instructions stored thereon that, when executed by the processor, cause the processor to perform a method for predictive measurement monitoring, the method including: ([0059] “The term “computing system” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus may include, in addition to hardware, code that creates an execution environment for the computer program in question (e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or any appropriate combination of one or more thereof).” And [0062] “Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. Elements of a computer can