DETAILED ACTION
This Office action is in response to the application filed on 10/16/2023.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim 1 is rejected under 35 U.S.C. 102(a)(1) as being anticipated by Lin (CN110363347A).
Regarding Claim 1, Lin discloses a method for automatically identifying emission sources in a source apportionment process of pollutant (e.g. see [0134] “In step (4), the category to which the given predicted sample observation sequence belongs is determined; based on the decision tree classification result, the DT-BP neural network is used to predict the given predicted sample observation sequence and the pollutant concentration value at time t is calculated,” and [0136] “The given prediction sample observation sequence can also be used to obtain real-time monitoring data of industrial emissions of urban air pollutants, including NO<sub>x</sub> emissions (kg/h), SO<sub>2</sub> emissions (kg/h) and particulate matter emissions (kg/h)”), comprising:
integrating measured source profiles and factor profiles to generate a labeled data set (e.g. see [0085] “Establish time series datasets (i.e. labeled data) of relevant meteorological factors, air quality and atmospheric pollutant emissions,” and [0091]; Examiner notes [0091] lists specific source data collected to create training data) and an unlabeled data set (e.g. see [0137] “For a given sequence of predicted samples (i.e. unlabeled data), its category is determined according to the decision tree classification set in advance by the model” and [0134] “In step (4), the category to which the given predicted sample observation sequence belongs is determined; based on the decision tree classification result, the DT-BP neural network is used to predict the given predicted sample observation sequence and the pollutant concentration value at time t is calculated,”), respectively; wherein the measured source profiles are priori knowledge, which are derived from actually measured samples of the emission sources and are configured for revealing physical and chemical features of the emission sources (e.g. see [0027] “1) Establish time series datasets (i.e. labeled data) of relevant meteorological factors, air quality and atmospheric pollutant emissions,” and [0069] “When λ is greater than the set value, all newly established time series datasets containing meteorological factors, air quality monitoring data and industrial emissions of air pollutants from the time the model was established to the current time are automatically loaded into the training database. Steps (2) and (3) are repeated to establish a new DT-BP neural network model,” Examiner notes [0069] explains in greater detail the process of obtaining the labeled data mentioned in [0027], which teaches that the data is from measured samples and reveals the features described in the claim limitation);
preprocessing the labeled data set to generate a continuous labeled data set (e.g. see [0049] “The acquired time series dataset is normalized so that the data is distributed between [0,1]”);
constructing a tree classification model based on the continuous labeled data set (e.g. see [0086] “The decision tree DT algorithm is used to classify the acquired training samples and generate the optimal tree structure Tα guided by air quality features and its corresponding classification results”);
optimizing the tree classification model to determine the optimized tree classification model (e.g. see [0043] “B. Selecting the optimal subtree: Using an independent validation dataset to test the squared error or Gini index of each subtree in the subtree sequence, the decision tree with the smallest squared error or Gini index is considered the optimal decision tree; each subtree corresponds to a parameter α. Once the optimal subtree T<sub>k</sub> is determined, α<sub>k</sub> is also determined, that is, the optimal subtree Tα”);
coupling the optimized tree classification model and a pseudo-labeling algorithm to generate an integrated model based on the unlabeled data set, so as to automatically identify the factor profiles in the unlabeled data set (e.g. see [0134] “In step (4), the category to which the given predicted sample observation sequence belongs is determined; based on the decision tree classification result, the DT-BP neural network is used to predict the given predicted sample observation sequence and the pollutant concentration value at time t is calculated,”); and
determining types of the emission sources based on the factor profiles (e.g. see [0136] “The given prediction sample observation sequence can also be used to obtain real-time monitoring data of industrial emissions of urban air pollutants, including NO<sub>x</sub> emissions (kg/h), SO<sub>2</sub> emissions (kg/h) and particulate matter emissions (kg/h)”).
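For illustration of the decision-tree classification that the Lin rejection relies on, the Gini index named in Lin's subtree-selection step (quoted at the optimization limitation above) can be sketched in a few lines. This is an illustrative sketch only, not code from any cited reference; the function names and one-feature threshold search are assumptions:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a label multiset: 1 - sum over classes of p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Scan candidate thresholds on a single feature and return the split
    with the lowest weighted Gini impurity (smaller = purer children)."""
    best = (None, float("inf"))
    n = len(values)
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best
```

A tree builder applies this search recursively per node; Lin's subtree selection then prunes using the smallest squared error or Gini index over a validation set.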
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Lin (CN110363347A) in view of Zheng (CN110334732A).
Regarding Claim 2, Lin teaches the limitations of Claim 1. Lin further discloses normalizing independent variables of the measured source profiles to generate normalized measured source profiles (e.g. see [0049] “The acquired time series dataset is normalized so that the data is distributed between [0,1]”); and
encoding dependent variables of the normalized measured source profiles to form the continuous labeled data set (e.g. see [0050] “The training data is input into the established neural network, and the network weight coefficients are corrected by using the error between the actual output and the expected output. The convergence condition is whether the training error reaches the set value. An air quality forecast model is then established”).
Lin does not explicitly disclose wherein preprocessing the labeled data set to generate the continuous labeled data set comprises: oversampling a measured source spectrum data in the labeled data set to generate oversampled measured source spectrum data.
In the same field of endeavor Zheng teaches wherein preprocessing the labeled data set to generate the continuous labeled data set comprises: oversampling a measured source spectrum data in the labeled data set to generate oversampled measured source spectrum data (e.g. see [0063] “Step 6: Resample the training sample files using the SMOTE method. Since air quality monitoring values typically follow a normal distribution, the amount of data corresponding to high concentration values is significantly less than the amount of data corresponding to other concentration values. This indicates that the probability of high concentration events is relatively low. This can lead to inaccuracies in predictions under conditions of heavy pollution. Therefore, the training sample files mentioned above need to be processed again. In this application, the SMOTE method is used for resampling. SMOTE (Synthetic Minority Oversampling Technique) is a synthetic minority oversampling technique”).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the sample data of Lin with the oversampling technique of Zheng for the purpose of identifying emission sources, with the advantage of ensuring the resultant dataset is balanced.
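The SMOTE resampling Zheng is cited for synthesizes minority-class points by interpolating between a minority sample and a nearby minority neighbor. The following is an illustrative sketch only, not Zheng's implementation; the one-dimensional feature space and function name are assumptions:

```python
import random

def smote_like(minority, n_new, seed=0):
    """Minimal SMOTE-style oversampling: each synthetic point is a random
    interpolation between a chosen minority sample and its nearest other
    minority sample (1-D Euclidean distance for brevity)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # nearest neighbor among the remaining minority samples
        nn = min((m for m in minority if m is not x), key=lambda m: abs(m - x))
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(x + gap * (nn - x))
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, the balanced dataset stays inside the observed minority range rather than duplicating records outright.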
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Lin (CN110363347A) in view of Qi (CN114418110A).
Regarding Claim 3, Lin teaches the limitations of Claim 1. Lin further discloses training a plurality of machine learning models by using the training data set to generate a plurality of trained machine learning models (e.g. see [0065] “The training data is input into the established neural network, and the network weight coefficients are corrected using the error between the actual output and the expected output. The algorithm converges when the training error is less than the minimum expected error. The algorithm ends when the maximum number of iterations is reached, and the training of the neural network is completed,” and [0045] “In step (3) of the above technical solution, the number of BP neural network models is the same as the number of optimal classifications in the decision tree. Let the number of optimal classifications be m, and the number of BP neural network models be m,” Examiner notes [0045] teaches there can be more than one machine learning model).
Lin does not explicitly disclose wherein constructing the tree classification model based on the continuous labeled data set comprises: dividing the continuous labeled data set into a training data set and a testing data set; testing each of the trained machine learning models by using the testing data set to generate evaluation indexes, wherein the evaluation indexes comprise accuracy, a precision rate and a recall rate; and screening one of the machine learning models as the tree classification model based on all of the evaluation indexes.
In the same field of endeavor, Qi teaches wherein constructing the tree classification model based on the continuous labeled data set comprises: dividing the continuous labeled data set into a training data set and a testing data set (e.g. see [0041] “Divide the standardized dataset into two parts: a training set and a test set”);
testing each of the trained machine learning models by using the testing data set to generate evaluation indexes, wherein the evaluation indexes comprise accuracy, a precision rate and a recall rate (e.g. see [0046] “P33: Build a solid ash source tracing model on the entire training set using a machine learning algorithm with determined hyperparameters, and use the test set to judge the reliability of the model. Evaluation metrics include, but are not limited to, accuracy, precision, true positive rate (TPR), false positive rate (FPR), recall”); and
screening one of the machine learning models as the tree classification model based on all of the evaluation indexes (e.g. see [0067] “Specifically, this includes: training the optimal random forest using the entire training set and testing the performance of the random forest model on the test set. In this example, the prediction results from the test set are selected as the source tracing prediction results for this solid ash. The trained random forest model was used to predict the source of solid ash in the test set. The results are shown in Figure 2. On the training set, the accuracy, precision, recall and AUC area of the random forest predictions can reach 0.915, 0.925 and 0.986 respectively. It can be seen that the predicted value is very close to the actual value, which shows that it is feasible to use solid ash oxides as the influencing factor for tracing the source and to use machine learning to predict its source. (i.e. screening the model)”).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the labeled data of Lin with the training/testing split and evaluation indexes of Qi for the purpose of identifying emission sources, with the advantage of ensuring the reliability and accuracy of the model being utilized.
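The evaluation indexes Qi names (accuracy, precision, recall) are computed directly from paired true and predicted labels on the held-out test set. A minimal sketch for illustration only, not code from any cited reference; the function name is hypothetical:

```python
def evaluate(y_true, y_pred, positive):
    """Accuracy, precision, and recall for one positive class,
    computed from paired true/predicted label sequences."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

Screening the candidate models then amounts to keeping the one whose indexes on the testing data set are jointly best.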
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Lin (CN110363347A) in view of Zhao (CN115542429A) and in further view of Qi (CN114418110A).
Regarding Claim 4, Lin teaches the limitations of Claim 1. While Lin does teach optimizing the tree classification model (e.g. see [0028] “The decision tree DT algorithm is used to classify the acquired training samples and generate the optimal tree structure Tα guided by air quality features and its corresponding classification results”) and optimizing key parameters (e.g. see [0043] “B. Selecting the optimal subtree: Using an independent validation dataset to test the squared error or Gini index of each subtree in the subtree sequence, the decision tree with the smallest squared error or Gini index is considered the optimal decision tree; each subtree corresponds to a parameter α. Once the optimal subtree T<sub>k</sub> is determined, α<sub>k</sub> is also determined, that is, the optimal subtree Tα”), Lin does not explicitly disclose wherein optimizing the tree classification model to determine the optimized tree classification model comprises: traversing a gradient change of key parameters of the optimized tree classification model to determine optimal key parameters, wherein the key parameters comprise a number of decision trees and a maximum number of features; and optimizing the tree classification model based on the optimal key parameters to determine the optimized tree classification model.
In the same field of endeavor, Zhao teaches wherein optimizing the tree classification model to determine the optimized tree classification model comprises: traversing a gradient change of key parameters of the optimized tree classification model to determine optimal key parameters (e.g. see pg. 5, paragraph 3: “supposing XGBoost to generate k trees (k weak prediction model), for any input x, each tree has an output fx (x), the output of the XGBoost model is in the iteration process, using gradient descent idea, based on the generated tree, taking the optimized target function as the total target, iteratively generating a new tree. The target of each iterative learning is to reduce the loss value of the accumulated result of the generated model”),
optimizing the tree classification model based on the optimal key parameters to determine the optimized tree classification model (e.g. see [pg. 6 paragraphs 3.1-3.2] “according to the training data size and problem description, through model super parameter optimization process, determining the decision tree number of LightGBM. setting model hyper-parameter, training the XGBoost model by training set, and setting loss function as the evaluation index of the mean square error (Mean Squared Error, MSE) as training; determining model hyper-parameter Bayesian parameter optimization mode,”).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the optimization of the tree classification model of Lin with the gradient change of key parameters of Zhao for the purpose of identifying emission sources, with the advantage of enhanced accuracy and precision.
Lin as modified by Zhao does not explicitly disclose wherein the key parameters comprise a number of decision trees and a maximum number of features. In the same field of endeavor, Qi teaches wherein the key parameters comprise a number of decision trees and a maximum number of features (e.g. see [0065] “The optimized hyperparameters of the random forest in this example are as follows: the maximum number of features used per decision tree is set to max_features = 0.401499393, the number of decision trees is set to n_estimators = 91”).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the optimization method of Lin as modified by Zhao with the specific key parameters of Qi for the purpose of identifying emission sources, with the advantage of enhanced accuracy and precision.
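The two key parameters Qi identifies (number of decision trees, maximum number of features per tree) can be traversed over a candidate grid, retaining the pair with the best validation score. An illustrative sketch only, not any cited reference's implementation; the scoring callback is a hypothetical stand-in for model training and validation:

```python
from itertools import product

def grid_search(score_fn, n_estimators_grid, max_features_grid):
    """Exhaustively traverse (number of trees, max features) pairs and
    keep the pair with the highest validation score from score_fn."""
    best_params, best_score = None, float("-inf")
    for n_est, max_feat in product(n_estimators_grid, max_features_grid):
        score = score_fn(n_est, max_feat)  # e.g. validation accuracy
        if score > best_score:
            best_params, best_score = (n_est, max_feat), score
    return best_params, best_score
```

Qi's quoted values (n_estimators = 91, max_features ≈ 0.4015) would be the output of such a traversal for that reference's dataset.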
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Lin (CN110363347A) in view of Qi (CN114418110A) and in further view of Boonphun, Jirat & Kaisornsawad, Chalat & Wongchaisuwat, Papis. (2019). Machine learning algorithms for predicting air pollutants. E3S Web of Conferences. 120. 03004. 10.1051/e3sconf/201912003004 (hereinafter “Boonphun et al.”).
Regarding Claim 5, Lin and Qi teach the limitations of Claim 3. Lin further discloses wherein coupling the optimized tree classification model and a pseudo-labeling algorithm to generate an integrated model based on the unlabeled data set, so as to automatically identify the factor profiles in the unlabeled data set (e.g. see [0134] “In step (4), the category to which the given predicted sample observation sequence belongs is determined; based on the decision tree classification result, the DT-BP neural network is used to predict the given predicted sample observation sequence and the pollutant concentration value at time t is calculated,”) comprises:
assigning pseudo labels to the screened factor profiles by using the pseudo-labeling algorithm (e.g. see [0127] “The output layer neurons are the predicted values that match the pollutants in the input layer, such as PM<sub>2.5</sub> concentration (μg/m<sup>3</sup>), PM<sub>10</sub> concentration (μg/m<sup>3</sup>), CO concentration (mg/m<sup>3</sup>), NO<sub>2</sub> concentration (μg/m<sup>3</sup>), SO<sub>2</sub> concentration (μg/m<sup>3</sup>), O<sub>3</sub> concentration (μg/m<sup>3</sup>), or they can be the AQI index directly”);
adding a data set of the factor profiles assigned with the pseudo labels to the training data set to form a new training data set; constructing a new tree classification model based on the new training data set to identify remaining factor profiles in the unlabeled data set (e.g. see [0068-0069] “The iterative strategy in step (5) of the above technical solution uses the predicted value at time t as the input value at time t+1 to predict the air quality at time t+1, thereby obtaining continuous air quality forecast results. In step (6) of the above technical solution, the number of times a dataset that does not meet the decision tree classification rules appears is recorded as λ. When λ is greater than the set value, all newly established time series datasets containing meteorological factors, air quality monitoring data and industrial emissions of air pollutants from the time the model was established to the current time are automatically loaded into the training database. Steps (2) and (3) are repeated to establish a new DT-BP neural network model”).
Lin does not explicitly disclose screening factor profiles with prediction probabilities greater than a predetermined probability from the unlabeled data set by using the integrated model.
In the same field of endeavor, Boonphun et al. teaches screening based on prediction probabilities greater than a predetermined cut-off probability (e.g. see Section 3.3 “The predicted probability associated with each class is achieved from the classification models. Multiple cut-off probabilities to identify the predicted class are also thoroughly investigated,” and Section 5 “For the neural network classification model, 1 hidden layer with 512 hidden nodes are selected after extensively experiments. The cutoff probability of 0.5 is particularly chosen”).
It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the pollution predictions of Lin with the probability threshold of Boonphun et al. for the purpose of determining pollutant emission sources with the advantage of a uniform method of deciding whether a pollutant is included or excluded.
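The combination mapped for Claim 5 — a classifier's predicted probabilities gated by a cut-off such as Boonphun et al.'s 0.5 — is the core of a pseudo-labeling iteration. An illustrative sketch only, not any cited reference's code; the function names and the dictionary-of-probabilities interface are assumptions:

```python
def pseudo_label(unlabeled, predict_proba, threshold=0.5):
    """Split unlabeled samples: samples whose top predicted class
    probability exceeds `threshold` receive that class as a pseudo
    label; the rest stay unlabeled for a later iteration."""
    confident, remaining = [], []
    for x in unlabeled:
        probs = predict_proba(x)  # mapping of class -> probability
        cls, p = max(probs.items(), key=lambda kv: kv[1])
        if p > threshold:
            confident.append((x, cls))
        else:
            remaining.append(x)
    return confident, remaining
```

The confidently labeled pairs would then be appended to the training data set and the classifier rebuilt, matching the iterative retraining Lin describes at [0068]-[0069].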
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NYLA GAVIA whose telephone number is (703)756-1592. The examiner can normally be reached M-F 8:30-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Catherine Rastovski can be reached at 571-270-0349. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/NYLA GAVIA/ Examiner, Art Unit 2863
/Catherine T. Rastovski/ Supervisory Primary Examiner, Art Unit 2863