Prosecution Insights
Last updated: May 29, 2026
Application No. 17/703,569

INFORMATION PROCESSING APPARATUS, NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM, AND INFORMATION PROCESSING METHOD

Non-Final OA §103
Filed: Mar 24, 2022
Priority: Sep 30, 2019 (continuation of PCTJP2019038478)
Examiner: HUANG, YAO D
Art Unit: 2124
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Mitsubishi Electric Corporation
OA Round: 1 (Non-Final)
Grant Probability: 63% (Moderate)
OA Rounds: 1-2
Est. Remaining: 0m
With Interview: 95%

Examiner Intelligence

Grants 63% of resolved cases
Career Allowance Rate: 63% (80 granted / 127 resolved), +8.0% vs TC avg

Strong +32% interview lift
Interview Lift: +31.8% (resolved cases with interview)

Typical timeline
Avg Prosecution: 4y 0m (16 currently pending)

Career history
Total Applications: 146 (across all art units)

Statute-Specific Performance

§101: 2.6% (-37.4% vs TC avg)
§103: 92.9% (+52.9% vs TC avg)
§102: 2.4% (-37.6% vs TC avg)
§112: 2.1% (-37.9% vs TC avg)
Based on career data from 127 resolved cases

Office Action

§103
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

1. Claims 1, 4-6, 8-9, 14-15, and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Shyr et al. (US 2015/0286704 A1) (“Shyr”) in view of Shao et al., “Feature selection for manufacturing process monitoring using cross-validation,” Journal of Manufacturing Systems 32 (2013) 550–555 (“Shao”) and Amershi et al. (US 2016/0162803 A1) (“Amershi”).

As to claim 1, Shyr teaches an information processing apparatus comprising: a storage device to store: [[0030]: “FIG.
2 depicts a block diagram, 200, of respective components of computing device(s) 110, 120, 130, 140, 150, 160 and 170, in accordance with an illustrative embodiment of the present invention.” [0032]: “Memory 206 and persistent storage 208 are computer-readable storage media.” Note that because the operations in this reference are performed on a computer, they are necessarily stored on a memory device of a computer.] a feature vector set including a plurality of feature vectors [[0088], table teaches that “xi = (xi1, …, xiK)” are feature vectors with K features] […]; […] and processing circuitry [[0033]: “Mapper program 115, reducer program 155, controller program 175, CF- tree data 156 and 157, cluster analysis 176, and data 192, 194, 196 and 198 are stored in persistent storage 208 for execution and/or access by one or more of the respective computer processors 204 via one or more memories of memory 206.”] to calculate an average clustering accuracy of each of the […] label sets to calculate a plurality of the average clustering accuracies corresponding to the […] label sets, the average clustering accuracy being an average value of a clustering accuracy of clustering performed on a subset by using the […] label set, the subset being obtained by dividing the feature vectors by each of multiple elements indicated by the respective […] labels; [[0055]: “In this embodiment, given that hierarchical clustering starts with leaf-entries, i.e. small sub-clusters, a new goodness measure is herein used and defined as the weighted average Silhouette coefficient over all starting sub-clusters. The average Silhouette coefficient ranges between −1 (indicating a very poor model) and +1 (indicating an excellent model). 
This measure avoids bias of variable selection criteria with respect to the number of variables.” Note that “Silhouette coefficient” is a measure of clustering accuracy/success, particularly in the form of whether the resulting model is poor (failure) or excellent (success), and that this clustering constitutes “dividing the feature vector by each of multiple elements...” See also [0223]: “Clustering model goodness is defined as the weighted average Silhouette coefficient over all starting sub-clusters in the final stage of regular HAC, and is given by equation 68, as follows.” In regards to the sets of data that are being used, the Algorithm in Table 10 (at [0244]) teaches that “Let F(keyr) be the set of all available features” and “Find Fα(keyr), the set of the most unimportant α features.” Which are analogous to the label (feature) sets of the instant claim. It is noted that the limitations of “quality” and “non-quality” are application-specific limitations that are addressed by a different reference, but the technique in this reference is relevant to such labels, since this reference mentions feature importance. See [0048]: “Embodiments of the present invention utilize a comprehensive method to remove the least important variables in the sequential backward variable selection.” Since the features are removed, there exists some plurality of sets of features that are unimportant in the form of feature sets that include features that are unimportant.] 
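Editorial illustration (not part of the Office Action): for readers unfamiliar with the metric the examiner relies on, the following is a minimal sketch of the standard, unweighted Silhouette coefficient, which ranges from −1 to +1 as the quoted Shyr passage states. It is not Shyr's weighted variant over starting sub-clusters (equation 68 is not reproduced in this excerpt). The function names and the toy points are assumptions made for illustration only.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b).

    a: mean distance from a point to the other members of its own cluster.
    b: smallest mean distance from the point to the members of any other cluster.
    Assumes at least two distinct cluster labels are present.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Plain (unweighted) average of the per-point scores."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two tight groups far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_silhouette(points, [0, 0, 1, 1])  # labels match the groups
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
print(good, bad)
```

A labeling that matches the two groups scores high and positive, while one that cuts across them scores negative, which is the "excellent model" versus "very poor model" reading the examiner draws from Shyr.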
[…] Shyr does not explicitly teach: (1) The plurality of feature vectors being “generated by extracting a predetermined feature from each of multiple pieces of digital data indicating measurement values obtained by measuring a target.” (2) “a quality label set including a plurality of quality labels corresponding to the multiple pieces of digital data and indicating quality of the target; and a plurality of non-quality label sets each including a plurality of non-quality labels, the non-quality labels corresponding to the multiple pieces of digital data and being of a type expected to be independent of the quality of the target” and the related limitation of the features being analyzed including such “quality” and “non-quality” labels.” (3) “to generate a screen image enabling identification of at least one non-quality label type adversely affecting quality of the multiple pieces of digital data by using the average clustering accuracies.” Shao teaches a plurality of feature vectors being “generated by extracting a predetermined feature from each of multiple pieces of digital data indicating measurement values obtained by measuring a target” [§ 3.2 (“Feature extraction”): “Watt meter and microphone signals are employed for process monitoring of ultrasonic metal welding. Fig. 4, Fig. 5 show typical signals from these two sensors. In addition, several process data such as the total weld time, total energy, maximum power, tool displacement before vibration, and tool displacement after vibration, are recorded through the welding system without external sensors. These data actually indicate the process conditions and therefore are also included in the candidate feature set. 
Thus, in total 81 candidate features are extracted either from sensor signals or process data, and they are indexed from Feature 1 to Feature 81 accordingly.”] and “a quality label set including a plurality of quality labels corresponding to the multiple pieces of digital data and indicating quality of the target; and a plurality of non-quality label sets each including a plurality of non-quality labels, the non-quality labels corresponding to the multiple pieces of digital data and being of a type expected to be independent of the quality of the target” and the related limitation of the features being analyzed including such “quality” and “non-quality” labels. [Abstract: “Due to the recent development in sensing technology, many on-line signals are collected for manufacturing process monitoring and feature extraction is then performed to extract critical features related to product/process quality.” § 3.3, paragraph 1: “With the limited engineering knowledge about monitoring signals used in the ultrasonic welding operation, some previously defined features may contain little information about welding quality, so it is necessary to carry out feature screening prior to feature selection using cross-validation in order to reduce the extensive computations required in the next step of feature selection.” See also § 1, paragraph 4: “Thus, signal features without good physical understanding may be irrelevant or redundant. Under this circumstance, feature selection is commonly applied to pick a minimally sized subset of features for monitoring. 
By removing a large number of irrelevant and redundant features, feature selection is able to help avoid overfitting, improve model performance, provide more efficient and cost-effective process monitoring, and acquire better insights into the underlying processes that generated the data.” That is, “irrelevant and redundant features” are those that are “of a type expected to be independent of the quality of the target.”] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Shyr with the teachings of Shao by applying the technique of Shyr to the application of feature selection for manufacturing process monitoring (see Shao, title: “Feature selection for manufacturing process monitoring using cross-validation”) so as to arrive at the above-quoted features of the claimed invention, including the limitations of the labels being “quality” and “non-quality” as recited in the instant claim. The motivation would have been to determine features that are related to product or process quality in a problem that is known (see Shao, abstract: “feature extraction is then performed to extract critical features related to product/process quality”; § 1, paragraph 4 (part quoted above).). 
The combination of references thus far does not teach the limitation of “to generate a screen image enabling identification of at least one non-quality label type adversely affecting quality of the multiple pieces of digital data by using the average clustering accuracies.” Amershi teaches “to generate a screen image enabling identification of at least one non-quality label type adversely affecting quality of the multiple pieces of digital data by using the average clustering accuracies.” [[0065]: “The I/O controller 216 can provide an output to the user device 102 to cause the feature ideation user interface 132 to be displayed.” [0043]: “In some examples, the candidate features 344 can be rendered in a manner that indicates a ranking. The candidate features 344 can begin with candidate features that rank higher on some criteria than candidate features near the end of the rendered candidate features 344.” It is noted that the feature of “at least one non-quality label type adversely affecting quality of the multiple pieces of digital data by using the average clustering accuracies” is already taught in the existing combination of references, because the base reference (Shyr) teaches labels that are unimportant.] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Amershi by implementing the technique of displaying an interface that indicates a ranking of features, so as to arrive at the claimed invention. The motivation would have been to present information to a user that enables configuration of potential candidate features (see Amershi, [0011]: “According to some examples, the user interface is designed to present candidate features for consideration by the user.”). 
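Editorial illustration (not part of the Office Action): the "average clustering accuracy" limitation discussed for claim 1 can be pictured with a small sketch. The rejection elides parts of the claim language with […]; the sketch assumes that subsets are formed per element of a non-quality label and that clustering on each subset is scored against the quality labels. That reading, the toy 1-D clusterer, and every name and data value below are illustrative assumptions, not taken from the application or the cited references.

```python
from collections import defaultdict


def two_means_1d(values, iters=20):
    """Tiny deterministic 2-means on scalars (hypothetical stand-in clusterer)."""
    lo, hi = min(values), max(values)
    assign = [0] * len(values)
    for _ in range(iters):
        # Assign each value to the nearer of the two centers (ties go to cluster 0).
        assign = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        g0 = [v for v, a in zip(values, assign) if a == 0]
        g1 = [v for v, a in zip(values, assign) if a == 1]
        if g0:
            lo = sum(g0) / len(g0)
        if g1:
            hi = sum(g1) / len(g1)
    return assign


def clustering_accuracy(values, quality):
    """Success rate: share of items whose cluster agrees with the binary quality
    label, taking the better of the two possible cluster-to-label mappings."""
    assign = two_means_1d(values)
    agree = sum(a == q for a, q in zip(assign, quality)) / len(values)
    return max(agree, 1.0 - agree)


def average_accuracy(values, quality, non_quality):
    """Split the data by each element of one non-quality label, score each
    subset separately, and average the per-subset accuracies."""
    subsets = defaultdict(list)
    for v, q, n in zip(values, quality, non_quality):
        subsets[n].append((v, q))
    accs = [
        clustering_accuracy([v for v, _ in s], [q for _, q in s])
        for s in subsets.values()
    ]
    return sum(accs) / len(accs)


# Hypothetical measurements: machine "B" reads 10 units higher than machine "A".
values = [10.0, 11.0, 20.0, 21.0, 20.0, 21.0, 30.0, 31.0]
quality = [0, 0, 1, 1, 0, 0, 1, 1]  # 0 = good, 1 = defective
machine = ["A", "A", "A", "A", "B", "B", "B", "B"]
shift = ["day", "night"] * 4

print(clustering_accuracy(values, quality))        # whole data set
print(average_accuracy(values, quality, machine))  # split by machine
print(average_accuracy(values, quality, shift))    # split by shift
```

In this toy data, splitting by machine raises the average accuracy above the whole-set figure while splitting by shift does not, which is the kind of signal that would flag the machine label as one that affects the measurements.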
As to claim 4, the combination of Shyr, Shao, and Amershi teaches the information processing apparatus according to claim 1, wherein the clustering accuracy is a success rate of clustering or a failure rate of clustering. [Shyr, [0055]: “In this embodiment, given that hierarchical clustering starts with leaf-entries, i.e. small sub-clusters, a new goodness measure is herein used and defined as the weighted average Silhouette coefficient over all starting sub-clusters. The average Silhouette coefficient ranges between −1 (indicating a very poor model) and +1 (indicating an excellent model). This measure avoids bias of variable selection criteria with respect to the number of variables.” Note that “Silhouette coefficient” is a measure of clustering success, particularly in the form of whether the resulting model is poor (failure) or excellent (success). See also Shyr, [0223]: “Clustering model goodness is defined as the weighted average Silhouette coefficient over all starting sub-clusters in the final stage of regular HAC, and is given by equation 68, as follows.”] As to claim 5, the combination of Shyr, Shao, and Amershi teaches the information processing apparatus according to claim 1, further comprising: a display device to display the screen image. [Shyr, [0037]: “Display 220 provides a mechanism to display data to a user and may be, for example, a computer monitor, or a television screen.” Furthermore, Amershi as discussed above, teaches displaying an interface for a user. Therefore, the instant limitation is met by the combination of references.] As to claim 6, Shyr teaches an information processing apparatus comprising: a storage device to store: [[0030]: “FIG. 
2 depicts a block diagram, 200, of respective components of computing device(s) 110, 120, 130, 140, 150, 160 and 170, in accordance with an illustrative embodiment of the present invention.” [0032]: “Memory 206 and persistent storage 208 are computer-readable storage media.” Note that because the operations in this reference are performed on a computer, they are necessarily stored on a memory device of a computer.] a feature vector set including a plurality of feature vectors [[0088], table teaches that “xi = (xi1, …, xiK)” are feature vectors with K features] […]; […] and processing circuitry [[0033]: “Mapper program 115, reducer program 155, controller program 175, CF- tree data 156 and 157, cluster analysis 176, and data 192, 194, 196 and 198 are stored in persistent storage 208 for execution and/or access by one or more of the respective computer processors 204 via one or more memories of memory 206.”] to calculate, for a […] label set corresponding to […] labels of one type selected from the plurality of […] labels, a clustering accuracy of clustering performed on a subset by using the […] label set to calculate a plurality of the clustering accuracies, the subset being obtained by dividing the feature vectors by each of multiple elements indicated by the […] labels; [[0055]: “In this embodiment, given that hierarchical clustering starts with leaf-entries, i.e. small sub-clusters, a new goodness measure is herein used and defined as the weighted average Silhouette coefficient over all starting sub-clusters. The average Silhouette coefficient ranges between −1 (indicating a very poor model) and +1 (indicating an excellent model). 
This measure avoids bias of variable selection criteria with respect to the number of variables.” Note that “Silhouette coefficient” is a measure of clustering accuracy/success, particularly in the form of whether the resulting model is poor (failure) or excellent (success), and that this clustering constitutes “dividing the feature vector by each of multiple elements.... See also [0223]: “Clustering model goodness is defined as the weighted average Silhouette coefficient over all starting sub-clusters in the final stage of regular HAC, and is given by equation 68, as follows.” In regards to the sets of data that are being used, the Algorithm in Table 10 (at [0244]) teaches that “Let F(keyr) be the set of all available features” and “Find Fα(keyr), the set of the most unimportant α features.” Which are analogous to the label (feature) sets of the instant claim. It is noted that the limitations of “quality” and “non-quality” are application-specific limitations that are addressed by a different reference, but the technique in this reference is relevant to such labels, since this reference mentions feature importance. See [0048]: “Embodiments of the present invention utilize a comprehensive method to remove the least important variables in the sequential backward variable selection.” Since the features are removed, there exists some plurality of sets of features that are unimportant in the form of feature sets that include features that are unimportant.] 
[…] Shyr does not explicitly teach: (1) The plurality of feature vectors being “generated by extracting a predetermined feature from each of multiple pieces of digital data indicating measurement values obtained by measuring a target.” (2) “a quality label set including a plurality of quality labels corresponding to the multiple pieces of digital data and indicating quality of the target; and a plurality of non-quality label sets each including a plurality of non-quality labels, the non-quality labels corresponding to the multiple pieces of digital data and being of a type expected to be independent of the quality of the target” and the related limitation of the features being analyzed including such “quality” and “non-quality” labels.” (3) “to generate a screen image enabling identification of at least one of the elements adversely affecting quality of the multiple pieces of digital data by using the clustering accuracies.” Shao teaches a plurality of feature vectors being “generated by extracting a predetermined feature from each of multiple pieces of digital data indicating measurement values obtained by measuring a target” [§ 3.2 (“Feature extraction”): “Watt meter and microphone signals are employed for process monitoring of ultrasonic metal welding. Fig. 4, Fig. 5 show typical signals from these two sensors. In addition, several process data such as the total weld time, total energy, maximum power, tool displacement before vibration, and tool displacement after vibration, are recorded through the welding system without external sensors. These data actually indicate the process conditions and therefore are also included in the candidate feature set. 
Thus, in total 81 candidate features are extracted either from sensor signals or process data, and they are indexed from Feature 1 to Feature 81 accordingly.”] and “a quality label set including a plurality of quality labels corresponding to the multiple pieces of digital data and indicating quality of the target; and a plurality of non-quality label sets each including a plurality of non-quality labels, the non-quality labels corresponding to the multiple pieces of digital data and being of a type expected to be independent of the quality of the target” and the related limitation of the features being analyzed including such “quality” and “non-quality” labels. [Abstract: “Due to the recent development in sensing technology, many on-line signals are collected for manufacturing process monitoring and feature extraction is then performed to extract critical features related to product/process quality.” § 3.3, paragraph 1: “With the limited engineering knowledge about monitoring signals used in the ultrasonic welding operation, some previously defined features may contain little information about welding quality, so it is necessary to carry out feature screening prior to feature selection using cross-validation in order to reduce the extensive computations required in the next step of feature selection.” See also § 1, paragraph 4: “Thus, signal features without good physical understanding may be irrelevant or redundant. Under this circumstance, feature selection is commonly applied to pick a minimally sized subset of features for monitoring. 
By removing a large number of irrelevant and redundant features, feature selection is able to help avoid overfitting, improve model performance, provide more efficient and cost-effective process monitoring, and acquire better insights into the underlying processes that generated the data.” That is, “irrelevant and redundant features” are those that are “of a type expected to be independent of the quality of the target.”] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Shyr with the teachings of Shao by applying the technique of Shyr to the application of feature selection for manufacturing process monitoring (see Shao, title: “Feature selection for manufacturing process monitoring using cross-validation”) so as to arrive at the above-quoted features of the claimed invention, including the limitations of the labels being “quality” and “non-quality” as recited in the instant claim. The motivation would have been to determine features that are related to product or process quality in a problem that is known (see Shao, abstract: “feature extraction is then performed to extract critical features related to product/process quality”; § 1, paragraph 4 (part quoted above).). 
The combination of references thus far does not teach the limitation of “to generate a screen image enabling identification of at least one non-quality label type adversely affecting quality of the multiple pieces of digital data by using the clustering accuracies.” Amershi teaches “to generate a screen image enabling identification of at least one of the elements adversely affecting quality of the multiple pieces of digital data by using the clustering accuracies.” [[0065]: “The I/O controller 216 can provide an output to the user device 102 to cause the feature ideation user interface 132 to be displayed.” [0043]: “In some examples, the candidate features 344 can be rendered in a manner that indicates a ranking. The candidate features 344 can begin with candidate features that rank higher on some criteria than candidate features near the end of the rendered candidate features 344.” It is noted that the feature of “at least one non-quality label type adversely affecting quality of the multiple pieces of digital data by using the clustering accuracies” is already taught in the existing combination of references, because the base reference (Shyr) teaches labels that are unimportant.] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Amershi by implementing the technique of displaying an interface that indicates a ranking of features, so as to arrive at the claimed invention. The motivation would have been to present information to a user that enables configuration of potential candidate features (see Amershi, [0011]: “According to some examples, the user interface is designed to present candidate features for consideration by the user.”). 
As to claim 8, the combination of Shyr, Shao, and Amershi teaches the information processing apparatus according to claim 6, wherein the clustering accuracy is a success rate of clustering or a failure rate of clustering. [Shyr, [0055]: “In this embodiment, given that hierarchical clustering starts with leaf-entries, i.e. small sub-clusters, a new goodness measure is herein used and defined as the weighted average Silhouette coefficient over all starting sub-clusters. The average Silhouette coefficient ranges between −1 (indicating a very poor model) and +1 (indicating an excellent model). This measure avoids bias of variable selection criteria with respect to the number of variables.” Note that “Silhouette coefficient” is a measure of clustering success, particularly in the form of whether the resulting model is poor (failure) or excellent (success). See also Shyr, [0223]: “Clustering model goodness is defined as the weighted average Silhouette coefficient over all starting sub-clusters in the final stage of regular HAC, and is given by equation 68, as follows.”] As to claim 9, the combination of Shyr, Shao, and Amershi teaches the information processing apparatus according to claim 6, further comprising: a display device configured to display the screen image. [Shyr, [0037]: “Display 220 provides a mechanism to display data to a user and may be, for example, a computer monitor, or a television screen.” Furthermore, Amershi as discussed above, teaches displaying an interface for a user. Therefore, the instant limitation is met by the combination of references.] As to claim 14, the rejection made to claim 1 is applied to claim 14. As to claim 15, the rejection made to claim 6 is applied to claim 15. As to claim 17, the rejection made to claim 1 is applied to claim 17. As to claim 18, the rejection made to claim 6 is applied to claim 18. 2. Claims 2 and 7 are rejected under 35 U.S.C. 
103 as being unpatentable over Shyr in view of Shao and Amershi, and further in view of Forman (US 2004/0059697 A1) (“Forman”). As to claim 2, the combination of Shyr, Shao, and Amershi teaches the information processing apparatus according to claim 1, as set forth above. Amershi further teaches “wherein the processing circuitry generates, as the screen image, a label-type evaluation screen image indicating at least one of the non-quality label types” [[0065]: “The I/O controller 216 can provide an output to the user device 102 to cause the feature ideation user interface 132 to be displayed.” [0043]: “In some examples, the candidate features 344 can be rendered in a manner that indicates a ranking. The candidate features 344 can begin with candidate features that rank higher on some criteria than candidate features near the end of the rendered candidate features 344.”] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further combined the teachings of the references combined thus far, including the above teachings of Amershi, so as to arrive at the above-discussed feature of the instant dependent claim. The motivation for doing so is covered by the motivation given for the teachings of Amershi in the rejection of the parent independent claim. The combination of references thus far does not explicitly teach “in a descending order of the average clustering accuracies.” Forman teaches “in a descending order of the average clustering accuracies.” [Claim 24: “prior to said selecting, sorting said features by respective binomial separation score in an ascending or descending order.” Claim 24: “selecting features best suited for said task according to said ascending or descending order.” Note that in this context, the binominal separation score (BNS) is analogous to a metric, such as the clustering accuracy metric that is already taught by the existing combination of references. 
It is noted that the instant claim only requires the screen image to “indicate” at least one label type in some ranked order, and does not require display of the value of the clustering accuracy on the screen.] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Forman by implementing the ranking in Amershi to be in descending order. The motivation would have been to implement a suitable manner of ranking or comparing features, as suggested by Forman (see part quoted above). As to claim 7, the combination of Shyr, Shao, and Amershi teaches the information processing apparatus according to claim 6, as set forth above. Amershi further teaches “wherein the processing circuitry generates, as the screen image, an accuracy-influence-element evaluation screen image indicating at least one of the elements […]” [[0065]: “The I/O controller 216 can provide an output to the user device 102 to cause the feature ideation user interface 132 to be displayed.” [0043]: “In some examples, the candidate features 344 can be rendered in a manner that indicates a ranking. The candidate features 344 can begin with candidate features that rank higher on some criteria than candidate features near the end of the rendered candidate features 344.”] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further combined the teachings of the references combined thus far, including the above teachings of Amershi, so as to arrive at the above-discussed feature of the instant dependent claim. The motivation for doing so is covered by the motivation given for the teachings of Amershi in the rejection of the parent independent claim. 
The combination of references thus far does not explicitly teach “in an ascending order of the clustering accuracies.” Forman teaches “in a descending order of the clustering accuracies.” [Claim 24: “prior to said selecting, sorting said features by respective binomial separation score in an ascending or descending order.” Claim 24: “selecting features best suited for said task according to said ascending or descending order.” Note that in this context, the binominal separation score (BNS) is analogous to a metric, such as the clustering accuracy metric that is already taught by the existing combination of references. It is noted that the instant claim only requires the screen image to “indicate” at least one label type in some ranked order, and does not require display of the value of the clustering accuracy on the screen.] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Forman by implementing the ranking in Amershi to be in an ascending order. The motivation would have been to implement a suitable manner of ranking or comparing features, as suggested by Forman (see part quoted above). 3. Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Shyr in view of Shao and Amershi, and further in view of Forman and Ronen et al. (US 2015/0073931 A1) (“Ronen”) As to claim 3, the combination of Shyr, Shao, and Amershi teaches the information processing apparatus according to claim 1, wherein the processing circuitry to calculate a reference clustering accuracy, the reference clustering accuracy being a clustering accuracy of clustering performed on the feature vectors by using the quality label set, [[0053]: “In decision process 335, reducer program 155 determines whether the goodness measure is improved. 
If the goodness measure is improved (decision process, 335, yes branch), then reducer program 155 computes variable importance for the set of selected variables, in process 340.” Note that “improved” in this context refers to the previous goodness measure, which is a clustering accuracy as discussed in the rejection of the parent claim.] to calculate a plurality of improvement amounts by subtracting the reference clustering accuracy from the respective average clustering accuracies, [[0053]: “In decision process 335, reducer program 155 determines whether the goodness measure is improved. If the goodness measure is improved (decision process, 335, yes branch), then reducer program 155 computes variable importance for the set of selected variables, in process 340.” Note that “improved” in this context refers to the previous goodness measure, which is a clustering accuracy as discussed in the rejection of the parent claim.] Amershi further teaches “to generate, as the screen image, an accuracy-improvement-amount screen image indicating at least one of the non-quality label types […].” [[0077]: “At block 812, the feature ideator 114 ranks each list of candidate words by its accuracy improvement scores (for both errors and contrasts).” See also [0104]-[0106].] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further combined the teachings of the references combined thus far, including the above teachings of Amershi, so as to arrive at the claimed invention of the instant dependent claim. The motivation for doing so is covered by the motivation given for the teachings of Amershi in the rejection of the parent independent claim. 
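Editorial illustration (not part of the Office Action): the arithmetic at issue in claim 3, as characterized in this rejection, is a subtraction of a reference clustering accuracy from each average clustering accuracy, followed by a descending sort. A minimal sketch follows; the label-type names and accuracy values are hypothetical, and nothing here reflects how the applicant or any cited reference implements the step.

```python
def rank_by_improvement(average_accuracies, reference_accuracy):
    """Return (label_type, improvement) pairs, largest improvement first.

    improvement = average clustering accuracy for that label type
                  minus the reference clustering accuracy.
    """
    improvements = {
        label_type: acc - reference_accuracy
        for label_type, acc in average_accuracies.items()
    }
    return sorted(improvements.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical inputs: one average accuracy per non-quality label type.
averages = {"shift": 0.75, "machine": 1.0, "operator": 0.8}
reference = 0.75

for label_type, gain in rank_by_improvement(averages, reference):
    # Each row pairs the label type with its improvement amount.
    print(f"{label_type}: {gain:+.2f}")
```

Listing each label type next to its improvement amount corresponds to the "together with the corresponding improvement amount" wording that the examiner attributes to Ronen's score display.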
The combination of references thus far does not explicitly teach “in a descending order of the improvement amounts together with the corresponding improvement amount.” Forman teaches “in a descending order of the improvement amounts.” [Claim 24: “prior to said selecting, sorting said features by respective binomial separation score in an ascending or descending order.” Claim 24: “selecting features best suited for said task according to said ascending or descending order.” Note that in this context, the binomial separation score (BNS) is analogous to a metric, such as the improvement amounts metric that is already taught by the existing combination of references.] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Forman by implementing the ranking in Amershi to be in descending order. The motivation would have been to implement a suitable manner of ranking or comparing features, as suggested by Forman (see part quoted above).

The combination of references thus far does not teach the limitation of “together with the corresponding improvement amount.” Ronen teaches “together with the corresponding improvement amount.” [[0071]: “The results of the comparison are then provided on a graphical user interface such that the administrator can visually appreciate the value of specific features and how they influence recommendations received from the recommender system 100. This is illustrated at step 420. In one embodiment the graphical user interface displays the features ranked by their associated score. This may be presented in a table or other format that allows the administrator to understand the results. In another embodiment the graphical user provides the administrator with a graph or plot whereby the features are graphed by their score as against the Root Mean Squared Error measure of quality.
An example of a graph that may be presented to the user is illustrated in FIG. 5.”] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Ronen by modifying the screen image in the combination of references thus far such that the label types in descending order are displayed together with their improvement amounts, as suggested by Ronen’s method of displaying scores on an interface. The motivation would have been to provide a comparison of features on a graphical user interface such that the administrator can visually appreciate the value of specific features and how they influence recommendations, as suggested by Ronen (see parts of [0071] quoted above).

4. Claims 10, 12-13, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Shyr in view of Shao, Jiang et al., “Error estimation based on variance analysis of k-fold cross-validation,” Pattern Recognition 69 (2017) 94–106 (“Jiang”) and Amershi.

As to claim 10, Shyr teaches an information processing apparatus comprising: a storage device to store: [[0030]: “FIG. 2 depicts a block diagram, 200, of respective components of computing device(s) 110, 120, 130, 140, 150, 160 and 170, in accordance with an illustrative embodiment of the present invention.” [0032]: “Memory 206 and persistent storage 208 are computer-readable storage media.” Note that because the operations in this reference are performed on a computer, they are necessarily stored on a memory device of a computer.]
a feature vector set including a plurality of feature vectors [[0088], table teaches that “xi = (xi1, …, xiK)” are feature vectors with K features] […]; […] and processing circuitry [[0033]: “Mapper program 115, reducer program 155, controller program 175, CF-tree data 156 and 157, cluster analysis 176, and data 192, 194, 196 and 198 are stored in persistent storage 208 for execution and/or access by one or more of the respective computer processors 204 via one or more memories of memory 206.”] to calculate, for each of the non-quality label sets, […] a clustering accuracy of clustering performed on a subset by using the […] label set to calculate a plurality of the […] corresponding to the […] label sets, the subset being obtained by dividing the feature vectors by each of multiple elements indicated by the […] labels; [[0055]: “In this embodiment, given that hierarchical clustering starts with leaf-entries, i.e. small sub-clusters, a new goodness measure is herein used and defined as the weighted average Silhouette coefficient over all starting sub-clusters. The average Silhouette coefficient ranges between −1 (indicating a very poor model) and +1 (indicating an excellent model). This measure avoids bias of variable selection criteria with respect to the number of variables.” Note that “Silhouette coefficient” is a measure of clustering accuracy/success, particularly in the form of whether the resulting model is poor (failure) or excellent (success), and that this clustering constitutes “dividing the feature vector by each of multiple elements....”
See also [0223]: “Clustering model goodness is defined as the weighted average Silhouette coefficient over all starting sub-clusters in the final stage of regular HAC, and is given by equation 68, as follows.” In regards to the sets of data that are being used, the Algorithm in Table 10 (at [0244]) teaches that “Let F(keyr) be the set of all available features” and “Find Fα(keyr), the set of the most unimportant α features.” Which are analogous to the label (feature) sets of the instant claim. It is noted that the limitations of “quality” and “non-quality” are application-specific limitations that are addressed by a different reference, but the technique in this reference is relevant to such labels, since this reference mentions feature importance. See [0048]: “Embodiments of the present invention utilize a comprehensive method to remove the least important variables in the sequential backward variable selection.” Since the features are removed, there exists some plurality of sets of features that are unimportant in the form of feature sets that include features that are unimportant.] 
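As an illustration of the goodness measure discussed above, the following minimal sketch computes the standard textbook silhouette coefficient, s = (b - a) / max(a, b), and averages it over all points. It is an assumption-laden stand-in: it is not Shyr's equation 68, it uses an unweighted mean over points rather than Shyr's weighted average over starting sub-clusters, and the sample points and function names are hypothetical.

```python
import math


def silhouette_scores(points, labels):
    """Return the standard per-point silhouette s = (b - a) / max(a, b)."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own or len(clusters) < 2:
            # Common convention: singleton clusters (or a single cluster) score 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Unweighted mean silhouette over all points (simplifying assumption)."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Hypothetical data: two tight groups far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
well_separated = mean_silhouette(points, [0, 0, 1, 1])
mixed_up = mean_silhouette(points, [0, 1, 0, 1])
print(f"well separated: {well_separated:.3f}, mixed up: {mixed_up:.3f}")
```

A labeling that matches the two groups scores close to +1, while a labeling that cuts across them scores below zero, which is the poor-to-excellent range the quoted passage describes.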
Shyr does not explicitly teach: (1) The plurality of feature vectors being “generated by extracting a predetermined feature from each of multiple pieces of digital data indicating measurement values obtained by measuring a target.” (2) “a quality label set including a plurality of quality labels corresponding to the multiple pieces of digital data and indicating quality of the target; and a plurality of non-quality label sets each including a plurality of non-quality labels, the non-quality labels corresponding to the multiple pieces of digital data and being of a type expected to be independent of the quality of the target” and the related limitation of the features being analyzed including such “quality” and “non-quality” labels.” (3) the limitation of the “variance” of the accuracy and use of such “variances corresponding to the non-quality label sets.” (4) “to generate a screen image enabling identification of at least one non-quality label type adversely affecting quality of the multiple pieces of digital data by using the variances.” Shao teaches a plurality of feature vectors being “generated by extracting a predetermined feature from each of multiple pieces of digital data indicating measurement values obtained by measuring a target” [§ 3.2 (“Feature extraction”): “Watt meter and microphone signals are employed for process monitoring of ultrasonic metal welding. Fig. 4, Fig. 5 show typical signals from these two sensors. In addition, several process data such as the total weld time, total energy, maximum power, tool displacement before vibration, and tool displacement after vibration, are recorded through the welding system without external sensors. These data actually indicate the process conditions and therefore are also included in the candidate feature set. 
Thus, in total 81 candidate features are extracted either from sensor signals or process data, and they are indexed from Feature 1 to Feature 81 accordingly.”] and “a quality label set including a plurality of quality labels corresponding to the multiple pieces of digital data and indicating quality of the target; and a plurality of non-quality label sets each including a plurality of non-quality labels, the non-quality labels corresponding to the multiple pieces of digital data and being of a type expected to be independent of the quality of the target” and the related limitation of the features being analyzed including such “quality” and “non-quality” labels. [Abstract: “Due to the recent development in sensing technology, many on-line signals are collected for manufacturing process monitoring and feature extraction is then performed to extract critical features related to product/process quality.” § 3.3, paragraph 1: “With the limited engineering knowledge about monitoring signals used in the ultrasonic welding operation, some previously defined features may contain little information about welding quality, so it is necessary to carry out feature screening prior to feature selection using cross-validation in order to reduce the extensive computations required in the next step of feature selection.” See also § 1, paragraph 4: “Thus, signal features without good physical understanding may be irrelevant or redundant. Under this circumstance, feature selection is commonly applied to pick a minimally sized subset of features for monitoring. 
By removing a large number of irrelevant and redundant features, feature selection is able to help avoid overfitting, improve model performance, provide more efficient and cost-effective process monitoring, and acquire better insights into the underlying processes that generated the data.” That is, “irrelevant and redundant features” are those that are “of a type expected to be independent of the quality of the target.”] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Shyr with the teachings of Shao by applying the technique of Shyr to the application of feature selection for manufacturing process monitoring (see Shao, title: “Feature selection for manufacturing process monitoring using cross-validation”) so as to arrive at the above-quoted features of the claimed invention, including the limitations of the labels being “quality” and “non-quality” as recited in the instant claim. The motivation would have been to determine features that are related to product or process quality in a problem that is known (see Shao, abstract: “feature extraction is then performed to extract critical features related to product/process quality”; § 1, paragraph 4 (part quoted above).). The combination of references thus far does not teach the limitations (3) and (4) listed above. Jiang teaches a “variance” of the accuracy and use of such “variances corresponding to the non-quality label sets.” [§ 1, paragraph 2: “It is important to measure the uncertainty of prediction error estimators because the accuracy of the model selection is limited by the variance of error estimates [11], [12]. A model error estimation can be considered as a random variable as the variability in training or test set [13], [14], and its quality is usually measured by means of its bias and variance. The ideal estimator should be an efficient estimator which is unbiased and has the lowest variance. 
It is known that CV provides an unbiased estimate of the prediction error on the training set [1]. The variance is crucial for the accuracy of CV estimator. As well as being an important indicator for assessing estimators of prediction error, error estimators with low variance are quite interesting in model selection if we assume that the bias term is independent of the considered model [5].” § 1, paragraph 4: “Variance was estimated in different ways… Dietterich [15] and Alpaydin [16] employed the classical sample variance estimator to complete hypothesis tests for comparing classifiers, although this estimator is biased because of the overlap among training sets or test sets [1], [17]. Moreover, Bengio and Grandvalet [1] showed that the bias could not be ignored, otherwise the variance would be grossly underestimated. The approximate variance estimator presented by Markatou [18] identifies all first-order terms in the reciprocal of the size of the training set.”]

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Jiang by modifying Shyr, as modified thus far, to determine the “variance” of the accuracy and use of such “variances corresponding to the non-quality label sets.” The motivation would have been to use a metric for assessing an estimator of the generalization capability of a model, as suggested by Jiang (see abstract: “Cross-validation (CV) is often used to estimate the generalization capability of a learning model. The variance of CV error has a considerable impact on the accuracy of CV estimator and the adequacy of the learning model, so it is very important to analyze CV variance. The aim of this paper is to investigate how to improve the accuracy of the error estimation based on variance analysis.”).
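To make the variance limitation concrete, the sketch below computes, for each non-quality label type, the variance of the clustering accuracies obtained on the subsets formed by each label element, and then orders the label types by that variance. It is illustrative only and rests on assumptions: the label types, elements, and accuracy values are invented, and the use of the population variance (rather than a sample variance or one of the estimators Jiang discusses) is a choice made for this sketch.

```python
from statistics import pvariance


def accuracy_variances(accuracies_by_label_type):
    """Population variance of the per-subset accuracies for each label type."""
    return {
        label_type: pvariance(list(per_element.values()))
        for label_type, per_element in accuracies_by_label_type.items()
    }


# Hypothetical per-subset clustering accuracies, keyed by label type and element.
accuracies = {
    "operator": {"A": 0.90, "B": 0.60, "C": 0.75},
    "shift": {"day": 0.80, "night": 0.78},
}

variances = accuracy_variances(accuracies)

# A large variance means accuracy depends strongly on that label's elements.
ranked = sorted(variances.items(), key=lambda item: item[1], reverse=True)
for label_type, variance in ranked:
    print(f"{label_type}: variance = {variance:.4f}")
```

In this invented data the "operator" label type ranks first because accuracy swings widely between operators, which is the kind of label type the claimed screen image is meant to surface.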
The combination of references thus far does not teach the limitation of “to generate a screen image enabling identification of at least one non-quality label type adversely affecting quality of the multiple pieces of digital data by using the variances.” Amershi teaches “to generate a screen image enabling identification of at least one non-quality label type adversely affecting quality of the multiple pieces of digital data by using the variances.” [[0065]: “The I/O controller 216 can provide an output to the user device 102 to cause the feature ideation user interface 132 to be displayed.” [0043]: “In some examples, the candidate features 344 can be rendered in a manner that indicates a ranking. The candidate features 344 can begin with candidate features that rank higher on some criteria than candidate features near the end of the rendered candidate features 344.” It is noted that the feature of “at least one non-quality label type adversely affecting quality of the multiple pieces of digital data by using the variances” is already taught in the existing combination of references, because the base reference (Shyr) teaches labels that are unimportant.] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Amershi by implementing the technique of displaying an interface that indicates a ranking of features, so as to arrive at the claimed invention. The motivation would have been to present information to a user that enables configuration of potential candidate features (see Amershi, [0011]: “According to some examples, the user interface is designed to present candidate features for consideration by the user.”). 
As to claim 12, the combination of Shyr, Shao, Jiang, and Amershi teaches the information processing apparatus according to claim 10, wherein the clustering accuracy is a success rate of clustering or a failure rate of clustering. [Shyr, [0055]: “In this embodiment, given that hierarchical clustering starts with leaf-entries, i.e. small sub-clusters, a new goodness measure is herein used and defined as the weighted average Silhouette coefficient over all starting sub-clusters. The average Silhouette coefficient ranges between −1 (indicating a very poor model) and +1 (indicating an excellent model). This measure avoids bias of variable selection criteria with respect to the number of variables.” Note that “Silhouette coefficient” is a measure of clustering success, particularly in the form of whether the resulting model is poor (failure) or excellent (success). See also Shyr, [0223]: “Clustering model goodness is defined as the weighted average Silhouette coefficient over all starting sub-clusters in the final stage of regular HAC, and is given by equation 68, as follows.”]

As to claim 13, the combination of Shyr, Shao, Jiang, and Amershi teaches the information processing apparatus according to claim 10, further comprising: a display device to display the screen image. [Shyr, [0037]: “Display 220 provides a mechanism to display data to a user and may be, for example, a computer monitor, or a television screen.” Furthermore, Amershi as discussed above, teaches displaying an interface for a user. Therefore, the instant limitation is met by the combination of references.]

As to claim 16, the rejection made to claim 1 is applied to claim 17. As to claim 19, the rejection made to claim 6 is applied to claim 18.

5. Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Shyr in view of Shao, Jiang, and Amershi, and further in view of Forman.
As to claim 11, the combination of Shyr, Shao, Jiang, and Amershi teaches the information processing apparatus according to claim 10, wherein the processing circuitry generates, as the screen image, a label-type evaluation screen image indicating at least one of the non-quality label types in a descending order of the variances. Amershi further teaches “wherein the processing circuitry generates, as the screen image, a label-type evaluation screen image indicating at least one of the non-quality label types” [[0065]: “The I/O controller 216 can provide an output to the user device 102 to cause the feature ideation user interface 132 to be displayed.” [0043]: “In some examples, the candidate features 344 can be rendered in a manner that indicates a ranking. The candidate features 344 can begin with candidate features that rank higher on some criteria than candidate features near the end of the rendered candidate features 344.”] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have further combined the teachings of the references combined thus far, including the above teachings of Amershi, so as to arrive at the above-discussed feature of the instant dependent claim. The motivation for doing so is covered by the motivation given for the teachings of Amershi in the rejection of the parent independent claim. 
The combination of references thus far does not explicitly teach “in a descending order of the variances.” Forman teaches “in a descending order of the variances.” [Claim 24: “prior to said selecting, sorting said features by respective binomial separation score in an ascending or descending order.” Claim 24: “selecting features best suited for said task according to said ascending or descending order.” Note that in this context, the binomial separation score (BNS) is analogous to a metric, such as the clustering accuracy metric that is already taught by the existing combination of references. It is noted that the instant claim only requires the screen image to “indicate” at least one label type in some ranked order, and does not require display of the value of the clustering accuracy on the screen.] It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of the references combined thus far with the teachings of Forman by implementing the ranking in Amershi to be in descending order. The motivation would have been to implement a suitable manner of ranking or comparing features, as suggested by Forman (see part quoted above).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. US20180300333A1 teaches displaying information about feature ranking in a feature selection process.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to YAO DAVID HUANG whose telephone number is (571)270-1764. The examiner can normally be reached Monday - Friday 9:00 am - 5:30 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. /Y.D.H./Examiner, Art Unit 2124 /MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124

Prosecution Timeline

Mar 24, 2022
Application Filed
Apr 24, 2026
Non-Final Rejection mailed — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12626122
METHODS OF PROVIDING TRAINED HYPERDIMENSIONAL MACHINE LEARNING MODELS HAVING CLASSES WITH REDUCED ELEMENTS AND RELATED COMPUTING SYSTEMS
5y 1m to grant Granted May 12, 2026
Patent 12626138
CAUSALITY DETECTION FOR OUTLIER EVENTS IN TELEMETRY METRIC DATA
4y 8m to grant Granted May 12, 2026
Patent 12619852
Method and System for Simulating, Predicting, Interpreting, Comparing, or Visualizing Complex Data
6y 11m to grant Granted May 05, 2026
Patent 12608604
METHOD AND APPARATUS FOR TRAINING ARTIFICIAL INTELLIGENCE BASED ON EPISODE MEMORY
4y 5m to grant Granted Apr 21, 2026
Patent 12536455
Method for Early Warning Brandish of Transmission Wire Based on Improved Bayes-Adaboost Algorithm
3y 8m to grant Granted Jan 27, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2
Expected OA Rounds
63%
Grant Probability
95%
With Interview (+31.8%)
4y 0m (~0m remaining)
Median Time to Grant
Low
PTA Risk
Based on 127 resolved cases by this examiner. Grant probability derived from career allowance rate.
