DETAILED ACTION
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR
1.17(e), was filed in this application after final rejection. Since this application is eligible for continued
examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the
finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's
submission filed on 30 September 2025 has been entered.
Response to Amendment
The amendment filed on 10 September 2025 has been entered.
Claims 1-25 are pending.
Claims 1-2, 4-7, 9, 11-12, 14, 16-17, 19, 21-22, 24 are amended.
Applicant’s amendments to the Claims have overcome each and every objection and rejection under 35 USC 112(b) previously set forth in the Final Office Action mailed 10 July 2025.
Response to Arguments
Applicant’s remarks, regarding the rejections of claims under 35 USC 101, have been fully considered.
Applicant submits the pending claims, as amended, are not directed to an abstract idea, but instead recite a specific improvement in computer technology, namely, a computer-implemented mechanism that programmatically generates and executes stratified testing data subsets, computes comparative distributions of heterogeneous code complexity metrics between subsets, automatically identifies structural code features and complexity thresholds correlated with systematic model errors, and outputs machine-readable guidance data usable to modify and retrain an artificial intelligence model.
Applicant submits the recited operations (automatic stratification of code samples by heterogeneous complexity metrics, execution of an artificial intelligence model on those subsets to obtain per-subset outputs in real time, computation of comparative distributions, and output of machine-readable retraining guidance) cannot be performed as mental steps.
Applicant submits the claimed system and method improve the functioning of computers in the field of artificial intelligence by providing a new mechanism for introspection of model learning behavior.
Applicant submits the claims recite a particular sequence of processor-implemented steps that improve how computers evaluate and refine artificial intelligence models, by producing actionable guidance for model retraining not achievable with conventional testing workflows.
Applicant submits the claims do not merely recite the result of "improving AI model training," but instead employ specific, rule-based techniques (heterogeneous complexity metrics, comparative distributions between correct and incorrect subsets, threshold detection, and structural feature identification) that constrain the processor's operation and preclude preemption of all model evaluation techniques, and that the ordered combination of elements constitutes significantly more than a generic computer.
Applicant submits the claims recite specific improvements to computer technology, including improved model training or error detection in artificial intelligence systems. Applicant submits the present claims fall squarely within that guidance, as they improve the functioning of a computer-implemented AI system itself.
Examiner notes Applicant’s arguments, as outlined above, are directed to newly amended claim limitations for which Examiner has not yet made a prima facie case, rendering Applicant’s arguments moot.
Applicant’s remarks, regarding the rejections of claims under 35 USC 103, have been fully considered.
Applicant notes Claim 1 recites, inter alia, "a model introspection component that analyzes artificial intelligence model learning behavior for a code understanding task by: programmatically generating and executing multiple testing data subsets that are automatically stratified based on heterogeneous code complexity metrics computed from source code samples, and by collecting per-subset model prediction outputs in real time, wherein the model introspection component computes comparative distributions of the code complexity metrics between subsets that the artificial intelligence model predicted correctly versus subsets predicted incorrectly, and automatically identifies complexity thresholds and structural code features that cause systematic model prediction errors, and outputs machine-readable guidance data usable to modify training of the artificial intelligence model."
Applicant submits the cited art, either individually or in combination, fails to teach or suggest the specifically claimed features of Claim 1 and analogous Claims 6, 11, 16, 21.
Applicant’s arguments have been considered, but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 9 October 2025 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Objections
Claims 2, 4, 5, 12, 14, 15, 22, 24, 25 are objected to because of the following informalities: “the plurality of testing data subsets” should be “the multiple testing data subsets”. Appropriate correction is required.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 2-5, 7-10, 12-15, 17-20, 22-25 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 2 recites the limitation "the output of the artificial intelligence model". There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the output of the artificial intelligence model" has been construed to be “an output of the artificial intelligence model”. Claims 3-5, which are dependent on claim 2, are similarly rejected.
Claim 7 recites the limitation "the performance of the artificial intelligence model". There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the performance of the artificial intelligence model" has been construed to be “a performance of the artificial intelligence model”. Claims 8-10, which are dependent on claim 7, are similarly rejected.
Claim 12 recites the limitation "the output of the artificial intelligence model". There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the output of the artificial intelligence model" has been construed to be “an output of the artificial intelligence model”. Claims 13-15, which are dependent on claim 12, are similarly rejected.
Claim 17 recites the limitation "the performance of the artificial intelligence model". There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the performance of the artificial intelligence model" has been construed to be “a performance of the artificial intelligence model”. Claims 18-20, which are dependent on claim 17, are similarly rejected.
Claim 22 recites the limitation "the output of the artificial intelligence model". There is insufficient antecedent basis for this limitation in the claim. For examination purposes, “the output of the artificial intelligence model" has been construed to be “an output of the artificial intelligence model”. Claims 23-25, which are dependent on claim 22, are similarly rejected.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (an abstract idea) without significantly more.
Step 1: This part of the eligibility analysis evaluates whether the claim(s) falls within any statutory
category. MPEP 2106.03:
According to the first part of the Alice analysis, in the instant case, the claims were determined to be directed to the following statutory categories: a method/process (Claims 11-20) and a machine/system (Claims 1-10, 21-25). Based on the claims being determined to be within one of the four categories (i.e., process, machine, manufacture, or composition of matter) (Step 1), it must be determined whether the claims are directed to a judicial exception (i.e., law of nature, natural phenomenon, or abstract idea).
Step 2A Prong One: This part of the eligibility analysis evaluates whether the claim(s) recites a
judicial exception.
Regarding independent claims 1, 6, 11, 16, 21, the claims recite a judicial exception (i.e., an abstract idea enumerated in the 2019 PEG) without significantly more (Step 2A: Prong One). The applicant's claim limitations, under broadest reasonable interpretation, cover activities classified under mental processes, i.e., concepts performed in the human mind (including an observation, evaluation, judgment, opinion) (see MPEP § 2106.04(a)(2), subsection III, and the 2019 PEG). As evaluated below:
Claims 1, 6, 11, 16, 21:
“a model introspection component that analyzes artificial intelligence model learning behavior for a code understanding task” (mental process of judgment)
“automatically identifies complexity thresholds and structural code features that cause systematic model prediction errors” (mental process of judgment)
If the identified limitation(s) falls within at least one of the groupings of abstract ideas, it is
reasonable to conclude that the claim(s) recites an abstract idea in Step 2A Prong One.
Step 2A Prong Two: This part of the eligibility analysis evaluates whether the claim(s) as a whole integrates the recited judicial exception into a practical application of the exception. As evaluated below:
“a memory that stores computer executable components”
“a processor, operably coupled to the memory, and that executes the computer executable components stored in the memory, wherein the computer executable components comprise”
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to amount to adding the words "apply it" (or an equivalent) to the judicial exception, see MPEP 2106.05(f).
“programmatically generating and executing multiple testing data subsets that are automatically stratified based on heterogeneous code complexity metrics computed from source code samples”
“collecting per-subset model prediction outputs in real time”
“outputs machine-readable guidance data usable to modify training of the artificial intelligence model”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions for mere data gathering or data output, see MPEP 2106.05(g).
“wherein the model introspection component computes comparative distributions of the code complexity metrics between subsets that the artificial intelligence model predicted correctly versus subsets predicted incorrectly”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, see MPEP 2106.05(h).
Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea when considered as an ordered combination and as a whole.
Step 2B: This part of the eligibility analysis evaluates whether the claim, as a whole, amounts to
significantly more than the recited exception, i.e., whether any additional element, or combination of
additional elements, adds an inventive concept to the claim. MPEP 2106.05.
First, the additional elements considered as part of the preamble and the additional elements directed to the use of computer technology are deemed insufficient to transform the judicial exception into a patentable invention because they generally link the judicial exception to the technological environment, see MPEP 2106.05(h).
Second, the additional elements constituting mere application of the abstract idea, or mere instructions to implement an abstract idea on a computer, are deemed insufficient to transform the judicial exception into a patentable invention because the limitations generally apply a generic computer and/or process to the judicial exception, see MPEP 2106.05(f).
Lastly, the claim limitations directed to data gathering activity, as noted above, are deemed directed to insignificant extra-solution activity. The courts have found these types of limitations insufficient to qualify as "significantly more", see MPEP 2106.05(g).
Furthermore, considering evidence in view of Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018), and the USPTO Berkheimer Memorandum (April 2018), Examiner notes Berkheimer Option 2: a citation to one or more of the court decisions discussed in MPEP § 2106.05(d)(II) as noting the well-understood, routine, conventional nature of the additional element(s) (e.g., limitations directed to mere data gathering):
The courts have recognized the following computer functions as well understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity, see MPEP 2106.05(d).
The additional limitations, as analyzed, fail to integrate the judicial exception into a practical application at Step 2A and fail to provide an inventive concept at Step 2B, per the analysis above. Thus, considering the additional elements individually and in combination, and the claims as a whole, the additional elements do not provide significantly more than the abstract idea. Therefore, in examining the elements recited by the limitations individually and as an ordered combination, claims 1, 6, 11, 16, 21 do not recite what the courts have identified as "significantly more" and are not patent eligible.
Furthermore, regarding dependent claims 2-5, which depend from claim 1, claims 7-10, which depend from claim 6, claims 12-15, which depend from claim 11, claims 17-20, which depend from claim 16, and claims 22-25, which depend from claim 21, the claims are directed to a judicial exception (i.e., an abstract idea enumerated in the 2019 PEG, a law of nature, or a natural phenomenon) without significantly more, as highlighted below in the claim limitations by evaluating the claim limitations under Steps 2A and 2B:
Claims 2, 7, 12, 17, 22:
Incorporates the rejection of claims 1, 6, 11, 16, 21, respectively.
“an extraction component that extracts one or more code complexity metrics for a plurality of code samples included in a testing dataset”
“a testing data subset component that generates the plurality of testing data subsets by grouping the plurality of code samples based on a performance metric that evaluates the output of the artificial intelligence model”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, see MPEP 2106.05(h).
Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Claims 3, 8, 13, 18, 23:
Incorporates the rejection of claims 2, 7, 12, 17, 22, respectively.
“wherein the plurality of code samples are source code samples”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, see MPEP 2106.05(h).
Limitations directed to mere instructions indicating a field of use or technological environment in which to apply a judicial exception cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Claims 4, 9, 14, 19, 24:
Incorporates the rejection of claims 2, 7, 12, 17, 22, respectively.
“a distribution component that determines a distribution of the plurality of code samples within the plurality of testing data subsets based on the one or more code complexity metrics” (mental process of judgment)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to amount to adding the words "apply it" (or an equivalent) to the judicial exception, see MPEP 2106.05(f).
Limitations directed to mere instructions to implement an abstract idea on a computer/using computer as a tool cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Claim 5:
Incorporates the rejection of claim 4.
“a comparison component that compares a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets” (mental process of judgment)
“wherein the testing data subset component groups the plurality of code samples based on quantitative performance metrics derived from artificial intelligence model predictions utilizing prediction confidence scores and error classification” (mental process of judgment)
“identifies outlier patterns and complexity thresholds that affect artificial intelligence model prediction accuracy” (mental process of judgment)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to amount to adding the words "apply it" (or an equivalent) to the judicial exception, see MPEP 2106.05(f).
“wherein the extraction component extracts a plurality of code complexity metrics from the code samples, including at least one of: cyclomatic complexity, Halstead complexity, maintainability index, or control flow complexity”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception, see MPEP 2106.05(h).
“wherein the comparison component generates a statistical distribution model of the extracted code complexity metrics across the testing data subsets”
These recitations are deemed insufficient to transform the judicial exception to a patentable invention because the recitation is directed to instructions for mere data gathering or data output, see MPEP 2106.05(g).
Limitations directed to mere instructions to implement an abstract idea on a computer/using computer as a tool or directed to instructions merely indicating a field of use or technological environment in which to apply a judicial exception or directed to instructions for mere data gathering or data output cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
Claims 10, 15, 20, 25:
Incorporates the rejection of claims 9, 14, 19, 24, respectively.
“a comparison component that compares a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets” (mental process of judgment)
The recitation is directed to mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea, and is considered to amount to adding the words "apply it" (or an equivalent) to the judicial exception, see MPEP 2106.05(f).
Limitations directed to mere instructions to implement an abstract idea on a computer/using computer as a tool cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
The dependent claims, as analyzed above, do not recite limitations that integrate the judicial exception into a practical application (Step 2A), nor do they include additional elements sufficient to amount to significantly more than the judicial exception (Step 2B). Considered individually and as an ordered combination, and considering the claims as a whole, the additional elements of the dependent claims do not amount to what the courts have identified as "significantly more" than the recited judicial exception. Therefore, claims 2-5, 7-10, 12-15, 17-20, and 22-25 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception without significantly more.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-3, 6-8, 11-13, 16-18, 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Heinecke et al. (U.S. Pre-Grant Publication No. 20190317885, hereinafter ‘Heinecke'), in view of Bhandari et al. (NPL: "Measuring the fault predictability of software using deep learning techniques with software metrics", hereinafter 'Bhandari'), Dabkowski et al. (NPL: "Real time image saliency for black box classifiers", hereinafter 'Dabkowski'), and further in view of Warnecke et al. (NPL: "Don’t Paint It Black: White-Box Explanations for Deep Learning in Computer Security", hereinafter 'Warnecke').
Regarding claim 1 and analogous claims 6, 11, 16, 21, Heinecke teaches A system, comprising: a memory that stores computer executable components; and a processor, operably coupled to the memory, and that executes the computer executable components stored in the memory, wherein the computer executable components comprise ([0060] A flowchart representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the example system 100 of FIG. 1 is shown in FIG. 3. The machine readable instructions may be one or more executable programs or portion(s) of an executable program for execution by a computer processor such as the processor 1012 shown in the example processor platform 1000 discussed below in connection with FIG. 10.; The program may be embodied in software stored on a non-transitory computer readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a DVD, a Blu-ray disk, or a memory associated with the processor 1012, but the entire program and/or parts thereof could alternatively be executed by a device other than the processor 1012 and/or embodied in firmware or dedicated hardware.):
a model introspection component that analyzes artificial intelligence model learning behavior for a code understanding task ([0038] In certain examples, continuous integration practices (e.g., Jenkins, Teamcity, Travis CI, etc.) help the software development process to prevent software integration problems. A continuous integration environment provides metrics such as automated code coverage for unit tests, integration tests and end-to-end tests, cyclomatic complexity metrics, and different metrics from static analysis tools (e.g., code style issues, automatic bug finders, etc.), for example. End-to-end test execution combined with metrics from the software usage analytics framework, for example, provide insight into an amount of test cases executed per feature in the continuous integration environment.; [0050] The model comparator 240 compares the model of expected software application usage and the model of actual software application usage (both of which are constructed by the model tool 230) to identify a difference or gap between expected and actual usage of the software.; [0051] In certain examples, the example model tool 230 of the recommendation engine 140 implements the software usage models using artificial intelligence. Artificial intelligence (AI), including machine learning (ML), deep learning (DL), and/or other artificial machine-driven logic, enables machines (e.g., computers, logic circuits, etc.) to use a model to process input data to generate an output based on patterns and/or associations previously learned by the model via a training process.)
Heinecke fails to teach by: programmatically generating and executing multiple testing data subsets that are automatically stratified based on heterogeneous code complexity metrics computed from source code samples, and by collecting per-subset model prediction outputs in real time, wherein the model introspection component computes comparative distributions of the code complexity metrics between subsets that the artificial intelligence model predicted correctly versus subsets predicted incorrectly, and automatically identifies complexity thresholds and structural code features that cause systematic model prediction errors, and outputs machine-readable guidance data usable to modify training of the artificial intelligence model.
Bhandari teaches by: programmatically generating and executing multiple testing data subsets that are automatically stratified based on heterogeneous code complexity metrics computed from source code samples ([III. RESEARCH METHODOLOGY] In this section, we present the methodology that we implemented in this study to examine the capability of the software fault prediction using deep learning techniques. Firstly, we explain the procedure how we examined fault prediction models and how they were compared each other. [Re: “based on heterogeneous code complexity metrics computed from source code samples”] Software metrics are extracted from software source code using some tools. CKJM tool is one of the frequently used software tool to extract the metrics. In this study, CK (Chidamber and Kemerer) metrics [12] are extracted from the software. [Re: “programmatically generating and executing multiple testing data subsets that are automatically stratified”] Datasets are constructed combining metrics and the defect or true/false as a classifier. After data pre-processing and feature extraction have been applied, different deep learning techniques (as explained in Subsection III-B) are considered to filtered datasets to predict the software fault-proneness. To systematically perform the research work we follow certain steps, which are as follows (also shown in Fig. 1). 1) Input the software source code to predict the fault. 2) Extract the source code metrics of the software. 3) All extracted metrics are independent variables and the observed classifier as dependent variable. 4) Use data cleaning, segmentation, normalization, dimension reduction etc. as pre-processing process 5) Apply machine learning methods such as ANN, CNN, SOM, LVQ3 and MultiLVQ. 6) By computing the algorithms, calculate the performance measures and fault prediction accuracy.),
outputs machine-readable guidance data usable to modify training of the artificial intelligence model ([B. Classification techniques] Several deep learning techniques are applied to input variables to predict the defected class whether it is faulty case or non-faulty. Artificial neural network, Convolutional neural network (CNN), Self-Organizing Map (SOM), Learning Vector Quantization (LVQ)-3, MultiLVQ are applied to predict the software fault, are as follows.; [1) Artificial neural network] Artificial Neural Network (ANN) is a kind of machine learning method. It is an advancement to perceptron neural network model having one or more hidden layers. Multilayer Perceptron are feedforward neural networks [Re: “usable to modify training of the artificial intelligence model”] trained with the back-propagation algorithm [Re: “outputs machine-readable guidance data”] [13], error back-propagation learning consists of two passes; (a) forward pass and (b) backward pass. (a) An input is fed to the neural network; its effect is propagated through the network layer by layer and the weights of the network are all fixed in forward pass and (b) the weights are all updated and adjusted according to the error computed during the backward pass.).
Heinecke and Bhandari are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Heinecke, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Bhandari to Heinecke before the effective filing date of the claimed invention in order to predict software faults in identifying the location in the faulty modules for detailed testing to increase maintainability (cf. Bhandari, [Abstract] Minimization of failures is the major expectation from reliable software. Predicting the software faults supports in identifying the location in the faulty modules for detailed testing to increase the maintainability. This paper presents fault prediction using some of the deep learning techniques utilizing source code metrics of the software. Accuracy, fmeasure, recall, precision, receiver operating characteristic (ROC) curves and area under curve (AUC) values are considered to measure the performance of the deep learning methods.).
Dabkowski teaches and by collecting per-subset model prediction outputs in real time ([1 Introduction, pg. 1] In this paper we lay the groundwork for a new class of fast and accurate model-based saliency detectors, giving high pixel accuracy and sharp saliency maps (an example is given in figure 1). We propose a fast, model agnostic, saliency detection method. Instead of iteratively obtaining saliency maps for each input image separately, we train a model to predict such a map for any input image in a single feed-forward pass. We show that this approach is not only orders-of-magnitude faster than iterative methods, but it also produces higher quality saliency masks and achieves better localisation results. We assess this with standard saliency benchmarks and introduce a new saliency measure. Our proposed model is able to produce real-time saliency maps, enabling new applications such as video-saliency which we comment on in our Future Research section.),
Heinecke, Bhandari, and Dabkowski are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Heinecke and Bhandari, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Dabkowski to Heinecke before the effective filing date of the claimed invention in order to produce higher quality saliency masks and achieve better localisation results (cf. Dabkowski, [1 Introduction, pg. 1] In this paper we lay the groundwork for a new class of fast and accurate model-based saliency detectors, giving high pixel accuracy and sharp saliency maps (an example is given in figure 1). We propose a fast, model agnostic, saliency detection method. Instead of iteratively obtaining saliency maps for each input image separately, we train a model to predict such a map for any input image in a single feed-forward pass. We show that this approach is not only orders-of-magnitude faster than iterative methods, but it also produces higher quality saliency masks and achieves better localisation results. We assess this with standard saliency benchmarks and introduce a new saliency measure. Our proposed model is able to produce real-time saliency maps, enabling new applications such as video-saliency which we comment on in our Future Research section.).
Warnecke teaches wherein the model introspection component computes comparative distributions of the code complexity metrics between subsets that the artificial intelligence model predicted correctly versus subsets predicted incorrectly ([Explainable learning., pg. 2] Given a neural network N, an input vector x = (x1, . . . , xd) and a prediction fN (x) = y, one aims at finding an explanation why the label y has been selected by the network. This explanation is typically represented as a vector r = (r1, . . . , rd) that describes the relevance or importance of the different dimensions of x for the prediction. The computed relevance values can be overlayed with the input and thus enable highlighting relevant features, such as the tokens in the code snippet shown in Figure 1.; [5.2 Sparsity of Explanations, pg. 7] We continue to compare the two explanation methods and investigate the sparsity of the generated explanations. To this end, we normalize the explanations of all methods by dividing the relevance vectors by the maximum of their absolute values so that the values for every sample are between −1 and 1. While this normalization helps a human analyst to identify top-ranked features, the sheer amount of data in an explanation can still render an interpretation intractable. As a consequence, we expect a useful explanation method to assign high relevance scores to only a few features and keep the majority at a relevance of 0. To measure the sparsity we create a normalized histogram h of the relevance values to access the distribution of the values and calculate the mass around zero (MAZ) metric defined by MAZ(r) = ∫_{−r}^{r} h(x) dx for r ∈ [0, 1].
Sparse explanations that assign many features with relevances close to 0 will have a steep rise in MAZ at r close to zero and a flat slope when r is close to 1 since only few features are assigned high relevance.), and
automatically identifies complexity thresholds and structural code features that cause systematic model prediction errors ([5.1 Conciseness of Explanations, pg. 6] We start by investigating the conciseness of the explanations generated for the four considered security systems. To this end, we stick to the approach by Guo et al. [20] for measuring the impact of removing features on the classification result. In particular, we compute the average remaining accuracy (ARA). For some k = 1, . . . , F the ARA is calculated by removing the k most relevant features from a sample and running it through the neural network again. This procedure is performed on all samples and the average softmax probability of the original class of the samples is reported. How is the ARA expected to behave? If we successively remove relevant features, the ARA will decrease, as the neural network has less information for making a correct prediction. The better the explanation, the quicker the ARA will drop as more relevant information has been removed. In the long term, ARA converges to the probability score of the sample with no information, i.e. containing only zeros. Technically, we implement ARA by removing the specific features of the four systems under test as follows. For Drebin+ and Mimicus+, features are removed by setting the features to 0. For DAMD we replace the top instructions with the no-op byte code and for VulDeePecker we simply replace the top lexical tokens from the code with an embedding of only zeros. Moreover, we introduce a Brute-Force method as a baseline for this experiment. This method calculates the relevance ri by setting xi to zero and measuring the difference in the softmax probability, i.e. ri = fN (x) − fN (x |xi = 0).
We call this method Brute-Force because a sample with d features has to be classified d times again, which can be time consuming for data with lots of features.),
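For illustration only, Warnecke's Brute-Force relevance ri = fN(x) − fN(x|xi = 0) and the quoted ARA procedure may be sketched against a toy scoring function; the weights, input, and logistic model below are hypothetical stand-ins for a neural network's softmax probability:

```python
import numpy as np

def brute_force_relevance(f, x):
    # r_i = f(x) - f(x with feature i set to 0), per the quoted baseline
    base = f(x)
    return np.array([base - f(np.where(np.arange(x.size) == i, 0.0, x))
                     for i in range(x.size)])

def ara(f, x, k):
    # Remaining probability of the original class after removing (zeroing)
    # the k most relevant features, as in the quoted ARA computation.
    r = brute_force_relevance(f, x)
    top = np.argsort(r)[::-1][:k]
    x_removed = x.copy()
    x_removed[top] = 0.0
    return f(x_removed)

# Hypothetical "softmax probability": a logistic score over four features.
w = np.array([3.0, 1.0, 0.1, 0.05])
f = lambda x: 1.0 / (1.0 + np.exp(-(w @ x)))
x = np.ones(4)
```

As the quotation predicts, the score drops as more relevant features are removed and converges to the score of an all-zero sample.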
Heinecke, Bhandari, Dabkowski, and Warnecke are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Heinecke, Bhandari, and Dabkowski, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Warnecke to Heinecke before the effective filing date of the claimed invention in order to systematically compare white-box explanations with current black-box approaches for deep learning (cf. Warnecke, [ABSTRACT, pg. 1] Deep learning is increasingly used as a basic building block of security systems. Unfortunately, deep neural networks are hard to interpret, and their decision process is opaque to the practitioner. Recent work has started to address this problem by considering black-box explanations for deep learning in computer security (CCS’18). The underlying explanation methods, however, ignore the structure of neural networks and thus omit crucial information for analyzing the decision process. In this paper, we investigate white-box explanations and systematically compare them with current black-box approaches. In an extensive evaluation with learning-based systems for malware detection and vulnerability discovery, we demonstrate that white-box explanations are more concise, sparse, complete and efficient than black-box approaches. As a consequence, we generally recommend the use of white-box explanations if access to the employed neural network is available, which usually is the case for stand-alone systems for malware detection, binary analysis, and vulnerability discovery.).
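For illustration only, the mass-around-zero (MAZ) metric quoted above from Warnecke may be approximated with a normalized histogram; the bin count and the two example relevance vectors below are hypothetical:

```python
import numpy as np

def maz(relevances, r, bins=101):
    # MAZ(r) = integral from -r to r of h(x) dx, with h a normalized
    # histogram of relevance values already scaled to [-1, 1].
    hist, edges = np.histogram(relevances, bins=bins,
                               range=(-1.0, 1.0), density=True)
    widths = np.diff(edges)
    centers = (edges[:-1] + edges[1:]) / 2
    inside = np.abs(centers) <= r
    return float(np.sum(hist[inside] * widths[inside]))

# Sparse explanation: most relevances at 0, a few at the extremes.
sparse = np.concatenate([np.zeros(98), [1.0, -1.0]])
# Dense explanation: relevance spread across the whole range.
dense = np.linspace(-1.0, 1.0, 100)
```

Consistent with the quotation, the sparse vector shows a steep rise in MAZ near r = 0, while the dense vector accumulates mass only gradually.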
Regarding claim 2, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The system of claim 1.
Bhandari teaches further comprising: an extraction component that extracts one or more code complexity metrics for a plurality of code samples included in a testing dataset ([1. Introduction] Deep learning has been applied for fault diagnosis and prediction; this study uses metrics that are extracted from the software source code.); and
a testing data subset component that generates the plurality of testing data subsets by grouping the plurality of code samples based on a performance metric that evaluates the output of the artificial intelligence model ([III. Research Methodology] In this study, CK (Chidamber and Kemerer) metrics [12] are extracted from the software. Datasets are constructed combining metrics and the defect or true/false as a classifier. After data pre-processing and feature extraction have been applied, different deep learning techniques (as explained in Subsection III-B) are considered to filtered datasets to predict the software fault-proneness.).
Heinecke, Bhandari, Dabkowski, and Warnecke are combinable for the same rationale as set forth above with respect to claim 1.
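For illustration only, the claimed grouping of code samples into testing data subsets and the comparison of complexity-metric distributions between correctly and incorrectly predicted subsets may be sketched as follows; the sample records, midpoint threshold heuristic, and guidance fields are all hypothetical:

```python
import statistics

# Hypothetical records: (cyclomatic complexity, model prediction correct?)
samples = [(2, True), (3, True), (4, True), (5, True),
           (9, False), (11, True), (14, False), (17, False), (21, False)]

# Group the samples into subsets by prediction outcome.
correct = [c for c, ok in samples if ok]
wrong = [c for c, ok in samples if not ok]

# Comparative distributions of the complexity metric between the subsets.
mean_correct = statistics.mean(correct)
mean_wrong = statistics.mean(wrong)

# A crude complexity threshold correlated with systematic errors: the
# midpoint between the subset means (a real system would use a proper
# statistical test).
threshold = (mean_correct + mean_wrong) / 2

# Machine-readable guidance usable to modify training of the model.
guidance = {"metric": "cyclomatic_complexity",
            "error_prone_above": threshold,
            "suggestion": "augment training data with high-complexity samples"}
```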
Regarding claim 3, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The system of claim 2.
Bhandari teaches wherein the plurality of code samples are source code samples ([1. Introduction] Deep learning has been applied for fault diagnosis and prediction; this study uses metrics that are extracted from the software source code.).
Heinecke, Bhandari, Dabkowski, and Warnecke are combinable for the same rationale as set forth above with respect to claim 1.
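For illustration only, extraction of one code complexity metric of the kind relied upon in Bhandari (metrics extracted from software source code) may be sketched with Python's ast module; the decision-point count below is a simplified, hypothetical approximation of McCabe's cyclomatic complexity, not the extraction used by any cited reference:

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    # Simplified proxy: 1 + number of decision points in the parse tree.
    tree = ast.parse(source)
    decisions = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.BoolOp, ast.IfExp)
    return 1 + sum(isinstance(node, decisions) for node in ast.walk(tree))

code = """
def classify(x):
    if x is None:
        return "empty"
    for item in x:
        if item < 0:
            return "faulty"
    return "non-faulty"
"""
```

Per-sample metrics of this kind could populate the testing dataset that the claimed extraction component operates on.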
Regarding claim 7, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The system of claim 6.
Bhandari teaches further comprising: an extraction component that extracts one or more code complexity metrics for a plurality of code samples included in a testing dataset ([1. Introduction] Deep learning has been applied for fault diagnosis and prediction; this study uses metrics that are extracted from the software source code.); and
a testing data subset component that generates the plurality of testing data subsets by grouping the plurality of code samples based on a performance metric that evaluates the performance of the artificial intelligence model ([III. Research Methodology] In this study, CK (Chidamber and Kemerer) metrics [12] are extracted from the software. Datasets are constructed combining metrics and the defect or true/false as a classifier. After data pre-processing and feature extraction have been applied, different deep learning techniques (as explained in Subsection III-B) are considered to filtered datasets to predict the software fault-proneness.).
Heinecke, Bhandari, Dabkowski, and Warnecke are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 8, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The system of claim 7.
Bhandari teaches wherein the plurality of code samples are source code samples ([1. Introduction] Deep learning has been applied for fault diagnosis and prediction; this study uses metrics that are extracted from the software source code.).
Heinecke, Bhandari, Dabkowski, and Warnecke are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 12, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The computer-implemented method of claim 11.
Bhandari teaches further comprising: extracting, by the system, one or more code complexity metrics for a plurality of code samples included in a testing dataset ([1. Introduction] Deep learning has been applied for fault diagnosis and prediction; this study uses metrics that are extracted from the software source code.); and
generating, by the system, the plurality of testing data subsets by grouping the plurality of code samples based on a performance metric that evaluates the output of the artificial intelligence model ([III. Research Methodology] In this study, CK (Chidamber and Kemerer) metrics [12] are extracted from the software. Datasets are constructed combining metrics and the defect or true/false as a classifier. After data pre-processing and feature extraction have been applied, different deep learning techniques (as explained in Subsection III-B) are considered to filtered datasets to predict the software fault-proneness.).
Heinecke, Bhandari, Dabkowski, and Warnecke are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 13, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The computer-implemented method of claim 12.
Bhandari teaches wherein the plurality of code samples are source code samples ([1. Introduction] Deep learning has been applied for fault diagnosis and prediction; this study uses metrics that are extracted from the software source code.).
Heinecke, Bhandari, Dabkowski, and Warnecke are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 17, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The computer-implemented method of claim 16.
Bhandari teaches further comprising: extracting, by the system, one or more code complexity metrics for a plurality of code samples included in a testing dataset ([1. Introduction] Deep learning has been applied for fault diagnosis and prediction; this study uses metrics that are extracted from the software source code.); and
generating, by the system, the plurality of testing data subsets by grouping the plurality of code samples based on a performance metric that evaluates the performance of the artificial intelligence model ([III. Research Methodology] In this study, CK (Chidamber and Kemerer) metrics [12] are extracted from the software. Datasets are constructed combining metrics and the defect or true/false as a classifier. After data pre-processing and feature extraction have been applied, different deep learning techniques (as explained in Subsection III-B) are considered to filtered datasets to predict the software fault-proneness.).
Heinecke, Bhandari, Dabkowski, and Warnecke are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 18, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The computer-implemented method of claim 17.
Bhandari teaches wherein the plurality of code samples are source code samples ([1. Introduction] Deep learning has been applied for fault diagnosis and prediction; this study uses metrics that are extracted from the software source code.).
Heinecke, Bhandari, Dabkowski, and Warnecke are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 22, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The computer program product of claim 21.
Bhandari teaches wherein the program instructions further cause the processor to: extract, by the processor, one or more code complexity metrics for a plurality of code samples included in a testing dataset ([1. Introduction] Deep learning has been applied for fault diagnosis and prediction; this study uses metrics that are extracted from the software source code.); and
generate, by the processor, the plurality of testing data subsets by grouping the plurality of code samples based on a performance metric that evaluates the output of the artificial intelligence model ([III. Research Methodology] In this study, CK (Chidamber and Kemerer) metrics [12] are extracted from the software. Datasets are constructed combining metrics and the defect or true/false as a classifier. After data pre-processing and feature extraction have been applied, different deep learning techniques (as explained in Subsection III-B) are considered to filtered datasets to predict the software fault-proneness.).
Heinecke, Bhandari, Dabkowski, and Warnecke are combinable for the same rationale as set forth above with respect to claim 1.
Regarding claim 23, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The computer program product of claim 22.
Bhandari teaches wherein the plurality of code samples are source code samples ([1. Introduction] Deep learning has been applied for fault diagnosis and prediction; this study uses metrics that are extracted from the software source code.).
Heinecke, Bhandari, Dabkowski, and Warnecke are combinable for the same rationale as set forth above with respect to claim 1.
Claims 4, 9, 14, 19, 24 are rejected under 35 U.S.C. 103 as being unpatentable over Heinecke, in view of Bhandari, Dabkowski, Warnecke, and further in view of Tarlow et al. (U.S. Pre-Grant Publication No. 20150135166, hereinafter 'Tarlow').
Regarding claim 4, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The system of claim 2.
Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, fails to teach further comprising: a distribution component that determines a distribution of the plurality of code samples within the plurality of testing data subsets based on the one or more code complexity metrics.
Tarlow teaches further comprising: a distribution component that determines a distribution of the plurality of code samples within the plurality of testing data subsets based on the one or more code complexity metrics ([0021] The source code error checking and/or error correcting component 106, the source code auto-complete component 108, and the source code generator 110, are each in communication with a trained probabilistic model 100 which is a type of machine learning system.; [0022] The probabilistic model 100 comprises a plurality of probability distributions describing belief about structure (syntactic and/or semantic) of natural source code. It is also arranged to take into account source code analysis output of the source code analyzer 112 (or any other source code analyzer).; [0076] A source code analyser 606 is optionally present at the computing device. A source code auto-complete component 608 may be present. A source code generator 622 may be present. A source code error check and/or error correction component 624 may be present. A data store 610 holds data such as natural source code examples, probability distribution parameters, context data from the source code analyser 606, and other data.).
Heinecke, Bhandari, Dabkowski, Warnecke, and Tarlow are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Heinecke, Bhandari, Dabkowski, and Warnecke, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Tarlow to Heinecke before the effective filing date of the claimed invention in order to predict sequences of source code elements to generate source code, to auto-complete source code, to error check source code, to error correct source code (cf. Tarlow, [0006] Automated generation, or completion, or checking, or correcting of source code is described whereby a probabilistic model having been trained using a corpus of natural source code examples is used. In various examples the probabilistic model comprises probability distributions describing belief about structure of natural source code and takes into account source code analysis from a compiler or other source code analyzer. In various examples, source code analysis may comprise syntactic structure, type information of variables and methods in scope, variables which are currently in scope and other data about source code. In various examples, the trained probabilistic model is used to predict sequences of source code elements. For example, to generate source code, to auto-complete source code, to error check source code, to error correct source code or for other purposes.).
Regarding claim 9, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The system of claim 7.
Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, fails to teach further comprising: a distribution component that determines a distribution of the plurality of code samples within the plurality of testing data subsets based on the one or more code complexity metrics.
Tarlow teaches further comprising: a distribution component that determines a distribution of the plurality of code samples within the plurality of testing data subsets based on the one or more code complexity metrics ([0021] The source code error checking and/or error correcting component 106, the source code auto-complete component 108, and the source code generator 110, are each in communication with a trained probabilistic model 100 which is a type of machine learning system.; [0022] The probabilistic model 100 comprises a plurality of probability distributions describing belief about structure (syntactic and/or semantic) of natural source code. It is also arranged to take into account source code analysis output of the source code analyzer 112 (or any other source code analyzer).; [0076] A source code analyser 606 is optionally present at the computing device. A source code auto-complete component 608 may be present. A source code generator 622 may be present. A source code error check and/or error correction component 624 may be present. A data store 610 holds data such as natural source code examples, probability distribution parameters, context data from the source code analyser 606, and other data.).
Heinecke, Bhandari, Dabkowski, Warnecke, and Tarlow are combinable for the same rationale as set forth above with respect to claim 4.
Regarding claim 14, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The computer-implemented method of claim 12.
Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, fails to teach further comprising determining, by the system, a distribution of the plurality of code samples within the plurality of testing data subsets based on the one or more code complexity metrics.
Tarlow teaches further comprising determining, by the system, a distribution of the plurality of code samples within the plurality of testing data subsets based on the one or more code complexity metrics ([0021] The source code error checking and/or error correcting component 106, the source code auto-complete component 108, and the source code generator 110, are each in communication with a trained probabilistic model 100 which is a type of machine learning system.; [0022] The probabilistic model 100 comprises a plurality of probability distributions describing belief about structure (syntactic and/or semantic) of natural source code. It is also arranged to take into account source code analysis output of the source code analyzer 112 (or any other source code analyzer).; [0076] A source code analyser 606 is optionally present at the computing device. A source code auto-complete component 608 may be present. A source code generator 622 may be present. A source code error check and/or error correction component 624 may be present. A data store 610 holds data such as natural source code examples, probability distribution parameters, context data from the source code analyser 606, and other data.).
Heinecke, Bhandari, Dabkowski, Warnecke, and Tarlow are combinable for the same rationale as set forth above with respect to claim 4.
Regarding claim 19, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The computer-implemented method of claim 17.
Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, fails to teach further comprising determining, by the system, a distribution of the plurality of code samples within the plurality of testing data subsets based on the one or more code complexity metrics.
Tarlow teaches further comprising determining, by the system, a distribution of the plurality of code samples within the plurality of testing data subsets based on the one or more code complexity metrics ([0021] The source code error checking and/or error correcting component 106, the source code auto-complete component 108, and the source code generator 110, are each in communication with a trained probabilistic model 100 which is a type of machine learning system.; [0022] The probabilistic model 100 comprises a plurality of probability distributions describing belief about structure (syntactic and/or semantic) of natural source code. It is also arranged to take into account source code analysis output of the source code analyzer 112 (or any other source code analyzer).; [0076] A source code analyser 606 is optionally present at the computing device. A source code auto-complete component 608 may be present. A source code generator 622 may be present. A source code error check and/or error correction component 624 may be present. A data store 610 holds data such as natural source code examples, probability distribution parameters, context data from the source code analyser 606, and other data.).
Heinecke, Bhandari, Dabkowski, Warnecke, and Tarlow are combinable for the same rationale as set forth above with respect to claim 4.
Regarding claim 24, Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, teaches The computer program product of claim 23.
Heinecke, as modified by Bhandari, Dabkowski, and Warnecke, fails to teach wherein the program instructions further cause the processor to: determine, by the processor, a distribution of the plurality of code samples within the plurality of testing data subsets based on the one or more code complexity metrics.
Tarlow teaches wherein the program instructions further cause the processor to: determine, by the processor, a distribution of the plurality of code samples within the plurality of testing data subsets based on the one or more code complexity metrics ([0021] The source code error checking and/or error correcting component 106, the source code auto-complete component 108, and the source code generator 110, are each in communication with a trained probabilistic model 100 which is a type of machine learning system.; [0022] The probabilistic model 100 comprises a plurality of probability distributions describing belief about structure (syntactic and/or semantic) of natural source code. It is also arranged to take into account source code analysis output of the source code analyzer 112 (or any other source code analyzer).; [0076] A source code analyser 606 is optionally present at the computing device. A source code auto-complete component 608 may be present. A source code generator 622 may be present. A source code error check and/or error correction component 624 may be present. A data store 610 holds data such as natural source code examples, probability distribution parameters, context data from the source code analyser 606, and other data.).
Heinecke, Bhandari, Dabkowski, Warnecke, and Tarlow are combinable for the same rationale as set forth above with respect to claim 4.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Heinecke, in view of Bhandari, Dabkowski, Warnecke, Tarlow, and further in view of Sharma et al. (NPL: "A Survey on Machine Learning Techniques for Source Code Analysis", hereinafter 'Sharma') and Sirianni et al. (U.S. Patent No. 10949338, hereinafter 'Sirianni').
Regarding claim 5, Heinecke, as modified by Bhandari, Dabkowski, Warnecke, and Tarlow, teaches The system of claim 4.
Sharma teaches wherein the testing data subset component groups the plurality of code samples based on quantitative performance metrics derived from artificial intelligence model predictions utilizing prediction confidence scores and error classification ([Feature extraction:, pg. 14-15] The majority of studies utilize similarity metrics to extract similar bug patterns and, respectively, correct bug fixes.; information (as source) and the Delta changes that resolved the diagnostic (as target) into a Neural Machine Translation network for training. Furthermore, Li et al. [182] used the prior bug fixes and the surrounding code contexts of the fixes for code transformation learning. Saha et al. [265] developed a ml model that relies on four features derived from a program’s context, i.e., the source-code surrounding the potential repair location, and the bug report. Finally, Bader et al. [36] utilized a ranking technique that also considers the context of a code change, and selects the most appropriate fix for a given bug.; [ML model training:, pg. 12] To train their ml models with the selected datasets and features, researchers used various algorithms. Most of them [37, 94, 114, 167] employed Naive Bayes, Multilayer Perceptron, K Nearest Neighbors, Random Forest, Decision Tree, Support Vector Machine, and Linear Regression. Two studies [94, 114] concluded that Support Vector Machine offers the best results for testing effort prediction. Tripathi and Dabkowski [316] pointed out that AdaBoost and Random Forest are the best performing algorithms in this context.);
wherein the extraction component extracts a plurality of code complexity metrics from the code samples, including at least one of: cyclomatic complexity, Halstead complexity, maintainability index, or control flow complexity ([Feature extraction:, pg. 10] The most common features to train a defect prediction model are the source code metrics introduced by Halstead [126], Chidamber and Kemerer [76], and McCabe [213]. Most of the examined studies [59, 63, 66, 67, 78, 108, 155, 159, 204, 208–211, 276, 295, 338] used a large number of metrics such as Lines of Code, Number of Children, Coupling Between Objects, and Cyclomatic Complexity. In addition to the above, some authors [46, 64, 89, 248] suggested the use of dimensional space reduction techniques—such as Principal Component Analysis (pca)—to limit the number of features. Pandey and Gupta [237] used Sequential Forward Search (sfs) to extract relevant source code metrics. Dos Santos et al. [92] suggested a sampling-based approach to extract source code metrics to train defect prediction models. Kaur et al. [154] suggested an approach to fetch entropy of change metrics. Bowes et al. [51] introduced a novel set of metrics constructed in terms of mutants and the test cases that cover and detect them.; [4.9 Refactoring, pg. 28] Refactoring transformations are intended to improve code quality (specifically maintainability), while preserving the program behavior (functional requirements) from users’ perspective [304]. This section summarizes the studies that identify refactoring candidates by analyzing source code and by applying ml techniques on code. A process pipeline typically adopted by the studies in this category can be viewed as a three step process. In the first step, the source code of the projects is used to prepare a dataset for training.
Then, individual wherein the extraction component extracts a plurality of code complexity metrics from the code samples samples (i.e., either a method, class, or a file) is processed to extract relevant features. The extracted features are then fed to an ml model for training. Once trained, the model is used to predict whether an input sample is a candidate for refactoring or not. [Feature extraction:, pg. 30] Authors used static source code metrics, cfgs, asts, source code tokens, and word embeddings as features. Source code metrics: A set of studies [26, 79, 83, 100, 120, 215, 246, 254] used more than 20 static source code metrics (such as cyclomatic complexity, maximum depth of class in inheritance tree, number of statements, and number of blank lines). control flow complexity Data/control flow and ast: Bilgin et al. [48], Kim et al. [156], Kronjee et al. [163], Ma et al. [204] used cfgs, asts, or data flow analysis as features. More specifically, Ma et al. [205] extracted the api calls from the cfgs of their dataset and collected information such as the usage of apis (which apis the application uses), the api frequencies (how many times the application uses apis) and api sequence (the order the application uses apis). Kim et al. [156] extracted asts and gfcs which they tokenized and fed into ml models, while Bilgin et al. [48] extracted asts and translated their representation of source code into a one-dimensional numerical array to feed them to a model. Kronjee et al. [163] used data-flow analysis to extract features, while Spreitzenbarth et al. [298] used static, dynamic analysis, and information collected from ltrace to collect features and train a linear vulnerability detection model.); and
wherein the comparison component generates a statistical distribution model of the extracted code complexity metrics across the testing data subsets and identifies outlier patterns and complexity thresholds that affect artificial intelligence model prediction accuracy ([Feature extraction:, pg. 14] The majority of studies utilize wherein the comparison component generates a statistical distribution model of the extracted code complexity metrics across the testing data subsets similarity metrics to extract similar bug patterns and, respectively, correct bug fixes. These studies mostly employ word embeddings for code representation and abstraction. In particular, Amorim et al. [29], Santos et al. [270], Svyatkovskiy et al. [306], and Chen et al. [73], leveraged source-code naturalness and applied nlp-based metrics. Tian et al. [314] employed different representation learning approaches for code changes to derive embeddings for similarity computations. Ahmed et al. [7] used similar metrics for fixing compile-time errors. Additionally, Saha et al. [266] leveraged a code similarity analysis, which compares both syntactic and semantic features, and the revision history of a software project under examination, from Defects4J, for fixing multi-hunk bugs, i.e., bugs that require applying a substantially similar patch to different locations. Furthermore, Wang et al. [337] investigated, using similarity metrics, how these machine-generated correct patches can be semantically equivalent to human patches, and how bug characteristics affect patch generation. Sakkas et al. [268] also applied similarity metrics.; [Probabilistic predictions:, pg. 15-16] Here, we list papers that use probabilistic learning and ml approaches such as association rules, Decision Tree, and Support Vector Machine to predict bug locations and fixes for automated program repair. 
Long and Rinard [197] introduced a repair tool called Prophet, which uses a set of successful manual patches from open-source software repositories, to learn a probabilistic model of correct code, and generate patches. Soto and Le Goues [297] conducted a granular analysis using different statement kinds to identify those statements that are more likely to be modified than others during bug fixing. For this, they used simplified syntax trees and association rules. Gopinath et al. [110] presented a data-driven approach for fixing of bugs in database statements. For predicting the correct behavior for defect-inducing data, this study uses Support Vector Machine and Decision Tree. Saha et al. [265] developed Elixir repair approach that uses Logistic Regression models and similarity-score metrics. Bader et al. [36] developed a repair approach called Getafix that uses identifies outlier patterns and complexity thresholds that affect artificial intelligence model prediction accuracy hierarchical clustering to summarize fix patterns into a hierarchy ranging from general to specific patterns. Xiong et al. [346] introduced L2S that uses ml to estimate conditional probabilities for the candidates at each search step, and search algorithms to find the best possible solutions. Gopinath et al. [111] used Support Vector Machine and ID3 with path exploration to repair bugs in complex data structures. Le et al. [172] conducted an empirical study on the capabilities of program repair tools, and applied Random Forest to predict whether using genetic programming search in apr can lead to a repair within a desired time limit.).
Heinecke, Bhandari, Dabkowski, Warnecke, Tarlow, and Sharma are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Heinecke, Bhandari, Dabkowski, Warnecke, and Tarlow, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Sharma to Heinecke before the effective filing date of the claimed invention in order to consolidate and summarize the techniques, resources, and challenges of applied machine learning for source code analysis (cf. Sharma, [I Introduction, pg. 2] The goal of this study is to summarize the current knowledge of applied machine learning for source code analysis. We also aim to collate and consolidate available resources (in the form of datasets and tools) that researchers have used in similar studies. Additionally, we aim to identify challenges in the domain and present them in a synthesized form. We believe that our efforts to consolidate and summarize the techniques, resources, and challenges will help the community to not only understand the state-of-the-art, but also to focus their efforts on tackling the identified challenges.).
Sirianni teaches further comprising: a comparison component that compares a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets ([Col. 4, Lines 7-16] The techniques described herein provide an automated software testing toolset for analyzing potential release candidate builds before deployment. The techniques described herein provide the tools to identify source code that has the potential to be a source of latent errors that may cause problems. The techniques described herein analyze data obtained from software testing, discovers bugs using a machine-learning bug detector, automatically assesses the severity of the potential bug, and displays the results to the user.; [Col. 2, Lines 27-34] The method also includes analyzing, by the computing device and using a machine learning model, the path logs and the output logs stored in the bug detection database to identify an abnormality indicative of a potential bug in the source code. The method further includes outputting, by the computing device and for display, a graphical representation of the abnormality.; [Col. 18, Lines 45-55] For example, a comparison component that compares a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets analysis module 240 may determine, using standard deviations of values in the output log, whether the output log contains a statistical outlier compared to every other output log stored in the bug detection database. 
In other examples, for each input in the respective set of one or more inputs in the test log, analysis module 240 may determine, using N-grams in the path log, whether one or more of a line of the source code before the respective input in the path log or a line of the source code after the respective input in the path log is the outlier compared to every other path log.; [Col. 18, Lines 56-64] In the instances where execution statistics are tracked by testing module 238, for each execution statistic of the one or more execution statistics tracked for each execution, analysis module 240, in the analyzing portion of the bug detection process, may create a distribution of the respective execution statistic from each execution and analyze the respective execution statistic from each execution to identify a statistical outlier in the distribution of the respective execution statistic.).
Heinecke, Bhandari, Dabkowski, Warnecke, Tarlow, Sharma, and Sirianni are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Heinecke, Bhandari, Dabkowski, Warnecke, Tarlow, and Sharma, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Sirianni to Heinecke before the effective filing date of the claimed invention in order to efficiently and effectively identify bugs that would be invisible or otherwise undetectable to a regular unit tester, resulting in higher quality applications, assuring privacy, security, and reliability of these applications (cf. Sirianni, [Col. 2, Lines 5-8] As such, the device may efficiently and effectively identify bugs that would be invisible or otherwise undetectable to a regular unit tester, resulting in higher quality applications that execute as intended. By providing higher quality applications, the users of these applications are further assured that the privacy, security, and reliability of these applications are at the highest possible quality given the constraints of the application itself.).
Claims 10, 15, 20, 25 are rejected under 35 U.S.C. 103 as being unpatentable over Heinecke, in view of Bhandari, Dabkowski, Warnecke, Tarlow, and further in view of Sirianni et al. (U.S. Patent No. 10,949,338, hereinafter 'Sirianni').
Regarding claim 10, Heinecke, as modified by Bhandari, Dabkowski, Warnecke, and Tarlow, teaches The system of claim 9.
Heinecke, as modified by Bhandari, Dabkowski, Warnecke, and Tarlow, fails to teach further comprising: a comparison component that compares a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets.
Sirianni teaches further comprising: a comparison component that compares a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets ([Col. 4, Lines 7-16] The techniques described herein provide an automated software testing toolset for analyzing potential release candidate builds before deployment. The techniques described herein provide the tools to identify source code that has the potential to be a source of latent errors that may cause problems. The techniques described herein analyze data obtained from software testing, discovers bugs using a machine-learning bug detector, automatically assesses the severity of the potential bug, and displays the results to the user.; [Col. 2, Lines 27-34] The method also includes analyzing, by the computing device and using a machine learning model, the path logs and the output logs stored in the bug detection database to identify an abnormality indicative of a potential bug in the source code. The method further includes outputting, by the computing device and for display, a graphical representation of the abnormality.; [Col. 18, Lines 45-55] For example, a comparison component that compares a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets analysis module 240 may determine, using standard deviations of values in the output log, whether the output log contains a statistical outlier compared to every other output log stored in the bug detection database. 
In other examples, for each input in the respective set of one or more inputs in the test log, analysis module 240 may determine, using N-grams in the path log, whether one or more of a line of the source code before the respective input in the path log or a line of the source code after the respective input in the path log is the outlier compared to every other path log.; [Col. 18, Lines 56-64] In the instances where execution statistics are tracked by testing module 238, for each execution statistic of the one or more execution statistics tracked for each execution, analysis module 240, in the analyzing portion of the bug detection process, may create a distribution of the respective execution statistic from each execution and analyze the respective execution statistic from each execution to identify a statistical outlier in the distribution of the respective execution statistic.).
Heinecke, Bhandari, Dabkowski, Warnecke, Tarlow, and Sirianni are considered to be analogous to the claimed invention because they are in the same field of machine learning. In view of the teachings of Heinecke, Bhandari, Dabkowski, Warnecke, and Tarlow, it would have been obvious for a person of ordinary skill in the art to apply the teachings of Sirianni to Heinecke before the effective filing date of the claimed invention in order to efficiently and effectively identify bugs that would be invisible or otherwise undetectable to a regular unit tester, resulting in higher quality applications, assuring privacy, security, and reliability of these applications (cf. Sirianni, [Col. 2, Lines 5-8] As such, the device may efficiently and effectively identify bugs that would be invisible or otherwise undetectable to a regular unit tester, resulting in higher quality applications that execute as intended. By providing higher quality applications, the users of these applications are further assured that the privacy, security, and reliability of these applications are at the highest possible quality given the constraints of the application itself.).
Regarding claim 15, Heinecke, as modified by Bhandari, Dabkowski, Warnecke, and Tarlow, teaches The computer-implemented method of claim 14.
Heinecke, as modified by Bhandari, Dabkowski, Warnecke, and Tarlow, fails to teach further comprising: comparing, by the system, a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets.
Sirianni teaches further comprising: comparing, by the system, a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets ([Col. 4, Lines 7-16] The techniques described herein provide an automated software testing toolset for analyzing potential release candidate builds before deployment. The techniques described herein provide the tools to identify source code that has the potential to be a source of latent errors that may cause problems. The techniques described herein analyze data obtained from software testing, discovers bugs using a machine-learning bug detector, automatically assesses the severity of the potential bug, and displays the results to the user.; [Col. 2, Lines 27-34] The method also includes analyzing, by the computing device and using a machine learning model, the path logs and the output logs stored in the bug detection database to identify an abnormality indicative of a potential bug in the source code. The method further includes outputting, by the computing device and for display, a graphical representation of the abnormality.; [Col. 18, Lines 45-55] For example, comparing, by the system, a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets analysis module 240 may determine, using standard deviations of values in the output log, whether the output log contains a statistical outlier compared to every other output log stored in the bug detection database. 
In other examples, for each input in the respective set of one or more inputs in the test log, analysis module 240 may determine, using N-grams in the path log, whether one or more of a line of the source code before the respective input in the path log or a line of the source code after the respective input in the path log is the outlier compared to every other path log.; [Col. 18, Lines 56-64] In the instances where execution statistics are tracked by testing module 238, for each execution statistic of the one or more execution statistics tracked for each execution, analysis module 240, in the analyzing portion of the bug detection process, may create a distribution of the respective execution statistic from each execution and analyze the respective execution statistic from each execution to identify a statistical outlier in the distribution of the respective execution statistic.).
Heinecke, Bhandari, Dabkowski, Warnecke, Tarlow, and Sirianni are combinable for the same rationale as set forth above with respect to claim 10.
Regarding claim 20, Heinecke, as modified by Bhandari, Dabkowski, Warnecke, and Tarlow, teaches The computer-implemented method of claim 19.
Heinecke, as modified by Bhandari, Dabkowski, Warnecke, and Tarlow, fails to teach further comprising: comparing, by the system, a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets.
Sirianni teaches further comprising: comparing, by the system, a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets ([Col. 4, Lines 7-16] The techniques described herein provide an automated software testing toolset for analyzing potential release candidate builds before deployment. The techniques described herein provide the tools to identify source code that has the potential to be a source of latent errors that may cause problems. The techniques described herein analyze data obtained from software testing, discovers bugs using a machine-learning bug detector, automatically assesses the severity of the potential bug, and displays the results to the user.; [Col. 2, Lines 27-34] The method also includes analyzing, by the computing device and using a machine learning model, the path logs and the output logs stored in the bug detection database to identify an abnormality indicative of a potential bug in the source code. The method further includes outputting, by the computing device and for display, a graphical representation of the abnormality.; [Col. 18, Lines 45-55] For example, comparing, by the system, a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets analysis module 240 may determine, using standard deviations of values in the output log, whether the output log contains a statistical outlier compared to every other output log stored in the bug detection database. 
In other examples, for each input in the respective set of one or more inputs in the test log, analysis module 240 may determine, using N-grams in the path log, whether one or more of a line of the source code before the respective input in the path log or a line of the source code after the respective input in the path log is the outlier compared to every other path log.; [Col. 18, Lines 56-64] In the instances where execution statistics are tracked by testing module 238, for each execution statistic of the one or more execution statistics tracked for each execution, analysis module 240, in the analyzing portion of the bug detection process, may create a distribution of the respective execution statistic from each execution and analyze the respective execution statistic from each execution to identify a statistical outlier in the distribution of the respective execution statistic.).
Heinecke, Bhandari, Dabkowski, Warnecke, Tarlow, and Sirianni are combinable for the same rationale as set forth above with respect to claim 10.
Regarding claim 25, Heinecke, as modified by Bhandari, Dabkowski, Warnecke, and Tarlow, teaches The computer program product of claim 24.
Heinecke, as modified by Bhandari, Dabkowski, Warnecke, and Tarlow, fails to teach wherein the program instructions further cause the processor to: compare, by the processor, a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets.
Sirianni teaches wherein the program instructions further cause the processor to: compare, by the processor, a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets ([Col. 4, Lines 7-16] The techniques described herein provide an automated software testing toolset for analyzing potential release candidate builds before deployment. The techniques described herein provide the tools to identify source code that has the potential to be a source of latent errors that may cause problems. The techniques described herein analyze data obtained from software testing, discovers bugs using a machine-learning bug detector, automatically assesses the severity of the potential bug, and displays the results to the user.; [Col. 2, Lines 27-34] The method also includes analyzing, by the computing device and using a machine learning model, the path logs and the output logs stored in the bug detection database to identify an abnormality indicative of a potential bug in the source code. The method further includes outputting, by the computing device and for display, a graphical representation of the abnormality.; [Col. 18, Lines 45-55] For example, compare, by the processor, a first distribution of code samples associated with a first testing data subset from the plurality of testing data subsets with a second distribution of code samples associated with a second testing data subset from the plurality of testing data subsets analysis module 240 may determine, using standard deviations of values in the output log, whether the output log contains a statistical outlier compared to every other output log stored in the bug detection database. 
In other examples, for each input in the respective set of one or more inputs in the test log, analysis module 240 may determine, using N-grams in the path log, whether one or more of a line of the source code before the respective input in the path log or a line of the source code after the respective input in the path log is the outlier compared to every other path log.; [Col. 18, Lines 56-64] In the instances where execution statistics are tracked by testing module 238, for each execution statistic of the one or more execution statistics tracked for each execution, analysis module 240, in the analyzing portion of the bug detection process, may create a distribution of the respective execution statistic from each execution and analyze the respective execution statistic from each execution to identify a statistical outlier in the distribution of the respective execution statistic.).
Heinecke, Bhandari, Dabkowski, Warnecke, Tarlow, and Sirianni are combinable for the same rationale as set forth above with respect to claim 10.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Baniecki et al. (NPL: “dalex: Responsible Machine Learning with Interactive Explainability and Fairness in Python”) teaches dalex, a Python package which implements a model-agnostic interface for interactive explainability and fairness.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAGGIE MAIDO whose telephone number is (703) 756-1953. The examiner can normally be reached M-Th: 6am - 4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley can be reached on (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MM/Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129