Prosecution Insights
Last updated: April 19, 2026
Application No. 18/426,290

MACHINE LEARNING BASED STATIC PROFILING

Non-Final OA: §101, §103
Filed: Jan 29, 2024
Examiner: DUAN, VIVIAN WEIJIA
Art Unit: 2191
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Oracle International Corporation
OA Round: 1 (Non-Final)

Grant Probability: 70% (Favorable)
Expected OA Rounds: 1-2
Expected Time to Grant: 2y 9m
Grant Probability With Interview: 99%

Examiner Intelligence

Career Allow Rate: 70% (7 granted / 10 resolved), +15.0% vs TC avg (above average)
Interview Lift: +52.4% for resolved cases with interview
Avg Prosecution: 2y 9m typical timeline; 28 applications currently pending
Total Applications: 38 across all art units

Statute-Specific Performance

§101: 27.2% (-12.8% vs TC avg)
§103: 40.8% (+0.8% vs TC avg)
§102: 7.6% (-32.4% vs TC avg)
§112: 20.9% (-19.1% vs TC avg)
Tech Center averages are estimates. Based on career data from 10 resolved cases.

Office Action

§101 §103
DETAILED ACTION

This action is in response to the claims filed January 29, 2024. Claims 1-20 are pending. Claims 1, 19, and 20 are independent claims.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claims 4, 5, and 10 are objected to because of the following informalities:
- Claim 4 reads “based on the corresponding branch being to a loop body”. This should likely read “based on the corresponding branch being a loop body”.
- Claim 4 recites “wherein changing the branch frequency value is to a revised value”. This should likely read “wherein the branch frequency value is changed to a revised value”.
- Claim 5 recites “based on the corresponding branch being to a program end”. This should likely read “based on the corresponding branch being a program end”.
- Claim 5 recites “wherein changing the branch frequency value is to the maximum threshold”. This should likely read “wherein the branch frequency value is changed to the maximum threshold”.
- Claim 10 reads “The method of claim 1,”. This should likely read “The method of claim 1, further comprising:”.

Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding claims 1, 19, and 20, the limitations “extracting a plurality of control flow split node features of a control flow split node in the IR graph”, “processing,…, the plurality of control flow split node features to generate a branch frequency value of a branch from the control flow split node”, “adding the branch frequency value to a profile for the program”, and “generate optimized code”, as drafted, are functions that, under their broadest reasonable interpretation, recite the abstract idea of a mental process. Each limitation encompasses a human mind carrying out the function through observation, evaluation, judgment, and/or opinion, or even with the aid of pen and paper. Thus, these limitations fall under the “Mental Processes” grouping of abstract ideas under Prong 1.

Under Prong 2, this judicial exception is not integrated into a practical application. The additional limitations “by a regression machine learning model”, “executing an optimizer on the program according to the profile…”, “a system comprising: memory; and at least one computer processor for executing computer readable program code to perform operations comprising:”, and “a non-transitory computer readable medium comprising computer readable program code for causing a computer system to perform operations comprising” are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer and/or mere computer components. See MPEP 2106.05(f). The limitation “obtaining an intermediate representation (IR) graph of source code of a program” does nothing more than add the insignificant extra-solution activity of merely gathering and transmitting data to the judicial exception. See MPEP 2106.05(g). Accordingly, the additional elements do not integrate the judicial exception into a practical application, and the claim is therefore directed to the judicial exception.
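For readers unfamiliar with the underlying technique, the pipeline recited in claim 1 can be sketched as follows. This is an illustrative toy only, not the applicant's implementation: the `SplitNode`/`Profile` types and the linear stand-in for the regression model are invented for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class SplitNode:
    node_id: int
    features: list      # numeric control-flow-split-node features
    n_branches: int

@dataclass
class Profile:
    # Maps (split_node_id, branch_index) -> predicted branch frequency.
    branch_frequencies: dict = field(default_factory=dict)

def predict_branch_frequency(weights, features, branch_index):
    # Stand-in for the regression machine learning model: a linear
    # model over the extracted features.
    score = sum(w * f for w, f in zip(weights, features))
    return score if branch_index == 0 else 1.0 - score

def build_static_profile(split_nodes, weights):
    # Claim 1: extract features per control flow split node, generate a
    # branch frequency value per branch, and add it to the program profile,
    # which an optimizer would then consume.
    profile = Profile()
    for node in split_nodes:
        for b in range(node.n_branches):
            freq = predict_branch_frequency(weights, node.features, b)
            profile.branch_frequencies[(node.node_id, b)] = freq
    return profile
```

The resulting profile plays the role that a dynamic (instrumented-run) profile would otherwise play when the optimizer makes layout and inlining decisions.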
Under Step 2B, the claims do not recite additional elements which are sufficient to amount to significantly more. As discussed above with respect to integration of the abstract idea into a practical application, the limitations “by a regression machine learning model”, “executing an optimizer on the program according to the profile…”, “a system comprising: memory; and at least one computer processor for executing computer readable program code to perform operations comprising:”, and “a non-transitory computer readable medium comprising computer readable program code for causing a computer system to perform operations comprising” amount to no more than mere instructions, or generic computer/computer components, to carry out the exception. Mere instructions to apply an exception cannot provide an inventive concept. See MPEP 2106.05(f). For the limitation “obtaining an intermediate representation (IR) graph of source code of a program”, the courts have identified mere data gathering and transmitting as well-understood, routine, and conventional activity. See MPEP 2106.05(d). Accordingly, the claims are not patent eligible under 35 U.S.C. 101.

Claim 2 does not recite additional mental steps. Under Prong 2, the limitation “executing the optimized code” is directed to insignificant extra-solution activity. See MPEP 2106.05(g). Under Step 2B, executing optimized code is a well-understood, routine, and conventional activity; insignificant extra-solution activity also does not amount to significantly more. The background of US 5613121 A (“Method and System of Generating Combined Storage References”) states “It is known to perform optimizations on the object code produced by a compiler to achieve improved execution speeds of the optimized object code…”.
The background of US 5655122 A (“Optimizing Compiler with Static Prediction of Branch Probability, Branch Frequency and Function Frequency”) states “The branch frequencies, derived from the branch probabilities, may then be a more direct bases for storing the object code in a given order to optimize program execution”. The background of US 5875317 A (“Boosting Control Method and Processor Apparatus Having Boosting Control Portion”) states, “To satisfy such a requirement, a compiler that generates execution code for the processor analyzes the dependency of data for a corresponding software and changes the sequence of the operations according to the analyzed result, without changing the purpose of the software, so as to optimize the execution code”. These indicate that it is well-known, routine, and conventional to execute optimized code. Thus, executing optimized code does not amount to significantly more than the judicial exception and cannot provide an inventive concept.

Regarding claim 3, the limitation “changing the branch frequency value of a corresponding branch in the profile based on a heuristic rule” is an additional mental step. The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.

Regarding claim 4, the limitation “based on the corresponding branch being to a loop body, comparing the branch frequency value to a minimum threshold, wherein changing the branch frequency value is to a revised value responsive to the branch frequency value being less than the minimum threshold” is an additional mental step. The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.
Regarding claim 5, the limitation “based on the corresponding branch being to a program end, comparing the branch frequency value to a maximum threshold, wherein changing the branch frequency value is to the maximum threshold responsive to the branch frequency value being greater than the maximum threshold” is an additional mental step. The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.

Regarding claim 6, the limitations “extracting a first set of features from the control flow split node”, “extracting a second set of features from a hosting block of a control flow graph, wherein the hosting block comprises the control flow split node, and wherein the control flow graph is generated from the IR graph”, and “combining the first set of features and the second set of features into the plurality of control flow split node features” are additional mental steps. The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.

Regarding claim 7, the limitations “extracting a first set of features from the control flow split node”, “extracting a second set of features from a hosting block of a control flow graph, wherein the hosting block comprises the control flow split node, and wherein the control flow graph is generated from the IR graph”, “extracting a third set of features from at least one predecessor block of the hosting block”, and “combining the first set of features, the second set of features, and the third set of features into the plurality of control flow split node features” are additional mental steps. The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.
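The threshold heuristics of claims 4 and 5 amount to simple clamping of the predicted frequencies. A minimal sketch, with invented threshold values and string-valued branch targets (the claims leave these unspecified):

```python
# Hypothetical thresholds; claims 4 and 5 recite a minimum and a maximum
# threshold but do not assign concrete values.
MIN_LOOP_BODY_FREQ = 0.5     # minimum threshold of claim 4
MAX_PROGRAM_END_FREQ = 0.01  # maximum threshold of claim 5

def apply_heuristic_rules(freq, branch_target):
    # Claim 4: if the branch goes to a loop body and its predicted
    # frequency is below the minimum threshold, raise it to a revised value.
    if branch_target == "loop_body" and freq < MIN_LOOP_BODY_FREQ:
        return MIN_LOOP_BODY_FREQ
    # Claim 5: if the branch goes to the program end and its predicted
    # frequency exceeds the maximum threshold, clamp it to that threshold.
    if branch_target == "program_end" and freq > MAX_PROGRAM_END_FREQ:
        return MAX_PROGRAM_END_FREQ
    return freq
```

The design intuition is that loop-body branches are rarely cold and program-exit branches are rarely hot, so model outputs outside those ranges are corrected in the profile.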
Regarding claim 8, the limitations “extracting a first set of features from the control flow split node”, “extracting a second set of features from a hosting block of a control flow graph, wherein the hosting block comprises the control flow split node, and wherein the control flow graph is generated from the IR graph”, “extracting a third set of features from at least one successor block of the hosting block”, and “combining the first set of features, the second set of features, and the third set of features into the plurality of control flow split node features” are additional mental steps. The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.

Regarding claim 9, the limitations “extracting a first set of features from the control flow split node”, “extracting a second set of features from a hosting block of a control flow graph, wherein the hosting block comprises the control flow split node, and wherein the control flow graph is generated from the IR graph”, “extracting a third set of features from at least one predecessor block of the hosting block”, “extracting a fourth set of features from at least one successor block of the hosting block”, and “combining the first set of features, the second set of features, the third set of features, and the fourth set of features into the plurality of control flow split node features” are additional mental steps. The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.

Regarding claim 10, the limitation “concatenating the plurality of control flow split node features into a feature vector” is an additional mental step.
The limitation “wherein processing the plurality of control flow split node features comprises processing the feature vector” is a mere instruction to apply “processing” to a feature vector, which does not amount to a practical application under Prong 2 or to significantly more under Step 2B. See MPEP 2106.05(f).

Regarding claim 11, the limitations “estimating, based on a set of nodes in a block of a control flow graph, an assembly size of the block, wherein the control flow graph is generated from the IR graph” and “adding the assembly size to the plurality of control flow split node features” are additional mental steps. The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.

Regarding claim 12, the limitations “summing, across a set of nodes in a block of a control flow graph, an estimated number of processing cycles of each node in the set of nodes based on a node type of the node to generate an estimated number of processing cycles of the block, wherein the control flow graph is generated from the IR graph” and “adding the estimated number of processing cycles of the block to the plurality of control flow split node features” are additional mental steps. The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.
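The block-level cost features of claims 11 and 12 are straightforward per-node-type lookups and sums. A sketch, with invented node types and per-type size/cycle estimates (the claims specify the computation, not the values):

```python
# Hypothetical per-node-type cost tables for illustration.
EST_BYTES  = {"load": 5, "add": 3, "mul": 4, "store": 5}   # claim 11
EST_CYCLES = {"load": 4, "add": 1, "mul": 3, "store": 4}   # claim 12

def estimated_assembly_size(block_node_types):
    # Claim 11: estimate the assembly size of a block from its set of nodes.
    return sum(EST_BYTES[t] for t in block_node_types)

def estimated_block_cycles(block_node_types):
    # Claim 12: sum per-node cycle estimates, keyed by node type, to get an
    # estimated number of processing cycles for the block.
    return sum(EST_CYCLES[t] for t in block_node_types)

block = ["load", "add", "mul", "store"]
cost_features = [estimated_assembly_size(block), estimated_block_cycles(block)]
```

Both values would then be appended to the control flow split node feature set fed to the regression model.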
Regarding claim 13, the limitations “totaling, across a set of nodes in a block of a control flow graph, a number of processing cheap nodes in the set of nodes based on a node type of each node in the set of nodes, wherein the number of processing cheap nodes are estimated to use less than a first threshold number of processing cycles, wherein the control flow graph is generated from the IR graph” and “adding the number of processing cheap nodes of the block to the plurality of control flow split node features” are additional mental steps. The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.

Regarding claim 14, the limitations “totaling, across a set of nodes in a block of a control flow graph, a number of processing expensive nodes in the set of nodes based on a node type of each node in the set of nodes, wherein the number of processing expensive nodes are estimated to use more than a second threshold number of processing cycles, and wherein the control flow graph is generated from the IR graph” and “adding the number of processing expensive nodes of the block to the plurality of control flow split node features” are additional mental steps. The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.

Regarding claim 15, the limitations “partitioning a plurality of nodes in a block of a control flow graph according to a node type of each node in the plurality of nodes to obtain a plurality of partitions, wherein the control flow graph is generated from the IR graph” and “generating a control flow split node feature of the plurality of control flow split node features based on a number of nodes in each partition of the plurality of partitions” are additional mental steps.
The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.

Regarding claim 16, the limitations “counting floating nodes in a set of nodes in a block of a control flow graph to obtain a count, wherein the control flow graph is generated from the IR graph” and “adding the count as a control flow split node feature of the plurality of control flow split node features” are additional mental steps. The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.

Regarding claim 17, the limitations “generating a training IR graph of training source code of a training program”, “extracting a training plurality of control flow split node features of a training control flow split node in the training IR graph”, “processing,…, the training plurality of control flow split node features to generate a predicted branch frequency value of a second branch from the training control flow split node”, “adding the predicted branch frequency value to a predicted profile for the training program”, “instrumenting the training program to obtain an instrumented program”, “calculating a loss between the dynamic profile and the predicted profile”, and “updating the regression machine learning model according to the loss” are additional mental steps. The additional limitations “by the regression machine learning model” and “executing the instrumented program to generate a dynamic profile of the instrumented program” merely apply a generic computer/computer components to the judicial exception, which does not amount to a practical application under Prong 2 or to significantly more under Step 2B, as explained above.
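The training procedure of claims 17 and 18 can be sketched as supervised regression against frequencies observed in an instrumented run. This toy uses a linear model and squared loss; the claims recite a difference-based loss but do not fix the model or update rule, so both are assumptions here.

```python
def predict(weights, features):
    # Stand-in linear regression model for the branch frequency.
    return sum(w * f for w, f in zip(weights, features))

def train_step(weights, features, observed_freq, lr=0.1):
    # Claim 17: compare the predicted profile against the dynamic profile
    # from the instrumented run and update the model according to the loss.
    # Claim 18: the loss is based on the difference between the observed
    # execution frequency and the predicted branch frequency value.
    predicted = predict(weights, features)
    error = predicted - observed_freq
    loss = error ** 2
    # One gradient-descent step on the squared loss.
    new_weights = [w - lr * 2 * error * f for w, f in zip(weights, features)]
    return new_weights, loss
```

Repeated over the control flow split nodes of a corpus of training programs, this drives the predicted profile toward the dynamic profile.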
Regarding claim 18, the limitation “wherein calculating the loss comprises: calculating a difference between an execution frequency for the control flow split node obtained from dynamic profile and the predicted branch frequency value for the control flow split node” is an additional mental step. The claim does not recite additional elements which integrate the judicial exception into a practical application under Prong 2 or amount to significantly more under Step 2B.

Allowable Subject Matter

Claims 4-5 and 13-14 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 101 set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 6, 8, 10-12, 15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over “VESPA: Static Profiling for Binary Optimization” by Moreira et al. (hereinafter “Moreira”), in view of “Dominance-based Duplication Simulation (DBDS): Code Duplication to Enable Compiler Optimizations” by Leopoldseder et al. (hereinafter “Leopoldseder”).

Regarding claim 1, Moreira discloses:

A method comprising: …

- extracting a plurality of [branch] features of a [branch] in the [graph] (Page 3, “A program is formed by basic blocks, which are arranged in a control flow graph (CFG). A basic block is a sequence of non-branch instructions, except for the last one; thus, if we disregard preemption at the OS level, these instructions will be executed in sequence. The instructions in a basic block are naturally placed together in memory. A CFG is a directed graph where each vertex is a basic block. An edge between two blocks, e.g. BB𝑖 to BB𝑗, indicates that control may flow from BB𝑖 to BB𝑗. In this case, we say that BB𝑗 is a successor of BBi”; Page 6, “Example 3.1. Figure 4 illustrates how prediction works for a given branch in the target program. Prediction starts with feature extraction. Features are mined from the target program syntax. In this example, we assume a set of four features (whose descriptions appear in Table 1). Three of them assume binary values; the other -CMP_OPCODE- ranges over a category of values. Once features are extracted, they are arranged into a feature vector, which Figure 4(c) shows. Said vector is passed to a “prediction” function”; Page 7, “Example 3.3 (Static Program Feature). Consider the instruction “branch 𝑝0 ℓexit” in Figure 1(a). Several program features are associated with this instruction. A first feature is the opcode used to implement this instruction, e.g., in x86’s machine code: jne, je, jg, jle, jl, and jge. A second feature is the opcode used to produce the value 𝑝0. This opcode could be any arithmetic or logic operation. A third feature is the direction of the branch: forward or backward.
In this example, we have a forward branch”) [Examiner’s remarks: An IR graph contains nodes that split off due to branch instructions. Features related to that branch (a node in the IR graph) are extracted.];

- processing, by a regression machine learning model, the plurality of [branch] features to generate a branch frequency value of a branch from the control flow split node (Page 2, “To overcome this limitation, we adapt ESP to estimate how frequently each branch will be executed. We call this adaptation VESPA - short for Vintage ESP Amended. VESPA consists of training and prediction phases. During training, we collect an assortment of static features from programs. By observing the execution of a corpus of representative benchmarks, we associate these features with the probability that a branch is taken. During prediction, we build a static profile for an unknown program. To this end, the model predicts the outcome of this program’s branches based on the features that characterize them”; Page 5, “A static profiler, like its dynamic counterpart, builds a map that associates control flow edges with their execution frequencies. The vast majority of static techniques depart from branch probabilities. In other words, they seek to estimate the chance that a conditional branch can be taken”; Page 25, “However, whereas Wu and Larus use nine features, we use 56, and whereas they combine heuristics via Dempster-Shafer formula, we use a regression model powered by a neural network [processing, by a regression machine learning model, the plurality of [branch] features to generate a branch frequency value of a branch from the control flow split node]”) [Examiner’s remarks: Nodes in the graph may split due to branch instructions.
Features extracted from the branch are used to estimate branch probabilities (branch frequencies).];

- adding the branch frequency value to a profile for the program (Page 2, “During prediction, we build a static profile for an unknown program. To this end, the model predicts the outcome of this program’s branches based on the features that characterize them”; Page 5, “A static profiler, like its dynamic counterpart, builds a map that associates control flow edges with their execution frequencies. The vast majority of static techniques depart from branch probabilities. In other words, they seek to estimate the chance that a conditional branch can be taken”) [Examiner’s remarks: A static profile for the program is built using branch frequency estimates from the model.]; and

- executing an optimizer on the program according to the profile to generate optimized code (Page 2, “During prediction, we build a static profile for an unknown program. To this end, the model predicts the outcome of this program’s branches based on the features that characterize them. These probabilities are then passed to a binary optimizer to guide its optimization decisions”) [Examiner’s remarks: The estimated probabilities (frequencies) are used by the binary optimizer to optimize code.].

Moreira does not explicitly disclose:
- obtaining an intermediate representation (IR) graph of source code of a program;
- IR graph
- control flow split node

However, Leopoldseder discloses:

- obtaining an intermediate representation (IR) graph of source code of a program (Page 129, “Graal IR is a sea-of-nodes-based [9] directed graph in SSA form [10]. Each IR node produces at most one value. Data flow is represented with upward edges and control flow with downward edges. The IR is a superposition of the control-flow and the data-flow graph”);

- IR graph (Page 129, “Graal IR is a sea-of-nodes-based [9] directed graph in SSA form [10]. Each IR node produces at most one value.
Data flow is represented with upward edges and control flow with downward edges. The IR is a superposition of the control-flow and the data-flow graph”) [Examiner’s remarks: Moreira discloses having a control flow graph. Moreira does not disclose the graph being an IR graph. However, Leopoldseder discloses having the graph be derived from an intermediate representation, as is common in Graal IR. Therefore, one of ordinary skill in the art may replace the control flow graph of Moreira with the IR graph of Leopoldseder, which also represents control flow, to achieve the present limitation.];

- control flow split node (Page 132, “Additionally, we use probability information at each control-flow split to determine relative basic block execution frequencies. The probability information is generated from HotSpot’s [26] profiling information”) [Examiner’s remarks: Moreira discloses having nodes which contain branches as the last instruction to form a control flow graph. Thus, each node splits due to the branch. Moreira further discloses deriving features from the branches and surrounding blocks. Moreira does not explicitly state using control flow split nodes. However, Leopoldseder explicitly states that control flow split nodes may be used for information collection.]

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Leopoldseder into the teachings of Moreira to include “obtaining an intermediate representation (IR) graph of source code of a program”, “IR graph”, and “control flow split node”. As stated in Leopoldseder, “We present a duplication optimization cost model that tries to maximize peak performance while minimizing code size and compilation time” (Page 127). Leopoldseder discloses using information to achieve program optimization statically. This optimization minimizes code size while maximizing time saved.
Usage of graphs allows for easy visualization of the flow of a program for analysis. Therefore, it would be obvious to one of ordinary skill in the art to combine program optimization with control flow graphs.

Regarding claim 2, the rejection of claim 1 is incorporated, and Moreira further discloses:

- executing the optimized code (Page 2, “Summary of Results. We have implemented our static profiler on top of the BOLT binary optimizer [Panchenko et al. 2019]. This new version of BOLT has been evaluated on four large executables: clang, GCC, MySQL and PostgreSQL. Experiments reported in Section 5 show that the binaries generated by BOLT fed with the static profiles inferred by VESPA are on average around 6% faster than the same program compiled with clang-O3”).

Regarding claim 3, the rejection of claim 1 is incorporated, and Moreira further discloses:

- changing the branch frequency value of a corresponding branch in the profile based on a heuristic rule (Page 15: “Classic Predictors. The other branch prediction heuristic that we have added to BOLT is the static profiler proposed by Wu and Larus [1994]. As mentioned in Section 1, Wu and Larus use Dempster Shafer Theory [Dempster 1967] to combine nine branch-characterization features proposed by Ball and Larus [1993]. Table 6 lists these features. This table also shows the probability that a branch with the given feature is taken. We have computed such probabilities over a training corpus yet to be described in Section 5. In that section, we shall refer to this heuristic as Wu-Larus”) [Examiner’s remarks: Heuristics are used to change the probabilities (frequencies) of a branch.].

Regarding claim 6, the rejection of claim 1 is incorporated, and Moreira further discloses:

- extracting a first set of features from the [branch] (Page 7, “On The Choice of Program Features. The original work of Calder et al. defined 30 static program features.
We could reuse 21 of them, which are listed in Table 1”; Table 1 contains feature with ID 1, “OPCODE: the opcode of the branch instruction”) [Examiner’s remarks: Features for the branch (the opcode of the branch) are extracted.];

- extracting a second set of features from a hosting block of a control flow graph, wherein the hosting block comprises the [branch], … (Page 3, “A program is formed by basic blocks, which are arranged in a control flow graph (CFG). A basic block is a sequence of non-branch instructions, except for the last one; thus, if we disregard preemption at the OS level, these instructions will be executed in sequence”; Page 7, “On The Choice of Program Features. The original work of Calder et al. defined 30 static program features. We could reuse 21 of them, which are listed in Table 1”; Table 1 contains feature with ID 6, “LOOP_HEADER: whether the basic block containing the branch is a loop header”) [Examiner’s remarks: Features from the hosting block (basic block) of the branch are extracted, including whether the block is a loop header.]; and

- combining the first set of features and the second set of features into the plurality of [branch] features (Page 6, “Figure 4 illustrates how prediction works for a given branch in the target program. Prediction starts with feature extraction. Features are mined from the target program syntax. In this example, we assume a set of four features (whose descriptions appear in Table 1). Three of them assume binary values; the other -CMP_OPCODE- ranges over a category of values. Once features are extracted, they are arranged into a feature vector, which Figure 4(c) shows. Said vector is passed to a “prediction” function”) [Examiner’s remarks: The features are combined into a vector for prediction.].
Moreira does not explicitly disclose:
- control flow split node
- … and wherein the control flow graph is generated from the IR graph

However, Leopoldseder discloses:

- control flow split node (Page 132, “Additionally, we use probability information at each control-flow split to determine relative basic block execution frequencies. The probability information is generated from HotSpot’s [26] profiling information”) [Examiner’s remarks: Leopoldseder discloses extracting profiling information from a control flow split node to determine frequencies. Control flow split nodes are equivalent to branches in the program. Therefore, gathering features at a branch as described by Moreira may be combined with the information gathering at control-flow splits of Leopoldseder.]

- … and wherein the control flow graph is generated from the IR graph (Page 129, “Graal IR is a sea-of-nodes-based [9] directed graph in SSA form [10]. Each IR node produces at most one value. Data flow is represented with upward edges and control flow with downward edges. The IR is a superposition of the control-flow and the data-flow graph”).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Leopoldseder into the teachings of Moreira to include “control flow split node” and “and wherein the control flow graph is generated from the IR graph”. As stated in Leopoldseder, “We present a duplication optimization cost model that tries to maximize peak performance while minimizing code size and compilation time” (Page 127). Leopoldseder discloses using information to achieve program optimization statically. This optimization minimizes code size while maximizing time saved. Usage of graphs allows for easy visualization of the flow of a program for analysis. Therefore, it would be obvious to one of ordinary skill in the art to combine program optimization with control flow graphs.
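The feature-combination step mapped against claim 6 can be illustrated with a tiny encoder. The encoding below is invented for illustration (VESPA itself uses 56 features, per the paper's Table 1); only the opcode list and LOOP_HEADER idea come from the quoted passages.

```python
# Hypothetical encoding of a few VESPA-style features.
OPCODES = ["jne", "je", "jg", "jle", "jl", "jge"]  # x86 branch opcodes quoted from Moreira

def one_hot(value, categories):
    return [1.0 if value == c else 0.0 for c in categories]

def feature_vector(branch_opcode, is_forward, hosting_is_loop_header):
    # First set: features of the branch (split node) itself.
    first = one_hot(branch_opcode, OPCODES) + [1.0 if is_forward else 0.0]
    # Second set: a feature of the hosting basic block (LOOP_HEADER).
    second = [1.0 if hosting_is_loop_header else 0.0]
    # Combine both sets into one vector passed to the prediction function.
    return first + second
```

Claims 7-9 extend the same pattern by appending feature sets drawn from predecessor and successor blocks.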
Regarding claim 8, the rejection of claim 1 is incorporated; and Moreira further discloses: - extracting a first set of features from the [branch] (Page 7, “On The Choice of Program Features. The original work of Calder et al. defined 30 static program features. We could reuse 21 of them, which are listed in Table 1”; Table 1 contains feature with ID 1, “OPCODE: the opcode of the branch instruction”) [Examiner’s remarks: Features for the branch (opcode of the branch) are extracted.]; - extracting a second set of features from a hosting block of a control flow graph, wherein the hosting block comprises the [branch], … (Page 7, “On The Choice of Program Features. The original work of Calder et al. defined 30 static program features. We could reuse 21 of them, which are listed in Table 1”; Table 1 contains feature with ID 6, “LOOP_HEADER: whether the basic block containing the branch is a loop header”); - extracting a third set of features from at least one successor block of the hosting block (Page 7, “On The Choice of Program Features. The original work of Calder et al. defined 30 static program features. We could reuse 21 of them, which are listed in Table 1”; Table 1 contains feature with ID 12, “TS_LOOP_HEADER: whether the taken successor basic block is a loop header”) [Examiner’s remarks: A feature is extracted for the successor block of the hosting block, e.g., whether the successor block is a loop header.]; and - combining the first set of features, the second set of features, and the third set of features into the plurality of [graph] features (Page 6, “Figure 4 illustrates how prediction works for a given branch in the target program. Prediction starts with feature extraction. Features are mined from the target program syntax. In this example, we assume a set of four features (whose descriptions appear in Table 1). Three of them assume binary values; the other -CMP_OPCODE- ranges over a category of values. 
Once features are extracted, they are arranged into a feature vector, which Figure 4(c) shows. Said vector is passed to a “prediction” function”). Moreira does not explicitly disclose: - control flow split node - … and wherein the control flow graph is generated from the IR graph; and However, Leopoldseder discloses: - control flow split node (Page 132, “Additionally, we use probability information at each control-flow split to determine relative basic block execution frequencies. The probability information is generated from HotSpot’s [26] profiling information”) [Examiner’s remarks: Leopoldseder discloses extracting profiling information from a control flow split node to determine frequencies. Control flow split nodes are equivalent to branches in the program. Therefore, gathering features at a branch as described by Moreira may be combined with the information gathering at control-flow splits of Leopoldseder.] - … and wherein the control flow graph is generated from the IR graph (Page 129, “Graal IR is a sea-of-nodes-based [9] directed graph in SSA form [10]. Each IR node produces at most one value. Data flow is represented with upward edges and control flow with downward edges. The IR is a superposition of the control-flow and the data-flow graph”); and Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Leopoldseder into the teachings of Moreira to include “control flow split node” and “and wherein the control flow graph is generated from the IR graph”. As stated in Leopoldseder, “We present a duplication optimization cost model that tries to maximize peak performance while minimizing code size and compilation time” (Page 127). Leopoldseder discloses using information to achieve program optimization statically. This optimization minimizes code size while maximizing time saved. 
Usage of graphs allows for easy visualization of the flow of a program for analysis. Therefore, it would be obvious to one of ordinary skill in the art to combine program optimization with control flow graphs. Regarding claim 10, the rejection of claim 1 is incorporated; and Moreira further discloses: - concatenating the plurality of control flow split node features into a feature vector (Page 6, “Figure 4 illustrates how prediction works for a given branch in the target program. Prediction starts with feature extraction. Features are mined from the target program syntax. In this example, we assume a set of four features (whose descriptions appear in Table 1). Three of them assume binary values; the other -CMP_OPCODE- ranges over a category of values. Once features are extracted, they are arranged into a feature vector, which Figure 4(c) shows. Said vector is passed to a “prediction” function. The value that results from applying this function onto the feature vector is the probability that a branch is taken. Figure 4(d) shows a very simple linear predictor. The goal of the training phase is to build an accurate prediction function. The models that we explore in Section 3.5 are non-linear”) [Examiner’s remarks: The features for the branch (control flow split node) are concatenated into a vector for prediction.]; and - wherein processing the plurality of control flow split node features comprises processing the feature vector (Page 6, “Figure 4 illustrates how prediction works for a given branch in the target program. Prediction starts with feature extraction. Features are mined from the target program syntax. In this example, we assume a set of four features (whose descriptions appear in Table 1). Three of them assume binary values; the other -CMP_OPCODE- ranges over a category of values. Once features are extracted, they are arranged into a feature vector, which Figure 4(c) shows. Said vector is passed to a “prediction” function. 
The value that results from applying this function onto the feature vector is the probability that a branch is taken. Figure 4(d) shows a very simple linear predictor. The goal of the training phase is to build an accurate prediction function. The models that we explore in Section 3.5 are non-linear”). Regarding claim 11, the rejection of claim 1 is incorporated; and Moreira further discloses: - adding the [feature] to the plurality of control flow split node features (Page 6, “Figure 4 illustrates how prediction works for a given branch in the target program. Prediction starts with feature extraction. Features are mined from the target program syntax. In this example, we assume a set of four features (whose descriptions appear in Table 1). Three of them assume binary values; the other -CMP_OPCODE- ranges over a category of values. Once features are extracted, they are arranged into a feature vector, which Figure 4(c) shows. Said vector is passed to a “prediction” function. The value that results from applying this function onto the feature vector is the probability that a branch is taken. Figure 4(d) shows a very simple linear predictor. The goal of the training phase is to build an accurate prediction function. The models that we explore in Section 3.5 are non-linear”). Moreira does not explicitly disclose: - estimating, based on a set of nodes in a block of a control flow graph, an assembly size of the block, wherein the control flow graph is generated from the IR graph; and - assembly size However, Leopoldseder discloses: - estimating, based on a set of nodes in a block of a control flow graph, an assembly size of the block, wherein the control flow graph is generated from the IR graph (Page 130, “The performance estimator returns a numeric run time estimation (called cycles) as well as a code size estimation for each IR node. 
We compute which nodes are new or deleted and can therefore compute a cycles saved (CS) measurement which tells us if a given optimization might increase peak performance. We compute code size increase in a similar fashion. The performance estimator is based on a performance cost model for IR nodes. Each IR node is given a cycle and size estimation that allows us to compare two nodes (instructions) with each other (see Section 5.3 for details)”; Page 132, “We express the meta information as Java Annotations per node class with enumeration values for the estimated cycles and code size. The costs are designed to be platform agnostic. Listing 7 shows the AbstractNewObjectNode class with an annotation specifying the cycles and code size of this IR node. We implemented the node cost annotations for over 400 different nodes in the Graal compiler. Based on these annotations, we can compute the costs and benefits of (sub)graphs. Additionally, we use probability information at each control-flow split to determine relative basic block execution frequencies”); and - assembly size (Page 132, “We express the meta information as Java Annotations per node class with enumeration values for the estimated cycles and code size. The costs are designed to be platform agnostic. Listing 7 shows the AbstractNewObjectNode class with an annotation specifying the cycles and code size of this IR node. We implemented the node cost annotations for over 400 different nodes in the Graal compiler. Based on these annotations, we can compute the costs and benefits of (sub)graphs. Additionally, we use probability information at each control-flow split to determine relative basic block execution frequencies”)[Examiner’s remarks: Assembly size may be treated as a gathered feature with numerical value and inserted into the machine learning training as is done with all other features.] 
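The per-node cost model quoted above (each IR node type annotated with an estimated cycle count and code size, summed to estimate a block, with the division-to-shift strength reduction saving 32 − 1 = 31 cycles) can be sketched as follows. The node types and cost values are illustrative assumptions; only the division (32 cycles) and shift (1 cycle) figures come from the quoted passage.

```python
# Hypothetical sketch of a per-node cost model in the spirit of the quoted
# passages: each node type carries (estimated cycles, estimated code size),
# and summing over a block's nodes yields the block's estimated cycles and
# assembly size. Costs other than div/shift are illustrative placeholders.

NODE_COSTS = {
    # node_type: (estimated_cycles, estimated_code_size_bytes)
    "add": (1, 4),
    "load": (4, 4),
    "store": (4, 4),
    "div": (32, 8),    # quoted: the original division needs 32 cycles
    "shift": (1, 4),   # quoted: the shift only takes 1 cycle
}

def block_estimates(block_nodes):
    """Sum per-node cost annotations to estimate a block's cycles and size."""
    cycles = sum(NODE_COSTS[n][0] for n in block_nodes)
    size = sum(NODE_COSTS[n][1] for n in block_nodes)
    return cycles, size

cycles_before, size_before = block_estimates(["load", "add", "div", "store"])
cycles_after, _ = block_estimates(["load", "add", "shift", "store"])
cycles_saved = cycles_before - cycles_after  # 32 - 1 = 31, as in the quote
```

The block-level size estimate is the kind of numeric value the examiner suggests could be inserted into training as one more feature.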
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Leopoldseder into the teachings of Moreira to include “estimating, based on a set of nodes in a block of a control flow graph, an assembly size of the block, wherein the control flow graph is generated from the IR graph” and “assembly size”. As stated in Leopoldseder, “We present a duplication optimization cost model that tries to maximize peak performance while minimizing code size and compilation time” (Page 127). Leopoldseder discloses using information to achieve program optimization statically. This optimization minimizes code size while maximizing time saved. Usage of such parameters in training may help to further the optimization. Therefore, it would be obvious to one of ordinary skill in the art to combine program optimization with assembly size. Regarding claim 12, the rejection of claim 1 is incorporated; and Moreira further discloses: - adding the [feature] of the block to the plurality of control flow split node features (Page 6, “Figure 4 illustrates how prediction works for a given branch in the target program. Prediction starts with feature extraction. Features are mined from the target program syntax. In this example, we assume a set of four features (whose descriptions appear in Table 1). Three of them assume binary values; the other -CMP_OPCODE- ranges over a category of values. Once features are extracted, they are arranged into a feature vector, which Figure 4(c) shows. Said vector is passed to a “prediction” function. The value that results from applying this function onto the feature vector is the probability that a branch is taken. Figure 4(d) shows a very simple linear predictor. The goal of the training phase is to build an accurate prediction function. The models that we explore in Section 3.5 are non-linear”). 
Moreira does not explicitly disclose: - summing, across a set of nodes in a block of a control flow graph, an estimated number of processing cycles of each node in the set of nodes based on a node type of the node to generate an estimated number of processing cycles of the block, wherein the control flow graph is generated from the IR graph; and - estimated number of processing cycles However, Leopoldseder discloses: - summing, across a set of nodes in a block of a control flow graph, an estimated number of processing cycles of each node in the set of nodes based on a node type of the node to generate an estimated number of processing cycles of the block, wherein the control flow graph is generated from the IR graph (Page 130, “The performance estimator returns a numeric run time estimation (called cycles) as well as a code size estimation for each IR node. We compute which nodes are new or deleted and can therefore compute a cycles saved (CS) measurement which tells us if a given optimization might increase peak performance. We compute code size increase in a similar fashion. The performance estimator is based on a performance cost model for IR nodes. Each IR node is given a cycle and size estimation that allows us to compare two nodes (instructions) with each other (see Section 5.3 for details)”; Page 131, “Figure 3d shows the steps of the algorithm during the traversal. We save value and type information for each involved IR node and update it regularly via the synonym mapping. Eventually, we iterate the division operation (x / φ) in bm2 and apply a strength reduction [1] AC on it. It returns true and we perform the action step. The action step returns a new instruction (x >> 1), which we save as a synonym for the division node. Our static performance estimator yields that the original division needs 32 cycles to execute while the shift only takes 1 cycle. 
Therefore, the cycles saved (CS) is computed as 32−1 = 31, i.e., we estimate that performing the duplication and subsequent optimizations reduces the run time of the program by 31 cycles”); and - estimated number of processing cycles (Page 132, “We express the meta information as Java Annotations per node class with enumeration values for the estimated cycles and code size. The costs are designed to be platform agnostic. Listing 7 shows the AbstractNewObjectNode class with an annotation specifying the cycles and code size of this IR node. We implemented the node cost annotations for over 400 different nodes in the Graal compiler. Based on these annotations, we can compute the costs and benefits of (sub)graphs. Additionally, we use probability information at each control-flow split to determine relative basic block execution frequencies”)[Examiner’s remarks: Number of cycles may be treated as a gathered feature with numerical value and inserted into the machine learning training as is done with all other features.] Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Leopoldseder into the teachings of Moreira to include “summing, across a set of nodes in a block of a control flow graph, an estimated number of processing cycles of each node in the set of nodes based on a node type of the node to generate an estimated number of processing cycles of the block, wherein the control flow graph is generated from the IR graph” and “estimated number of processing cycles”. As stated in Leopoldseder, “We present a duplication optimization cost model that tries to maximize peak performance while minimizing code size and compilation time” (Page 127). Leopoldseder discloses using information to achieve program optimization statically. This optimization minimizes code size while maximizing time saved. 
Usage of such parameters in training may help to further the optimization. Therefore, it would be obvious to one of ordinary skill in the art to combine program optimization with estimated processing cycles as a feature. Regarding claim 15, the rejection of claim 1 is incorporated; and Moreira further discloses: - partitioning a plurality of nodes in a block of a control flow graph according to a node type of each node in the plurality of nodes to obtain a plurality of partitions, wherein the control flow graph is generated from the IR graph (Page 3, “A program is formed by basic blocks, which are arranged in a control flow graph (CFG). A basic block is a sequence of non-branch instructions, except for the last one; thus, if we disregard preemption at the OS level, these instructions will be executed in sequence. The instructions in a basic block are naturally placed together in memory. A CFG is a directed graph where each vertex is a basic block”; Table 2, ID 22, “NUM_STORES: number of store instructions in the basic block containing the branch”; Table 2, ID 26, “BASIC BLOCK SIZE: number of instructions in the basic block containing the branch”; Table 2, ID 27, “NUM_BASIC_BLOCKS: number of basic blocks in the function containing the branch”; Page 8, “Tables 2, 3 and 4 show the new set of features that we use to extend Calder et al.’s work. The motivation for these inclusions is empirical, meaning that we have evaluated several different features, which were, in the end, not chosen to compose the final feature set.”) [Examiner’s remarks: The blocks are partitioned based on node type, including, for example, the type being a store.]; and - generating a control flow split node feature of the plurality of control flow split node features based on a number of nodes in each partition of the plurality of partitions (Page 3, “A program is formed by basic blocks, which are arranged in a control flow graph (CFG). 
A basic block is a sequence of non-branch instructions, except for the last one; thus, if we disregard preemption at the OS level, these instructions will be executed in sequence. The instructions in a basic block are naturally placed together in memory. A CFG is a directed graph where each vertex is a basic block”; Table 2, ID 22, “NUM_STORES: number of store instructions in the basic block containing the branch”; Table 2, ID 26, “BASIC BLOCK SIZE: number of instructions in the basic block containing the branch”; Table 2, ID 27, “NUM_BASIC_BLOCKS: number of basic blocks in the function containing the branch”; Page 8, “Tables 2, 3 and 4 show the new set of features that we use to extend Calder et al.’s work. The motivation for these inclusions is empirical, meaning that we have evaluated several different features, which were, in the end, not chosen to compose the final feature set.”) [Examiner’s remarks: The feature may be, for example, a count of the type store]. Regarding claim 17, the rejection of claim 1 is incorporated; and Moreira further discloses: - generating a training … of training source code of a training program (Page 16, “To train our model, we used the eleven programs in SPEC CINT2006 plus 226 programs from the LLVM test-suite, for which we collected execution profiles using instrumentation. We have used 80% of the branches from these programs to train the prediction model”); - extracting a training plurality of control flow split node features of a training control flow split node in the training IR graph (Page 18, “Therefore, we trained two versions of the Decision Tree classifier proposed in the original paper: one using Calder et al.’s static features, and another using our extended set of program features. We then compared their branch prediction performance on the test set of programs. 
Since VESPA uses regression, we also trained two versions of a Decision Tree regressor with the same sets of features”) [Examiner’s remarks: A plurality of features is extracted during training from the branches.]; - processing, by the regression machine learning model, the training plurality of control flow split node features to generate a predicted branch frequency value of a second branch from the training control flow split node (Page 6, “The first, “Training”, consists in building the model that approximates the behavior of programs”; Page 19, “A metric typically used in the evaluation of regression models is the distance (error) between the observed and the predicted values. One such metric is the Root-Mean-Square Error (RMSE), which is the square root of the average of squared errors. The values used to compute this metric is the error measured in comparison with the probability of each branch being taken during dynamic execution. The lower is the error, the more accurate is the static predictor, as compared to a dynamic profiler. Table 7 summarizes the performance of these regressors. The regressor trained with VESPA’s feature set achieves lower prediction errors for all binaries in our evaluation set. This result shows that improvements due to our selection of features remain positive in decision trees.”) [Examiner’s Remarks: The model is run using the feature set to generate a predicted branch frequency (static prediction).]; - adding the predicted branch frequency value to a predicted profile for the training program (Page 19, “A metric typically used in the evaluation of regression models is the distance (error) between the observed and the predicted values. One such metric is the Root-Mean-Square Error (RMSE), which is the square root of the average of squared errors. The values used to compute this metric is the error measured in comparison with the probability of each branch being taken during dynamic execution. 
The lower is the error, the more accurate is the static predictor, as compared to a dynamic profiler. Table 7 summarizes the performance of these regressors. The regressor trained with VESPA’s feature set achieves lower prediction errors for all binaries in our evaluation set. This result shows that improvements due to our selection of features remain positive in decision trees.”) [Examiner’s remarks: A predicted branch frequency is added to the predicted profile (static predictor) to compare with the instrumented code during training.]; - instrumenting the training program to obtain an instrumented program (Page 16, “Benchmarks Used in the Development of the Model: To train our model, we used the eleven programs in SPEC CINT2006 plus 226 programs from the LLVM test-suite, for which we collected execution profiles using instrumentation”); - executing the instrumented program to generate a dynamic profile of the instrumented program (Page 16, “Benchmarks Used in the Development of the Model: To train our model, we used the eleven programs in SPEC CINT2006 plus 226 programs from the LLVM test-suite, for which we collected execution profiles using instrumentation”; Page 19, “A metric typically used in the evaluation of regression models is the distance (error) between the observed and the predicted values. One such metric is the Root-Mean-Square Error (RMSE), which is the square root of the average of squared errors. The values used to compute this metric is the error measured in comparison with the probability of each branch being taken during dynamic execution. 
The lower is the error, the more accurate is the static predictor, as compared to a dynamic profiler”) [Examiner’s remarks: An instrumented program is run in order to collect dynamic profile data for comparison with the static predictor.]; - calculating a loss between the dynamic profile and the predicted profile (Page 19, “A metric typically used in the evaluation of regression models is the distance (error) between the observed and the predicted values. One such metric is the Root-Mean-Square Error (RMSE), which is the square root of the average of squared errors. The values used to compute this metric is the error measured in comparison with the probability of each branch being taken during dynamic execution. The lower is the error, the more accurate is the static predictor, as compared to a dynamic profiler”); and - updating the regression machine learning model according to the loss (Page 19, “We trained two versions of a Decision-Tree based regressor: one using Calder et al.’s original features, and another using our proposed feature set. Their performances have been evaluated on the four binaries that we use for validation. A metric typically used in the evaluation of regression models is the distance (error) between the observed and the predicted values. One such metric is the Root-Mean-Square Error (RMSE), which is the square root of the average of squared errors. The values used to compute this metric is the error measured in comparison with the probability of each branch being taken during dynamic execution. The lower is the error, the more accurate is the static predictor, as compared to a dynamic profiler”). Moreira does not explicitly disclose: - IR graph However, Leopoldseder discloses: - IR graph (Page 129, “Graal IR is a sea-of-nodes-based [9] directed graph in SSA form [10]. Each IR node produces at most one value. Data flow is represented with upward edges and control flow with downward edges. 
The IR is a superposition of the control-flow and the data-flow graph”) Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Leopoldseder into the teachings of Moreira to include “IR graph”. As stated in Leopoldseder, “We present a duplication optimization cost model that tries to maximize peak performance while minimizing code size and compilation time” (Page 127). Leopoldseder discloses using information to achieve program optimization statically. This optimization minimizes code size while maximizing time saved. Usage of graphs allows for easy visualization of the flow of a program for analysis. Therefore, it would be obvious to one of ordinary skill in the art to combine program optimization with control flow graphs. Regarding claim 18, the rejection of claim 17 is incorporated; and Moreira further discloses: - calculating a difference between an execution frequency for the control flow split node obtained from the dynamic profile and the predicted branch frequency value for the control flow split node (Page 19, “A metric typically used in the evaluation of regression models is the distance (error) between the observed and the predicted values. One such metric is the Root-Mean-Square Error (RMSE), which is the square root of the average of squared errors. The values used to compute this metric is the error measured in comparison with the probability of each branch being taken during dynamic execution. The lower is the error, the more accurate is the static predictor, as compared to a dynamic profiler”) [Examiner’s remarks: The loss is calculated between probability during dynamic execution (dynamic profile) and the predicted value using the static predictor.]. Claim 19 is a system claim corresponding to the method claim hereinabove (claim 1). Therefore, claim 19 is rejected for the same reasons as set forth in the rejection of claim 1. 
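The RMSE-based loss quoted repeatedly in the claim 17 and 18 mappings above (the square root of the average of squared errors between dynamic-profile branch-taken probabilities and the static predictor's values) can be sketched directly. The branch names and probability values below are illustrative assumptions.

```python
# Minimal sketch of the quoted loss: Root-Mean-Square Error between the
# branch-taken probabilities observed during dynamic (instrumented) execution
# and those produced by the static predictor. Profile values are hypothetical.

import math

def rmse(dynamic_profile, predicted_profile):
    """RMSE between observed and predicted branch-taken probabilities."""
    squared_errors = [(dynamic_profile[b] - predicted_profile[b]) ** 2
                      for b in dynamic_profile]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

dynamic = {"b1": 0.9, "b2": 0.1, "b3": 0.5}    # from the instrumented run
predicted = {"b1": 0.8, "b2": 0.2, "b3": 0.5}  # from the static predictor
loss = rmse(dynamic, predicted)
```

The lower the loss, the closer the static predictor tracks the dynamic profiler, which is the comparison the quoted passage uses to evaluate the trained regressors.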
Claim 20 is a non-transitory computer readable medium claim corresponding to the method claim hereinabove (claim 1). Therefore, claim 20 is rejected for the same reasons as set forth in the rejection of claim 1. Claims 7 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over “VESPA: Static Profiling for Binary Optimization” by Moreira et al. (hereinafter “Moreira”), in view of “Dominance-based Duplication Simulation (DBDS): Code Duplication to Enable Compiler Optimizations” by Leopoldseder et al. (hereinafter “Leopoldseder”), further in view of “Profile Guided Optimization Without Profiles: A Machine Learning Approach” by Rotem and Cummins (hereinafter “Rotem”). Regarding claim 7, the rejection of claim 1 is incorporated; and Moreira further discloses: - extracting a first set of features from the [branch] (Page 7, “On The Choice of Program Features. The original work of Calder et al. defined 30 static program features. We could reuse 21 of them, which are listed in Table 1”; Table 1 contains feature with ID 1, “OPCODE: the opcode of the branch instruction”); - extracting a second set of features from a hosting block of a control flow graph, wherein the hosting block comprises the [branch], …(Page 7, “On The Choice of Program Features. The original work of Calder et al. defined 30 static program features. We could reuse 21 of them, which are listed in Table 1”; Table 1 contains feature with ID 6, “LOOP_HEADER: whether the basic block containing the branch is a loop header”); … - combining the first set of features, the second set of features, and the third set of features into the plurality of [branch] features (Page 6, “Figure 4 illustrates how prediction works for a given branch in the target program. Prediction starts with feature extraction. Features are mined from the target program syntax. In this example, we assume a set of four features (whose descriptions appear in Table 1). 
Three of them assume binary values; the other -CMP_OPCODE- ranges over a category of values. Once features are extracted, they are arranged into a feature vector, which Figure 4(c) shows. Said vector is passed to a “prediction” function”). Moreira does not explicitly disclose: - control flow split node - … and wherein the control flow graph is generated from the IR graph; and However, Leopoldseder discloses: - control flow split node (Page 132, “Additionally, we use probability information at each control-flow split to determine relative basic block execution frequencies. The probability information is generated from HotSpot’s [26] profiling information”) [Examiner’s remarks: Leopoldseder discloses extracting profiling information from a control flow split node to determine frequencies. Control flow split nodes are equivalent to branches in the program. Therefore, gathering features at a branch as described by Moreira may be combined with the information gathering at control-flow splits of Leopoldseder.] - … and wherein the control flow graph is generated from the IR graph (Page 129, “Graal IR is a sea-of-nodes-based [9] directed graph in SSA form [10]. Each IR node produces at most one value. Data flow is represented with upward edges and control flow with downward edges. The IR is a superposition of the control-flow and the data-flow graph”); and Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Leopoldseder into the teachings of Moreira to include “control flow split node” and “and wherein the control flow graph is generated from the IR graph”. As stated in Leopoldseder, “We present a duplication optimization cost model that tries to maximize peak performance while minimizing code size and compilation time” (Page 127). Leopoldseder discloses using information to achieve program optimization statically. 
This optimization minimizes code size while maximizing time saved. Usage of graphs allows for easy visualization of the flow of a program for analysis. Therefore, it would be obvious to one of ordinary skill in the art to combine program optimization with control flow graphs. The combination of Moreira and Leopoldseder does not explicitly disclose: - extracting a third set of features from at least one predecessor block of the hosting block; However, Rotem discloses: - extracting a third set of features from at least one predecessor block of the hosting block (Page 2, “Just like LLVM’s BPI analysis we collects many features using C++ code. We ask questions such as, "how many instructions are in each one of the basic blocks that the branch points to", and "does the right destination block dominate the branch". We also collect information about the presence of certain instructions such as return and exception handling instructions. We also record information about the loop nest, information about the parent block and position within the function”); and Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Rotem into the combined teachings of Moreira and Leopoldseder to include “extracting a third set of features from at least one predecessor block of the hosting block”. As stated in Rotem, “We develop a novel statistical approach for predicting branch probabilities from static program features without access to profile information” (Page 2). Rotem discloses predicting branch frequencies using static information without profiling. Use of predecessor block information allows for prediction of how code will flow at branches. Therefore, it would be obvious to one of ordinary skill in the art to combine program optimization with predecessor block information as a feature. 
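The claim 7 mapping above extends the feature set with information mined from predecessor blocks of the hosting block. A minimal sketch of that extraction step follows; the CFG representation, block names, and feature names are illustrative assumptions, not drawn from the cited references.

```python
# Hypothetical sketch: alongside branch- and hosting-block features, a third
# set of features is extracted from each predecessor block of the hosting
# block. The toy CFG below has two predecessors (bb1, bb2) of block bb3.

def predecessor_features(cfg, hosting_block):
    """Extract a feature set from each predecessor of the hosting block."""
    feats = []
    for pred in cfg["preds"][hosting_block]:
        feats.append({
            "PRED_NUM_INSTRS": cfg["num_instrs"][pred],
            "PRED_IS_LOOP_HEADER": int(cfg["loop_header"][pred]),
        })
    return feats

cfg = {
    "preds": {"bb3": ["bb1", "bb2"]},
    "num_instrs": {"bb1": 5, "bb2": 8, "bb3": 3},
    "loop_header": {"bb1": True, "bb2": False, "bb3": False},
}
third_set = predecessor_features(cfg, "bb3")
```

These per-predecessor features would then be combined with the branch and hosting-block features into the overall feature plurality, as the claim mapping describes.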
Regarding claim 9, the rejection of claim 1 is incorporated; and Moreira further discloses:
- extracting a first set of features from the [branch] (Page 7, “On The Choice of Program Features. The original work of Calder et al. defined 30 static program features. We could reuse 21 of them, which are listed in Table 1”; Table 1 contains feature with ID 1, “OPCODE: the opcode of the branch instruction”);
- extracting a second set of features from a hosting block of a control flow graph, wherein the hosting block comprises the [branch], … (Page 7, “On The Choice of Program Features. The original work of Calder et al. defined 30 static program features. We could reuse 21 of them, which are listed in Table 1”; Table 1 contains feature with ID 6, “LOOP_HEADER: whether the basic block containing the branch is a loop header”); …
- extracting a fourth set of features from at least one successor block of the hosting block (Page 7, “On The Choice of Program Features. The original work of Calder et al. defined 30 static program features. We could reuse 21 of them, which are listed in Table 1”; Table 1 contains feature with ID 12, “TS_LOOP_HEADER: whether the taken successor basic block is a loop header”); and
- combining the first set of features, the second set of features, the third set of features, and the fourth set of features into the plurality of [branch] features (Page 6, “Figure 4 illustrates how prediction works for a given branch in the target program. Prediction starts with feature extraction. Features are mined from the target program syntax. In this example, we assume a set of four features (whose descriptions appear in Table 1). Three of them assume binary values; the other -CMP_OPCODE- ranges over a category of values. Once features are extracted, they are arranged into a feature vector, which Figure 4(c) shows. Said vector is passed to a “prediction” function”).
Moreira does not explicitly disclose:
- control flow split node
- … and wherein the control flow graph is generated from the IR graph; and

However, Leopoldseder discloses:
- control flow split node (Page 132, “Additionally, we use probability information at each control-flow split to determine relative basic block execution frequencies. The probability information is generated from HotSpot’s [26] profiling information”) [Examiner’s remarks: Leopoldseder discloses extracting profiling information from a control flow split node to determine frequencies. Control flow split nodes are equivalent to branches in the program. Therefore, gathering features at a branch as described by Moreira may be combined with the information gathering at control-flow splits of Leopoldseder.]
- … and wherein the control flow graph is generated from the IR graph (Page 129, “Graal IR is a sea-of-nodes-based [9] directed graph in SSA form [10]. Each IR node produces at most one value. Data flow is represented with upward edges and control flow with downward edges. The IR is a superposition of the control-flow and the data-flow graph”); and

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Leopoldseder into the teachings of Moreira to include “control flow split node” and “and wherein the control flow graph is generated from the IR graph”. As stated in Leopoldseder, “We present a duplication optimization cost model that tries to maximize peak performance while minimizing code size and compilation time” (Page 127). Leopoldseder discloses using information to achieve program optimization statically. This optimization minimizes code size and compilation time while maximizing peak performance. Usage of graphs allows for easy visualization of the flow of a program for analysis. Therefore, it would be obvious to one of ordinary skill in the art to combine program optimization with control flow graphs.
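The mechanism quoted from Leopoldseder, using a probability at each control-flow split to derive relative basic block execution frequencies, can be sketched as follows. The CFG encoding, the block names, and the probabilities are illustrative assumptions, not taken from the reference:

```python
# Sketch: deriving relative basic-block execution frequencies from edge
# probabilities at each control-flow split, in the spirit of the quoted
# Leopoldseder passage. The acyclic CFG encoding here is hypothetical.

def block_frequencies(cfg, probs, entry):
    """Propagate a relative frequency of 1.0 from the entry block.

    cfg:   block -> list of successor blocks (listed in topological order)
    probs: (block, successor) -> probability of taking that edge
    """
    freq = {b: 0.0 for b in cfg}
    freq[entry] = 1.0
    for block in cfg:  # relies on the topological ordering of cfg's keys
        for succ in cfg[block]:
            freq[succ] += freq[block] * probs[(block, succ)]
    return freq

# Diamond CFG: the entry splits 70/30, both paths rejoin at "merge".
cfg = {"entry": ["then", "else"], "then": ["merge"],
       "else": ["merge"], "merge": []}
probs = {("entry", "then"): 0.7, ("entry", "else"): 0.3,
         ("then", "merge"): 1.0, ("else", "merge"): 1.0}
freq = block_frequencies(cfg, probs, "entry")
```

The split probabilities here play the role of the taken-probabilities that a static predictor such as Moreira's would supply, which is the substance of the proposed combination.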
The combination of Moreira and Leopoldseder does not explicitly disclose:
- extracting a third set of features from at least one predecessor block of the hosting block; and

However, Rotem discloses:
- extracting a third set of features from at least one predecessor block of the hosting block (Page 2, “Just like LLVM’s BPI analysis we collects many features using C++ code. We ask questions such as, "how many instructions are in each one of the basic blocks that the branch points to", and "does the right destination block dominate the branch". We also collect information about the presence of certain instructions such as return and exception handling instructions. We also record information about the loop nest, information about the parent block and position within the function”); and

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Rotem into the combined teachings of Moreira and Leopoldseder to include “extracting a third set of features from at least one predecessor block of the hosting block”. As stated in Rotem, “We develop a novel statistical approach for predicting branch probabilities from static program features without access to profile information” (Page 2). Rotem discloses predicting branch frequencies using static information without profiling. Use of information including predecessor block information allows for prediction of how code will flow at branches. Therefore, it would be obvious to one of ordinary skill in the art to combine program optimization with predecessor block information as a feature.

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over “VESPA: Static Profiling for Binary Optimization” by Moreira et al. (hereinafter “Moreira”), in view of “Dominance-based Duplication Simulation (DBDS): Code Duplication to Enable Compiler Optimizations” by Leopoldseder et al. (hereinafter “Leopoldseder”), further in view of “Compilation Forking: A Fast and Flexible Way of Generating Data for Compiler-Internal Machine Learning Tasks” by Mosaner et al. (hereinafter “Mosaner”).

Regarding claim 16, the rejection of claim 1 is incorporated; and Moreira further discloses:
- adding the [feature] as a control flow split node feature of the plurality of control flow split node features (Page 6, “Figure 4 illustrates how prediction works for a given branch in the target program. Prediction starts with feature extraction. Features are mined from the target program syntax. In this example, we assume a set of four features (whose descriptions appear in Table 1). Three of them assume binary values; the other -CMP_OPCODE- ranges over a category of values. Once features are extracted, they are arranged into a feature vector, which Figure 4(c) shows. Said vector is passed to a “prediction” function. The value that results from applying this function onto the feature vector is the probability that a branch is taken. Figure 4(d) shows a very simple linear predictor. The goal of the training phase is to build an accurate prediction function. The models that we explore in Section 3.5 are non-linear”).

The combination of Moreira and Leopoldseder does not explicitly disclose:
- counting floating nodes in a set of nodes in a block of a control flow graph to obtain a count, wherein the control flow graph is generated from the IR graph; and
- the count

However, Mosaner discloses:
- counting floating nodes in a set of nodes in a block of a control flow graph to obtain a count, wherein the control flow graph is generated from the IR graph (Page 1, “We implemented compilation forking in the GraalVM compiler in a programming-language-agnostic way.
To assess the quality of the generated data, we trained several machine learning models to replace compiler heuristics for loop-related optimizations”; Page 10, “Nodes in the graph can either have a fixed position in the control flow (fixed nodes) or can be executed anywhere as long as data dependencies are met (floating nodes)”; Page 15, Table 1 under Category Nodes contains “#floatingNodes”); and
- the count (Page 1, “We implemented compilation forking in the GraalVM compiler in a programming-language-agnostic way. To assess the quality of the generated data, we trained several machine learning models to replace compiler heuristics for loop-related optimizations”; Page 10, “Nodes in the graph can either have a fixed position in the control flow (fixed nodes) or can be executed anywhere as long as data dependencies are met (floating nodes)”; Page 15, Table 1 under Category Nodes contains “#floatingNodes”) [Examiner’s remarks: A count may be treated as a gathered feature with numerical value and inserted into the machine learning training as is done with all other features.].

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Mosaner into the combined teachings of Moreira and Leopoldseder to include “counting floating nodes in a set of nodes in a block of a control flow graph to obtain a count, wherein the control flow graph is generated from the IR graph” and “the count”. As stated in Mosaner, “The trained models perform equally well to the highly-tuned compiler heuristics when comparing the geometric means of benchmark suite performances” (Page 1). Mosaner discloses a model for profiling which uses the number of floating nodes as a feature for prediction. Knowing the nodes which may be re-ordered in compilation can aid in optimization by helping with reordering.
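The #floatingNodes feature that Mosaner lists can be illustrated with a minimal sketch. The Node representation below is a hypothetical stand-in, not GraalVM's actual IR API:

```python
# Minimal sketch of counting floating nodes in one block of a control flow
# graph. Per the quoted Mosaner passage, fixed nodes have a set position in
# the control flow, while floating nodes may be scheduled anywhere their
# data dependencies allow. The Node class here is hypothetical.

from dataclasses import dataclass

@dataclass
class Node:
    op: str
    fixed: bool  # True = fixed in control flow; False = floating

def count_floating_nodes(block):
    """Return the #floatingNodes feature for one block."""
    return sum(1 for node in block if not node.fixed)

block = [
    Node("If", fixed=True),        # control-flow split: position is fixed
    Node("Add", fixed=False),      # pure arithmetic: may float
    Node("Mul", fixed=False),
    Node("LoadField", fixed=True),
]
count = count_floating_nodes(block)  # a numeric feature for the ML model
```
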
Therefore, it would be obvious to one of ordinary skill in the art to combine program optimization with floating nodes as a feature.

Pertinent Prior Art

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
- “CrystalBall: Statically Analyzing Runtime Behavior via Deep Sequence Learning” by Zekany et al. describes static analysis of a program for branch prediction using an intermediate representation.
- “Evidence-based Static Branch Prediction using Machine Learning” by Calder et al. discloses using regression models to perform static branch prediction using program features.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VIVIAN WEIJIA DUAN whose telephone number is (703)756-5442. The examiner can normally be reached Monday-Friday 8:30AM-5PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Wei Y Mui, can be reached at (571) 272-3708. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/V.W.D./
Examiner, Art Unit 2191

/WEI Y MUI/
Supervisory Patent Examiner, Art Unit 2191

Prosecution Timeline

Jan 29, 2024
Application Filed
Apr 02, 2026
Non-Final Rejection — §101, §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12541357
Operating System Upgrading Method, Electronic Device, Storage Medium, and Chip System
2y 5m to grant Granted Feb 03, 2026
Patent 12536005
TRANSFORMING A JAVA PROGRAM USING A SYMBOLIC DESCRIPTION LANGUAGE MODEL
2y 5m to grant Granted Jan 27, 2026
Patent 12498914
ORCHESTRATION OF SOFTWARE RELEASES ON A CLOUD PLATFORM
2y 5m to grant Granted Dec 16, 2025
Patent 12481483
AUTOMATED GENERATION OF WEB APPLICATIONS BASED ON WIREFRAME METADATA GENERATED FROM USER REQUIREMENTS
2y 5m to grant Granted Nov 25, 2025
Patent 12474910
MULTI-VARIANT IMAGE CONTAINER WITH OPTIONAL TAGGING
2y 5m to grant Granted Nov 18, 2025
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

1-2
Expected OA Rounds
70%
Grant Probability
99%
With Interview (+52.4%)
2y 9m
Median Time to Grant
Low
PTA Risk
Based on 10 resolved cases by this examiner. Grant probability derived from career allow rate.
