Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed August 5, 2025 have been fully considered but they are not persuasive.
On page 1 of the remarks, Applicant states that claim 3 has been amended to remove the use of the word “frequently”, and requests withdrawal of the rejection under 35 U.S.C. 112(b) (“112(b)”). Examiner states that the rejection of claim 3 for use of the term “frequently” has been withdrawn, as the term is no longer used in the claim.
On page 2 of the remarks, Applicant states that independent claims 1 and 15 were rejected under 35 U.S.C. 103 as being unpatentable over Smith et al. (US 20200097389), hereinafter “Smith389”, in view of Barr Group (“How and When to Use C’s assert() Macro”), hereinafter “Barr Group”, and states that neither reference, alone or in combination, teaches or suggests “calculate a method usage index for a method used in the at least one expression […]”, which has been added by amendment to independent claims 1 and 15. Furthermore, with respect to claim 3, Applicant states that Smith389’s disclosure of extracting function names from a portion of code, cited in paragraph [0055] of Smith389, is a feature used by a model to predict whether a portion of code has an error, whereas the claimed method usage index is based on usages of certain code elements in the codebase, not on a feature existing in the portion of code suspected of having an error.
Applicant’s arguments with respect to claims 1 and 15 have been considered but are moot because the new ground of rejection (in particular, the “calculate a method usage index for a method used in the at least one expression […]” from dependent claim 4) does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. Furthermore, the limitation of “determine that the at least one expression is not a false positive when a number of occurrences of the at least one expression […] exceeds the first threshold” is disclosed in the reference of Smith389, with paragraph [0055] of Smith389 describing “the error prediction may comprise a probability that an error will occur and an identification of the type of the likely error”, with the error prediction probability exceeding a threshold indicating an error being present in a code expression, corresponding to a code expression not being a false positive.
Next, on pages 2-3 of the remarks, Applicant states that neither Smith389 nor Barr Group teaches or suggests the limitation of “determine that the at least one expression is not a false positive when a number of occurrences […] exceeds a first threshold” as amended in independent claims 1 and 15, noting that the Office relied on paragraph [0055] of Smith389 as teaching “determine that the at least one expression has the software vulnerability […] exceeds a first threshold”. Applicant states that the claimed “number of occurrences of the at least one expression in the codebase exceeding a threshold”, as amended, differs from Smith389’s error prediction probability.
Examiner disagrees. As stated earlier, the limitation of “determine that the at least one expression is not a false positive when a number of occurrences of the at least one expression […] exceeds the first threshold” is disclosed by Smith389, with paragraph [0055] of Smith389 describing that “the error prediction may comprise a probability that an error will occur and an identification of the type of the likely error”; the error prediction probability exceeding a threshold indicates an error being present in a code expression, corresponding to a code expression not being a false positive. Examiner further states that the limitation of “number of occurrences of the at least one expression in the codebase” is, in fact, disclosed by Smith389’s error prediction probability in paragraph [0055], where the code portion is pre-processed to extract features, including a vector in which the value at each position reflects the number of times a token or name of the code portion occurs in the code. Paragraph [0053] of Smith389 further explains that the number of times a corresponding word occurs is counted in step 602 of Fig. 6, which is followed by step 603 to generate an error prediction that predicts the error being in the code based on the number of times the token of the code portion appears in the code.
Next, on pages 4-5 of the remarks, Applicant states that neither Smith389 nor Barr Group teaches or suggests “search the plurality of files of the codebase for occurrences of the at least one expression outside of the error-checking macros”, noting that the Office relied on Smith389’s rule-based heuristics that search a communication channel for a common error string in paragraph [0051] as teaching this limitation, and further states that paragraph [0054] of Smith389, where “some or all of the codebase that contains the portion of the code […] codebase as a whole may be used in the prediction process”, is vague. In particular, Applicant argues that it fails to indicate how the whole codebase is used and as such fails to teach the above-cited claim element of “search[ing] the plurality of files of the codebase […]”, and that the prediction fails to teach searching for a code element, such as “occurrences of the at least one expression outside of the error-checking macros”. Applicant further states that one of ordinary skill in the art would not have been motivated to combine the macros of Barr Group into Smith389, as there would be no reasonable expectation of success and the combination would not yield predictable results, Smith389 teaching prediction of whether a portion of code contains an error by extracting features from the portion of code, with the features being input into a machine learning model (“ML model”) that predicts the existence and type of error (Smith389 [0007], [0055]). Finally, Applicant states that the Office Action does not indicate how the error-checking macros of Barr Group would be incorporated into the system of Smith389, and states that claims 1 and 15 should be allowable, along with the claims that depend on claims 1 and 15.
Examiner disagrees, as the reference of Smith389 discloses the limitation of “search the plurality of files of the codebase for occurrences […]”. In paragraph [0051] of Smith389, communication channels are monitored, where a stream that includes standard output and standard error, a local log file, remote server log files, thread dumps, or an application profiler, for instance, can each include potential error events used to determine whether the code portion contains an error. Furthermore, paragraph [0054] of Smith389 states that the codebase is used in the prediction process of Fig. 6, which is utilized for searching for the code portion. Paragraph [0054] further explains that the codebase is used as an input to the error prediction system 344, which can predict one or more types of errors that can occur and can predict a location in the code portion that causes an error. Paragraph [0053] of Smith389 further explains that the number of times a corresponding word occurs is counted in step 602 of Fig. 6, which is followed by step 603 to generate an error prediction that predicts the error being in the code based on the number of times the token of the code portion appears in the code. Finally, the error-checking macros are not suggested by Smith389, but the reference of Barr Group teaches the limitation of “extracting a first plurality of expressions used as arguments in error-checking macros […]”, stating that the “assert() macro checks expressions are true as long as the program runs correctly”; if there is a fault in the expression, the assert() macro returns false, and it is used as a sanity check, as stated on page 3 of the attached Barr Group document.
On pages 6-7 of the remarks, Applicant states that claim 8 has been rejected under 35 U.S.C. 103 as being unpatentable over Smith389 in view of Barr Group and Alt, and that none of the references teaches the limitations of independent claim 8; in particular, Smith389 and Barr Group fail to teach “extracting a first plurality of expressions […]”, “extracting a second plurality of expressions […]”, and “forming a fine-tuning dataset including the first plurality […] and the second plurality of expressions […]”. Applicant states that the Office relied on Smith389’s disclosure of extracting features and Barr Group’s disclosure of assert macros to teach “extracting a first plurality of expressions used as arguments in error-checking macros […]”, but argues that Smith389’s teaching in paragraph [0007] is for a model to predict whether the portion of code contains an error, while Barr Group merely shows an expression in an assert macro. Applicant states that Smith389’s extraction is not from a plurality of source code files, but rather from a code portion, and further states that the references fail to teach “extracting a second plurality of expressions […]”, where Smith389’s paragraph [0007] pertains to extracting features from source code to predict whether the source code contains an error. Furthermore, Applicant argues that the combination of Smith389 and Alt fails to teach “obtaining a pre-trained neural classifier model trained on source code snippets […]”. Applicant then recites paragraph [0038] of the Specification discussing neural classifier models, arguing that Alt’s GPT model is a transformer decoder differing from a neural classifier model, being trained on natural language text, not on source code snippets. Applicant states that the combination of Alt’s GPT model and Smith389’s training fails to teach the limitation of “fine-tuning the pre-trained neural classifier model […]”, and requests withdrawal of the rejection of independent claim 8 and its dependent claims.
Examiner disagrees with Applicant regarding the arguments for claim 8. The limitation of “extracting a first plurality of expressions used as arguments […]” is suggested by Smith389’s paragraph [0007], which describes that a plurality of features of the source code, such as expressions, can be extracted, and by Smith389’s paragraph [0030], which describes how source code is stored in code storage comprising a codebase. The limitation of “extracting a second plurality […]” is also suggested by Smith389’s paragraph [0007], as a source code file may contain hundreds or thousands of various expressions, and the expressions that are not part of the error text and error context stated in paragraph [0006] of Smith389 constitute the second plurality. Smith389 does not suggest error-checking macros, but the reference of Barr Group teaches the limitation of “extracting a first plurality of expressions used as arguments in error-checking macros […]”, stating that the “assert() macro checks expressions are true as long as the program runs correctly”; if there is a fault in the expression, the assert() macro returns false, and it is used as a sanity check, as stated on page 3 of the attached Barr Group document. The extraction of the “first plurality” and the “second plurality” is combined with the assert() macro as the error-checking macro, the combination being made to insert a sanity check when writing code and lessen the time and effort required to identify potential software vulnerabilities and errors, as stated on page 3 of the attached Barr Group document, with Smith389’s extraction being from source code files that contain the code portion.
Furthermore, the combination of Smith389 and Alt et al. (“Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction”), hereinafter Alt, teaches “obtaining a pre-trained neural classifier model trained on source code snippets […]”. GPT (Generative Pre-trained Transformer) models, as described in Alt, are classified as transformer neural networks using self-attention mechanisms, as understood by one of ordinary skill in the art, and in combination with Smith389’s training of an ML model on source code snippets and labels indicative of errors (Smith389 [0007]), the combination teaches a pre-trained neural classifier model trained on source code snippets. Alt further describes the fine-tuning of a pre-trained model via a predetermined dataset, and Smith389 is also relied upon because its model is trained to detect the existence of errors, such as vulnerabilities, as per the Abstract of Smith389. The motivation for combining the references is to integrate the neural classifiers of Alt, including the GPT models described in Alt, into the system of Smith389 that extracts features of expressions in source code, together with Barr Group’s assert() macro for integrating sanity checks into the code, in order to fine-tune a pre-trained model (Alt, Abstract).
Claim Rejections - 35 USC § 112(a)
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
Claims 1-7 and 15-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
In particular, claim 1 recites the terms “first threshold” and “second threshold”, which were not previously described in the Specification, in the context of “determine that the at least one expression is not a false positive when a number of occurrences […] exceeds a first threshold and the method usage index […] exceeds a second threshold”. Although Fig. 6 of Applicant’s Specification describes process 600, with blocks 614 and 618 stating “number of usages of the expression […] less than a usage threshold?” and “is the method usage index less than a threshold?”, respectively, paragraphs [00094]-[00096] only describe “a threshold”/“the threshold”, i.e., only one threshold being compared, not first and second thresholds. As a result, the terms “first threshold” and “second threshold” constitute new matter, as they were not previously in Applicant’s Specification or claims. Furthermore, the “usage threshold” described in block 614 of Fig. 6 is not described anywhere in the Specification. As a result, claim 1 is also rejected for lack of written description of the claimed term “usage threshold”, which is not described in the Specification.
Dependent claims 2-7 inherit the rejections of their independent claims, and as a result, are also rejected for the same reasons as independent claim 1 above.
Claim 15 is an apparatus claim that is drawn to the use of the system described in claim 1. Therefore, apparatus claim 15 corresponds to system claim 1 and is likewise rejected for the same reasons as stated above.
Dependent claims 16-20 inherit the rejections of their independent claims, and as a result, are also rejected for the same reasons as independent claim 15 above.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1–4, 6, 15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Smith et al. (US 2020/0097389 A1), hereinafter Smith 389, in view of Barr Group (“How and When to Use C’s assert() Macro”, 2001), and further in view of Smith et al. (US 2020/0117446 A1), hereinafter Smith 446.
Regarding Claim 1, Smith 389 discloses “A system comprising: one or more processors; and a memory that stores one or more programs that are configured to be executed by the one or more processors, the one or more programs including instructions that perform acts to: receive a source code file having at least one expression” (Smith 389: Figure 11 depicts an exemplary system to implement the methods taught in the reference, including memory (reference character 1106) and processor (reference character 1102). [0007] describes how source code may be provided, where a plurality of features are extracted, such as expressions used in said source code);
“wherein the source code file is associated with a codebase” (Smith 389: [0030] describes how the source code may be stored in some form of code storage, which may comprise a codebase);
“wherein the codebase includes a plurality of files” (Smith 389: [0030] describes how the aforementioned codebase may comprise a code repository, which keeps track of all changes and versions of all code files in the repository);
“cause a neural classifier given the at least one expression, to determine that the at least one expression has a software vulnerability” (Smith 389: [0037] describes how a neural network may be used as a machine learning model for code analysis. [0044] (and Figure 2B) show the use of said machine learning model (i.e., neural classifier) to perform inference on an input (i.e., source code containing expression(s)). The machine learning model generates some output which comprises information such as predicted errors (i.e., vulnerabilities), predicted fix, or other data);
“wherein the neural classifier is trained to identify the software vulnerability from sample expressions used as arguments” (Smith 389: [0036], which recites “Machine learning model 200 may be, for example, a neural network…”, and [0051], which recites “For example, a RNN, CNN, or other machine learning algorithm capable of reading sequence data may be applied to the communication channels and trained to identify messages that indicate errors”; [0065], which states “The training data generator may create additional training examples comprising corresponding sets of code samples with errors, additional error information, and corrected code”. As understood by one of ordinary skill in the art, a machine learning model (such as a neural classifier) ingests training data and uses it to perform an action, such as identifying or classifying error-riddled code);
“determine that the at least one expression is not a false positive when a number of occurrences of the at least one expression in the codebase exceeds a first threshold” (Smith 389: [0055] describes how no error may be predicted if the error probability falls below a certain threshold value. This implies that an error will be predicted if the probability exceeds said threshold value. [0055] recites “the error prediction may comprise a probability that an error will occur and an identification of the type of the likely error”. This error prediction is based on the input of code (i.e., “[t]he code portion may comprise a few lines of code, a class or method definition, an entire file, an entire project of multiple related files, or other code portions”, [0055]), which would include the code expression in question that may contain the error. The error prediction probability is based upon the input of these code expressions, and corresponds to a first threshold of the at least one expression not being a false positive. Presence of an error in a code expression corresponds to an expression not being a false positive.);
“search the plurality of files of the codebase for occurrences of the at least one expression outside of the error-checking macros” (Smith 389: [0051] describes how rule-based heuristics may be used in the machine learning model to evaluate error status. Source code is searched for keywords (i.e., keywords of expressions) or evaluating some other component of the source code to determine error occurrences and error context. [0051] additionally states that communication channels are monitored for potential error events, where these may comprise “a stream such as standard output or standard error, a local log file or remote server log file, a mechanism for viewing program execution state such as an application profiler, thread dump, heap dump, or stack trace, or another communication channel”);
“assemble a plurality of repair code candidates from occurrences of the at least one expression found in the codebase” (Smith 389: [0079] describes how predicted fixes are assembled and presented to the user);
“and output the plurality of repair code candidates as suggestions to fix the potential software vulnerability” (Smith 389: [0076] describes how a user console is provided to the user, with errors and fixes to said errors are presented to the user. The code fixes are presented to a user in the form of a dialog box).
Smith 389 teaches the above subject matter content, but fails to expressly disclose an error-checking macro. However, analogous art from the same field of endeavor, Barr Group, teaches this: “The assert() macro is used to check expressions that ought to be true as long as the program is running correctly”. In the C programming language, the assert() macro is used to ensure whether a software expression, passed as an argument, is free of error (i.e., returns “true”). If there is a fault in the expression, then the assert() function returns a value of “false” (Barr Group, pg. 3 of attached document).
Therefore, based on Smith 389 in view of Barr Group, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of error-checking macros of Barr Group to the system of Smith 389 in order to provide a convenient, effective way to insert a “sanity check” when writing code (Barr Group, pg. 3 of attached document). By integrating these “sanity checks” into written code, it lessens the time and effort required when attempting to identify potential software vulnerabilities and errors.
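For illustration only, and not as part of the prosecution record or the cited references, the “sanity check” use of C’s assert() macro described by Barr Group might look like the following sketch; the function and variable names here are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper illustrating expressions used as arguments in
 * an error-checking macro: assert() evaluates each expression, and if
 * any expression is false at runtime, the program aborts and reports
 * the failing expression as a sanity-check failure. */
static double average(const int *values, size_t count)
{
    assert(values != NULL); /* expression passed as an argument */
    assert(count > 0);      /* to the error-checking macro      */

    long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];
    return (double)sum / (double)count;
}
```

In a correctly running program both asserted expressions remain true and the checks are invisible; a faulting caller (e.g., passing a null pointer) trips the macro immediately, which is the time-saving behavior the combination relies upon.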
Smith 389 and Barr Group fail to expressly disclose, but Smith 446 teaches the limitation of “calculate a method usage index for a method used in the at least one expression, wherein the method usage index is based on a number of uses of the method used in the at least one expression in the error-checking macros in the plurality of files of the codebase and a number of times the method used in the at least one expression is used in the codebase” (Smith 446: [0109] describes how a set of code entities may be returned to a user following some search. Usages of these code entities may be calls to a function (i.e., methods), and are returned to the user. These code snippets (containing the code entities) are indexed according to an indexing method, as described in [0112]. [0109], which states “the search results may further include usages of the code entities or links to usages of the code entities….in addition to returning a code entity as a responsive search result, a collections of various usages of that code entity including a snippet of context may also be returned as a part of the search results….usages may include calls to a function, use as a parameter, use in an expression or statement, use in an assignment, or other uses”. [0112] states “A database or corpus of code snippets is first indexed according to one or more indexing methods as described above. a user may then execute search queries against the database to search for code snippets according to a search query”. [0138] describes how the frequency or count of usages of a particular term (i.e., method/expression) is recorded for a corpus of code. These are scored based on their reputation or popularity of use and search. [0138] recites “code entities and code snippets are scored based on the number of occurrences or the context of occurrence in different codebases”. [0140] recites “a code entity or code snippet may be ranked according to the context provided by a local development environment. 
for example, code entities or code snippets may be scored based on appearing in a similar context in their codebase or usage context or may be scored based on having similar code to the programmer’s current development environment… a presence or absence of code entities or code snippets in a user’s local code repository may be used to rank search results”. The recitations cited above in Smith 446 describe ranking the search results based on the usage of the specified method.).
and “the method usage index for the method associated with the at least one expression exceeds a second threshold” (Smith 446: [0118], Fig. 11, step 1104: code snippet embeddings within a threshold distance of a search query are selected as responsive to the search query. [0095], Fig. 7: similarity measure 705 measures the distance (also known as similarity) between embeddings; in combination with the search query of paragraph [0118] of Smith 446, which states that code snippets within a certain distance of the search query are considered responsive to it, this corresponds to the at least one expression exceeding a second threshold, which is measured by similarity in Smith 446.).
Therefore, based on Smith 389 in view of Barr Group, and in further view of Smith 446, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of a method usage index of Smith 446 to the system of Smith 389 and Barr Group in order to provide access to code entities and use machine learning models to analyze these code entities (Smith 446, [0008]). By using specific algorithms for usage analysis, it provides an effective, quick way for a developer to identify potentially vulnerable functions.
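Purely as an illustrative aid, and not as a characterization of the claimed invention or of any cited reference, the two-threshold determination mapped above could be sketched as follows; all function names, parameters, and threshold values are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical sketch of the mapped determination: a flagged
 * expression is treated as NOT a false positive only when its number
 * of occurrences in the codebase exceeds a first threshold AND its
 * method usage index exceeds a second threshold. */
static bool is_true_positive(int occurrences_in_codebase,
                             double method_usage_index,
                             int first_threshold,
                             double second_threshold)
{
    return occurrences_in_codebase > first_threshold &&
           method_usage_index > second_threshold;
}
```

Under this sketch, failing either comparison would leave the classifier’s finding classified as a false positive.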
Regarding Claim 2, Smith 389 further discloses “The system of claim 1, wherein the one or more programs include instructions that perform acts to: determine that the at least one expression is a false positive when a number of occurrences of the at least one expression in the codebase is less than the first threshold” (Smith 389: [0055] describes how no error may be predicted if the error probability falls below a certain threshold value, corresponding to the at least one expression being a false positive when a number of occurrences is less than the first threshold. [0055] recites “the error prediction may comprise a probability that an error will occur and an identification of the type of the likely error”. This error prediction is based on the input of code (i.e., “[t]he code portion may comprise a few lines of code, a class or method definition, an entire file, an entire project of multiple related files, or other code portions”, [0055]), which would include the code expression in question that may contain the error. The error prediction probability is based upon the input of these code expressions.).
Regarding Claim 3, Smith 389 further discloses “The system of claim 2, wherein the one or more programs include instructions that perform acts to: determine that the at least one expression is a false positive when a number of occurrences of the at least one expression in the codebase exceeds the first threshold” (Smith 389: [0055] describes how no error may be predicted if the error probability falls below a certain threshold value. This implies that an error will be predicted if the probability exceeds said threshold value. The error prediction probability is based upon the input of these code expressions and corresponds to the recited first threshold applied to the at least one expression.).
Smith 389 in view of Barr Group does not appear to disclose, but Smith 446 teaches, “and the method usage index is less than the second threshold” (Smith 446: [0118], Fig. 11, step 1104: code snippet embeddings within a threshold distance of a search query are selected as responsive to the search query. [0095], Fig. 7: similarity measure 705 measures the distance (also known as similarity) between embeddings; in combination with the search query of paragraph [0118] of Smith 446, which states that code snippets within a certain distance of the search query are considered responsive to it, this corresponds to the at least one expression exceeding a second threshold, which is measured by similarity in Smith 446. This implies that when the similarity measure is not responsive to the search query in step 1104 of Fig. 11 in paragraph [0118], the method usage index is less than the second threshold, and the code snippet cannot be found by the search.).
Therefore, based on Smith 389 in view of Barr Group, and in further view of Smith 446, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of a method usage index of Smith 446 to the system of Smith 389 and Barr Group in order to provide access to code entities and use machine learning models to analyze these code entities (Smith 446, [0008]). By using specific algorithms for usage analysis, it provides an effective, quick way for a developer to identify potentially vulnerable functions.
Regarding Claim 4, Smith 389 and Barr Group disclose the above subject matter content, but fail to expressly disclose “The system of claim 1, wherein the one or more programs include instructions that perform acts to: compute a method usage index for each method of each expression”. However, analogous art from the same field of endeavor, Smith 446, teaches this: [0109] describes how a set of code entities may be returned to a user following some search. Usages of these code entities may be calls to a function (i.e., methods), and are returned to the user. These code snippets (containing the code entities) are indexed according to an indexing method, as described in [0112]. [0109], which states “the search results may further include usages of the code entities or links to usages of the code entities….in addition to returning a code entity as a responsive search result, a collections of various usages of that code entity including a snippet of context may also be returned as a part of the search results….usages may include calls to a function, use as a parameter, use in an expression or statement, use in an assignment, or other uses”. [0112] states “A database or corpus of code snippets is first indexed according to one or more indexing methods as described above. a user may then execute search queries against the database to search for code snippets according to a search query”);
“wherein the method usage index is the ratio of a number of times a method is used in an error-checking macro over a number of times the method is used in the plurality of files in the codebase” ([0138] describes how the frequency or count of usages of a particular term (i.e., method/expression) is recorded for a corpus of code. These are scored based on their reputation or popularity of use and search. [0138] recites “code entities and code snippets are scored based on the number of occurrences or the context of occurrence in different codebases”. [0140] recites “a code entity or code snippet may be ranked according to the context provided by a local development environment. for example, code entities or code snippets may be scored based on appearing in a similar context in their codebase or usage context or may be scored based on having similar code to the programmer’s current development environment….a presence or absence of code entities or code snippets in a user’s local code repository may be used to rank search results”. The recitations cited above in Smith 446 describe ranking the search results based on the usage of the specified method).
Therefore, based on Smith 389 in view of Barr Group, and in further view of Smith 446, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of a method usage index of Smith 446 in the system of Smith 389 and Barr Group in order to provide access to code entities and use machine learning models to analyze these code entities (Smith 446, [0008]). Using specific algorithms for usage analysis provides an effective, quick way for a developer to identify potentially vulnerable functions.
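For illustration only, the ratio recited in the claim language above (uses of a method in error-checking signals over its total uses in the codebase's files) can be sketched as follows; the function name and the counts supplied to it are hypothetical and are not drawn from Smith 446 or any other reference of record:

```c
/* Minimal sketch, for illustration only, of the claimed "method usage
   index": occurrences of a method inside error-checking expressions
   divided by its total occurrences across the codebase. The counts
   are assumed inputs; neither reference discloses this exact code. */

/* Returns a value in [0.0, 1.0]; a higher index means the method is
   proportionally more often wrapped in error checks. */
double method_usage_index(unsigned error_check_uses, unsigned total_uses)
{
    if (total_uses == 0u)
        return 0.0;  /* no recorded uses: treat the index as zero */
    return (double)error_check_uses / (double)total_uses;
}
```

A method used 3 times in error checks out of 12 total uses would thus receive an index of 0.25.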
Regarding Claim 6, the combination of Smith 389 and Barr Group further discloses “The system of claim 1, wherein each of the plurality of repair code candidates includes an error-checking macro” (Smith 389: Para. 0069 describes how change sequence code (i.e., an error-checking macro) may be determined from the machine learning model, where these sequences may be corrections to erroneous code; Barr Group: “The assert() macro is used to check expressions that ought to be true as long as the program is running correctly”. In the C programming language, the assert() macro is used to check whether a software expression, passed as an argument, is free of error (i.e., evaluates to “true”). If there is a fault in the expression (i.e., it evaluates to “false”), the assert() macro reports the failure and halts the program).
Therefore, based on Smith 389 in view of Barr Group, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of error-checking macros of Barr Group in the system of Smith 389 in order to provide a convenient, effective way to insert a “sanity check” when writing code (Barr Group, pg. 3 of attached document). Integrating these “sanity checks” into written code lessens the time and effort required to identify potential software vulnerabilities and errors.
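As a concrete illustration of the Barr Group teaching quoted above, the following C sketch shows assert() used as a “sanity check” on expressions that ought to be true in a correctly running program; the function and its preconditions are hypothetical examples, not code taken from either reference:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical example of the assert() usage pattern described by
   Barr Group: each assert() guards a condition that ought to hold if
   the caller is correct. If an expression evaluates false, assert()
   reports the failing expression, file name, and line number, then
   aborts the program. */
static void copy_name(char *dst, size_t dst_len, const char *src)
{
    assert(dst != NULL);            /* sanity check: valid destination */
    assert(src != NULL);            /* sanity check: valid source */
    assert(strlen(src) < dst_len);  /* sanity check: room for the copy */
    strcpy(dst, src);               /* safe only if the checks held */
}
```

Compiling with NDEBUG defined removes the checks entirely, which is why Barr Group characterizes assert() as a development-time sanity check rather than runtime error handling.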
Regarding Claim 15, the claim is an apparatus claim that is drawn to the use of the system described in claim 1. Therefore, the apparatus claim 15 corresponds to system claim 1, and is likewise rejected for the same reasons of obviousness as stated above.
Regarding Claim 17, the combination of Smith 389 and Barr Group further discloses “The one or more hardware storage devices of claim 15 having stored thereon computer executable instructions that are structured to be executable by one or more processors of a computing device to thereby cause the computing device to perform actions that: output each of the plurality of repair code candidates to a source code editor” (Smith 389: [0076] describes how a user console is provided to the user, with errors and fixes to those errors presented to the user in the form of a dialog box. [0116] describes how the code editor is displayed and monitored for user input and source code modifications).
Claim(s) 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Smith et al (US 2020/0097389 A1), hereinafter Smith 389, in view of Barr Group (“How and When to Use C’s assert() Macro”, 2001), in further view of Smith et al (US 2020/0117446 A1), hereinafter Smith 446, in further view of Li (CN 111124478 A).
Regarding Claim 5, Smith 446 further discloses “The system of claim 1, wherein the one or more programs include instructions that perform acts to: rank the plurality of repair code candidates…” (Smith 446: [0137] – [0138] describe ranking search results of code entity embeddings (i.e., repair codes) to determine an order to present to the user. Several ranking factors may be taken into consideration).
The combination of Smith 389, Barr Group, and Smith 446 discloses the above subject matter content, but fails to expressly disclose “based on each repair code candidate closely matching a directory and file name of the source code program having the software vulnerability”. However, analogous art from the same field of endeavor, Li, teaches this: [0022] of the translated reference describes how a compiled file (i.e., repair code file) may match the same name and directory as a source code file (i.e., the source code program with the vulnerability), in the context of version management to determine whether the files are the same.
Therefore, based on Smith 389 in view of Barr Group, further in view of Smith 446, and further in view of Li, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of Li to the system of Smith 389, Barr Group, and Smith 446 in order to provide version management of compiled source code and previously existing source code (Li, Abstract). Version management is an important and widely-used mechanism for source code comparison and protection.
Regarding Claim 16, it is an apparatus claim that is drawn to the use of the system described in claim 5. Therefore, the apparatus claim 16 corresponds to system claim 5, and is likewise rejected for the same reasons of obviousness as stated above.
Claims 7 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Smith et al (US 2020/0097389 A1), hereinafter Smith 389, in view of Barr Group (“How and When to Use C’s assert() Macro”, 2001), in further view of Vaswani et al (“Attention Is All You Need”), hereinafter Vaswani.
Regarding Claim 7, the combination of Smith 389 and Barr Group discloses the above subject matter content, but fails to expressly disclose “The system of claim 1, wherein the neural classifier includes a neural encoder transformer with attention”. However, analogous art from the same field of endeavor, Vaswani, teaches this: The Abstract describes how an encoder and decoder mechanism of a transformer neural network is connected with an attention mechanism. Page 5, section 3.2.3 “Applications of Attention in our Model” describes the varieties of utilization of attention in the transformer model.
Therefore, based on Smith 389 in view of Barr Group, and in further view of Vaswani, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of Vaswani in the system of Smith 389 and Barr Group in order to provide a model that relies solely on an attention mechanism for a robust neural network encoder-decoder transformer model (Vaswani, p. 2, Section 1 “Introduction”).
Regarding Claim 20, it is an apparatus claim that is drawn to the use of the system described in claim 7. Therefore, the apparatus claim 20 corresponds to system claim 7, and is likewise rejected for the same reasons of obviousness as stated above.
Claims 8 – 10, 12 – 14, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Smith et al (US 2020/0097389 A1), hereinafter Smith 389, in view of Barr Group (“How and When to Use C’s assert() Macro”, 2001), in further view of Alt et al (“Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction”), hereinafter Alt.
Regarding Claim 8, Smith 389 discloses “extracting a first plurality of expressions used as arguments…from a plurality of source code files of a codebase” (Smith 389: [0007] describes how source code may be provided, where a plurality of features are extracted, such as expressions used in said source code. [0030] describes how the source code may be stored in some form of code storage, which may comprise a codebase);
“extracting a second plurality of expressions used… from the plurality of source code files of the codebase” (Smith 389: [0007] describes how source code may be provided, where a plurality of features are extracted, such as expressions used in said source code. A source code file may contain hundreds or thousands of various expressions);
“forming a…dataset including the first plurality of expressions and the second plurality of expressions” (Smith 389: [0065], which states “The training data generator may create additional training examples comprising corresponding sets of code samples with errors, additional error information, and corrected code”);
“wherein each expression of the first plurality of expressions includes a first label indicating a software vulnerability” (Smith 389: [0053] describes how errors in the source code (i.e., vulnerabilities) may be labeled and classified by an error classification system, where the examples may contain an error indication, error content, and optionally an error type label for the error indication);
“wherein each expression of the second plurality of expressions includes a second label indicating no software vulnerability” (Smith 389: [0055] describes how some examples of source code, previously analyzed by the error classification system, may have a label indicating there was no error predicted in said source code piece);
“and deploying the…neural classifier model in a source code repair system to identify a software vulnerability in an input source code program of the codebase” (Smith 389: [0076] describes how a user console is provided to the user, with errors and fixes to those errors presented to the user in the form of a dialog box).
Smith 389 teaches the above subject matter content, but fails to expressly disclose an error-checking macro. However, analogous art from the same field of endeavor, Barr Group, teaches this: “The assert() macro is used to check expressions that ought to be true as long as the program is running correctly”. In the C programming language, the assert() macro is used to check whether a software expression, passed as an argument, is free of error (i.e., evaluates to “true”). If there is a fault in the expression (i.e., it evaluates to “false”), the assert() macro reports the failure and halts the program (Barr Group, pg. 3 of attached document).
Therefore, based on Smith 389 in view of Barr Group, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of error-checking macros of Barr Group in the system of Smith 389 in order to provide a convenient, effective way to insert a “sanity check” when writing code (Barr Group, pg. 3 of attached document). Integrating these “sanity checks” into written code lessens the time and effort required to identify potential software vulnerabilities and errors.
The combination of Smith 389 and Barr Group discloses the above subject matter content, but fails to expressly disclose “obtaining a pre-trained neural classifier model” and fine-tuned neural classifiers. However, analogous art from the same field of endeavor, Alt, teaches this: the Abstract describes how a pre-trained language model (i.e., neural classifier model) is used and then fine-tuned. Alt’s GPT (Generative Pre-Trained Transformer) model is a neural network, as GPT models are classified as transformer neural networks that use self-attention mechanisms, as understood by one of ordinary skill in the art. Smith 389 teaches “trained on source code snippets of a programming language of the plurality of source code files of the codebase”: “[t]he features may be input into a machine learning model that has been trained on one or more labeled training examples, where the training examples comprise code portions and labels indicative of errors” (Smith 389, [0007]). Combining the neural classifier model trained on source code snippets of Smith 389 with the pre-trained GPT model of Alt produces a pre-trained neural classifier model trained on source code snippets;
“fine-tuning the pre-trained neural classifier model with the fine-tuning dataset to learn to predict whether an expression of a source code program contains a software vulnerability” (Alt: the Abstract describes how the pre-trained model is fine-tuned on a predetermined dataset (i.e., fine-tuning dataset). Smith 389: the model is trained to detect and predict the existence of errors (i.e., vulnerabilities) in source code, as described in the Abstract).
Therefore, based on Smith 389 in view of Barr Group, and in further view of Alt, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of pre- and fine-tuned neural classifiers of Alt to the system of Smith 389 and Barr Group in order to describe the act and process of fine-tuning a pre-trained model (Alt, Abstract).
Regarding Claim 9, Smith 389 further discloses “The computer-implemented method of claim 8, wherein the error-checking macros accept error codes or types that correspond to error codes and alter flow of a source code program based on a value of an error code” (Smith 389: [0057] describes how the execution of a program is monitored, where the monitoring system may detect a thrown error, exception, or corresponding error code. Some errors may be attributed to a breaking change, which can cause breaking of the program. The lack of an error code may indicate that the program is error-free, or a return code may be used to indicate the correct execution of the program). Barr Group: “if the expression passed to the macro [i.e., assert()] is false [i.e., an error], output an error message that includes the file name and line number, and then exit”. This teaches the use of an error code with the error-checking macro, as well as altering the flow of the source code program (i.e., exiting the program based on the value of the code).
Therefore, based on Smith 389 in view of Barr Group, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of error-checking macros of Barr Group in the system of Smith 389 in order to provide a convenient, effective way to insert a “sanity check” when writing code (Barr Group, pg. 3 of attached document). Integrating these “sanity checks” into written code lessens the time and effort required to identify potential software vulnerabilities and errors.
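The behavior quoted from Barr Group (report the file name and line number on a false expression, then exit) and the claim's use of error codes to alter program flow can be sketched as follows; the MY_ASSERT macro, the err_t codes, and check_arg are hypothetical illustrations, not code from either reference:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macro mirroring the quoted Barr Group behavior: if the
   checked expression is false, print the expression, file name, and
   line number, then exit; i.e., program flow is altered on failure. */
#define MY_ASSERT(expr)                                                    \
    do {                                                                   \
        if (!(expr)) {                                                     \
            fprintf(stderr, "Assertion \"%s\" failed: file %s, line %d\n", \
                    #expr, __FILE__, __LINE__);                            \
            exit(EXIT_FAILURE);       /* alter control flow on failure */  \
        }                                                                  \
    } while (0)

/* Companion pattern from the claim language: a function returns an
   error code, and the caller alters program flow based on its value. */
typedef enum { ERR_OK = 0, ERR_NULL_ARG = 1 } err_t;

static err_t check_arg(const void *p)
{
    return (p == NULL) ? ERR_NULL_ARG : ERR_OK;
}
```

A caller could branch on the returned err_t (for example, returning early on ERR_NULL_ARG), which is the flow-altering behavior the claim recites.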
Regarding Claim 10, the combination of Smith 389 and Barr Group further discloses “The system of claim 1, wherein an error-checking macro invokes a second error-checking macro that accepts an error code or type that corresponds to an error code and alters flow of a source code program based on a value of an error code” (Barr Group discusses the use of multiple versions of the assert() macro. Smith 389 describes the monitoring of an execution of a program, where errors may be detected, as described above [i.e., [0057] describes how the execution of a program is monitored, where the monitoring system may detect a thrown error, exception, or corresponding error code. Some errors may be attributed to a breaking change, which can cause breaking of the program. The lack of an error code may indicate that the program is error-free, or a return code may be used to indicate the correct execution of the program]).
Regarding Claim 12, the combination of Smith 389 and Alt further discloses “The computer-implemented method of claim 8, wherein the fine-tuned neural classifier model is deployed in an integrated development environment” (Smith 389: [0045] describes how the system (including the neural network) is deployed in a programming environment, which allows interactive editing of the source code by the user. An editor is included, which can be an IDE. Alt discloses the use of the fine-tuned transformer decoder (i.e., neural classifier)).
Therefore, based on Smith 389 in view of Barr Group, and in further view of Alt, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of pre- and fine-tuned neural classifiers of Alt to the system of Smith 389 and Barr Group in order to describe the act and process of fine-tuning a pre-trained model (Alt, Abstract).
Regarding Claim 13, Alt further discloses “The computer-implemented method of claim 8, wherein the pre-trained neural classifier model includes a neural encoder transformer with attention” (Alt: The Abstract describes how a pre-trained GPT model is used; Alt’s GPT (Generative Pre-Trained Transformer) model is a neural network, as GPT models are classified as transformer neural networks that use self-attention mechanisms, as understood by one of ordinary skill in the art. Page 2, column 1 describes how the standard transformer architecture is extended by a selective attention mechanism).
Therefore, based on Smith 389 in view of Barr Group, and in further view of Alt, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of pre- and fine-tuned neural classifiers of Alt to the system of Smith 389 and Barr Group in order to describe the act and process of fine-tuning a pre-trained model (Alt, Abstract).
Regarding Claim 14, the combination of Smith 389, Barr Group, and Alt further discloses “The computer-implemented method of claim 8, wherein the fine-tuned neural classifier model is a neural encoder transformer model with attention” (Alt: The pre-trained model, as described in the Abstract, is fine-tuned, as described in page 2, column 1. The particular pre-trained transformer model with attention is fine-tuned directly on its assigned task. Alt’s GPT (Generative Pre-Trained Transformer) model is a neural network, as GPT models are classified as transformer neural networks that use self-attention mechanisms, as understood by one of ordinary skill in the art).
Therefore, based on Smith 389 in view of Barr Group, and in further view of Alt, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of pre- and fine-tuned neural classifiers of Alt to the system of Smith 389 and Barr Group in order to describe the act and process of fine-tuning a pre-trained model (Alt, Abstract).
Regarding Claim 18, Smith 389 further discloses “The one or more hardware storage devices of claim 15 having stored thereon computer executable instructions that are structured to be executable by one or more processors of a computing device to thereby cause the computing device to perform actions that: eliminate the first expression as having the software vulnerability based on the first expression occurring less than a threshold number of occurrences in the codebase” (Smith 389: [0069] describes how a line of code, potentially containing an expression with an error, is analyzed for a relevance score. If this line of code falls below the threshold value for relevance, it is removed. [0064] describes how the machine learning model may operate on code portions in the codebase. This implies that the machine learning model may be able to identify the particular expression, and, using rule-based heuristics, determine the relevance score of said expression).
Regarding Claim 19, Smith 389 further discloses “The one or more hardware storage devices of claim 15 having stored thereon computer executable instructions that are structured to be executable by one or more processors of a computing device to thereby cause the computing device to perform actions that: eliminate the first expression as having the software vulnerability based on a method used in the first expression being invoked less than a threshold number of invocations in the codebase” (Smith 389: [0069] describes how a line of code, potentially containing an expression with an error, is analyzed for a relevance score. If this line of code falls below the threshold value for relevance, it is removed. [0064] describes how the machine learning model may operate on code portions in the codebase. This implies that the machine learning model may be able to identify the particular expression, and, using rule-based heuristics, determine the relevance score of said expression. An expression may be contained within a particular method of the code, as understood by one of ordinary skill in the art).
Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Smith et al (US 2020/0097389 A1), hereinafter Smith 389, in view of Barr Group (“How and When to Use C’s assert() Macro”, 2001), in further view of Alt et al (“Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction”), hereinafter Alt, in further view of Hu et al (CN 114968809 A), hereinafter Hu.
Regarding Claim 11, the combination of Smith 389, Barr Group, and Alt discloses the above subject matter content, but fails to expressly disclose “The computer-implemented method of claim 8, wherein the fine-tuned neural classifier model is deployed in a version-controlled software hosting service”. However, analogous art from the same field of endeavor, Hu, teaches this: Page 4 of the translated document describes a version-controlled software hosting service, namely Gerrit, which is based on the Git version control system.
Therefore, based on Smith 389 in view of Barr Group, in further view of Alt, and in further view of Hu, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to utilize the teaching of Hu to the system of Smith 389, Barr Group, and Alt in order to provide a lightweight framework for reviewing each and every commit prior to code being committed to the repository (Hu, pages 4 – 5).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Khan et al. (US 20230237161 A1, “DETECTION OF AND PROTECTION AGAINST CROSS-SITE SCRIPTING VULNERABILITIES IN WEB APPLICATION CODE”)
Tripp et al. (US 11630919 B1, “Management Of Sensitive Data Using Static Code Analysis”)
Olson et al. (US 20210056211 A1, “SYSTEM AND METHOD FOR AUTOMATICALLY DETECTING A SECURITY VULNERABILITY IN A SOURCE CODE USING A MACHINE LEARNING MODEL”)
Ionescu et al. (US 20180349614 A1, “SYSTEM AND METHOD FOR APPLICATION SECURITY PROFILING”)
Groth et al. (US 20210192651 A1, “System & Method For Analyzing Privacy Policies”)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TOMMY MARTINEZ whose telephone number is (703)756-5651. The examiner can normally be reached Monday thru Friday 8AM-4PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jorge L. Ortiz-Criado can be reached at (571) 272-7624 on Monday thru Friday, 7AM-7PM ET. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/T.M./ Examiner, Art Unit 2496
/JORGE L ORTIZ CRIADO/ Supervisory Patent Examiner, Art Unit 2496