DETAILED ACTION
This action is responsive to the application filed on February 29, 2024.
Claims 1-20 are pending and are presented for examination.
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
Examiner Notes
Examiner cites particular columns, paragraphs, figures and line numbers in the references as applied to the claims below for the convenience of the applicant. Although the specified citations are representative of the teachings in the art and are applied to the specific limitations within the individual claim, other passages and figures may apply as well. It is respectfully requested that, in preparing responses, the applicant fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the examiner.
Drawings
The drawings filed on February 29, 2024 are acceptable for examination purposes.
Information Disclosure Statement
As required by M.P.E.P. 609, the applicant’s submission of the Information Disclosure Statement dated February 29, 2024 is acknowledged by the examiner and the cited references have been considered in the examination of the claims now pending.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-7, 10, 12-16 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Balasubramanian et al. (US Pub. No. 2022/0276862 – hereinafter Balasubramanian) in view of Polk (US Pat. No. 7,100,150 – IDS 02/29/2024).
With respect to claim 1, Balasubramanian teaches a computer-implemented method (CIM), the CIM comprising: validating code blocks in a software document by performing a first validation process (See figures 3-5 (and related text), abstract and paragraphs [0017], [0028], [0058], [0070], [0078], “analyzing the open source software project code and the documentation, parsing the open source software project documentation into sections, validating the sections of the documentation with project or stack metrics, assessing a quality of sections of the documentation, assessing a quality of language of the documentation;”. Examiner notes: validation of code), wherein the first validation process includes: iterating through lines in the software document to identify the code blocks, extracting the identified code blocks (See paragraphs [0006], [0017], [0028] “…the open source software project documentation is parsed into sections.”. See paragraph [0009], “In an embodiment, parsing the open source software project documentation into sections comprises using natural language processing techniques to identify section headings and a section content of the documentation and mapping the sections to a system defined standard structure model for the open source software project's identified technology stack.”. See figure 3 (and related text) and paragraphs [0056], [0069]-[0070], “The doc quality scorer 104 is responsible for calculating the quality score of the documentation. To calculate the quality score of the documentation, the doc quality scorer 104 first parses the documentation and creates a structure of the documentation by dividing the documentation into standardized sections. Thereafter, the doc quality scorer 104 calls individual services to assess quality of each section to determine a section quality score of each section. The doc quality scorer 104 thereafter saves the section quality score to the database 114. 
Once the section quality scores for all the sections are calculated, the doc quality scorer 104 applies a weighted importance function to calculate the quality score of documentation.”. Examiner notes: fetching, parsing and extracting code), determining an amount of a codebase that the code blocks cover, wherein the codebase is associated with the software document (See paragraphs [0020]-[0022], [0024], “extracting a section heading of a section of the documentation and map the section heading to the system defined standard structure model to identify the section heading; and validating the content of the section to map relevance of content to the mapped section's expected content coverage.”. See paragraph [0032], “In some embodiments, validating the document sections with project or stack metrics comprises: fetching the system defined standard structure model for the open source software project's identified technology stack; comparing the parsed document sections to a standard list; identifying mapping compliance; and scoring the open source project documentation for compliance to expected sections as per the system defined standard structure model.”. See paragraphs [0052], [0057], [0059], “The present subject matter discloses a system and method that evaluates a quality of a documentation of an open source software project and creates a quality score for the same. The present subject matter uses machine learning models and natural language processing models for evaluating different sections of the documentation to determine relevance, completeness, and ease of understanding of the documentation. Further, a source code and tech stack information of the documentation is determined. The tech stack information include details, such as size of the source code and number of the application technology stacks, their type and counts. These details are used as major features to compare them against similar tech domain projects to assess the documentation's quality. 
With the machine learning models, the expected coverage of documentation topics and the depth of explanation is obtained. Similar tech domain open source software projects are pre-selected for their established quality of documentation and are used to train the machine learning models for evaluating the documentation for other open source software projects. The documentation flow is checked for coherence and consistency of technology terminologies used to refer the project's subsystems.” – emphasis added. See paragraph [0061], “The project metrics doc builder 109 creates machine learning models trained with pre-selected project's documentation and its source code. The training data is from the projects which are validated for their good documentation quality. In an example, the project metrics builder 109 extracts details from various documentation having good quality scores. The details may be about the structure of the documentation, language used in the documentation, clarity of the documentation, ease of the understanding of the documentation etc. Such details may comprises the training data for generating machine learning models. The training data is prepared with the parsed documentation to the system defined standard structure model for each technology stack and the open source software projects source code metrics.”. Examiner notes: metrics and coverage, use of pre-validated data). 
generating a report characterizing the software document (See figure 3 (and related text)), wherein the report includes: a project coverage metric that indicates the amount of the codebase that the code blocks cover, and an instruction validity metric that indicates an amount of the code blocks that executed correctly during execution of the code blocks, wherein the instruction validity metric is based on a validation of the code blocks (See paragraph [0034], “In some embodiments, assessing the document's language quality comprises: evaluating sentences of the documentation to determine ease of understandability score of the documentation, wherein the ease of understandability score indicate easiness in understating the documentation; evaluating the sentences to determine clarity score, wherein the clarity score indicates degree of clarity of subject matter of the documentation; and normalizing ease of understandability score and the clarity score based on the section intended coverage.”. See paragraph [0052], “The present subject matter discloses a system and method that evaluates a quality of a documentation of an open source software project and creates a quality score for the same. The present subject matter uses machine learning models and natural language processing models for evaluating different sections of the documentation to determine relevance, completeness, and ease of understanding of the documentation. Further, a source code and tech stack information of the documentation is determined. The tech stack information include details, such as size of the source code and number of the application technology stacks, their type and counts. These details are used as major features to compare them against similar tech domain projects to assess the documentation's quality. With the machine learning models, the expected coverage of documentation topics and the depth of explanation is obtained. 
Similar tech domain open source software projects are pre-selected for their established quality of documentation and are used to train the machine learning models for evaluating the documentation for other open source software projects. The documentation flow is checked for coherence and consistency of technology terminologies used to refer the project's subsystems.”. See paragraph [0072] and figure 3 (and related text), “In step 306, a quality of language of the documentation [is] determined. To determine the quality of language of the documentation, the document sentences are grouped by the sections and evaluated for ease of understanding, clarity etc. The ease of understandability score and clarity score are allotted for the documentation and the ease of understandability score and clarity score are normalized for different sections based on the section intended coverage and the section length”). Balasubramanian does not explicitly disclose the following limitations; however, in an analogous art, Polk teaches:
executing the code blocks, and determining whether the code blocks execute correctly (See column 3 line 50 – column 4 line 2, “In general, in one aspect, the invention relates to a method of testing an embedded example in a graphical user interface documentation. The method comprises creating an extractable embedded example by tagging the embedded example, extracting the extractable embedded example from the graphical user interface documentation to generate an extracted example, selecting a tagged entity from the extracted example, interpreting the tagged entity to generate an interpreted tagged entity, creating a test suite using the interpreted tagged entity, selecting a graphical tool against which to execute the test suite, executing the test suite against the graphical tool to generate an output response, creating a golden file using at least one tag from a tag set, comparing the output response to the golden file, creating the tagged entity using at least one tag chosen from the tag set, locating a source of error if the output response varies from the golden file, correcting the extractable embedded example if the output response varies from the golden file, and generating a comparison result after comparing the output response to the golden file.”).
Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Balasubramanian’s teaching, which discloses a method for scoring the quality of open source software documentation, with Polk’s teaching of extracting embedded examples from documentation, executing them as a test suite, and comparing the output against expected results, as Polk would provide a way to verify that code examples embedded in documentation execute correctly, thereby improving the reliability of the documentation being assessed.
With respect to claim 3, Balasubramanian teaches characterizing a quality of writing that exists in the software document by inputting at least some of the software document into a Large Language Model (LLM) (See figures 3, 5-6, 11 (and related text) and paragraphs [0052], [0061]-[0062], [0076], [0084], “The present subject matter discloses a system and method that evaluates a quality of a documentation of an open source software project and creates a quality score for the same. The present subject matter uses machine learning models and natural language processing models for evaluating different sections of the documentation to determine relevance, completeness, and ease of understanding of the documentation. Further, a source code and tech stack information of the documentation is determined. The tech stack information include details, such as size of the source code and number of the application technology stacks, their type and counts. These details are used as major features to compare them against similar tech domain projects to assess the documentation's quality. With the machine learning models, the expected coverage of documentation topics and the depth of explanation is obtained. Similar tech domain open source software projects are pre-selected for their established quality of documentation and are used to train the machine learning models for evaluating the documentation for other open source software projects. The documentation flow is checked for coherence and consistency of technology terminologies used to refer the project's subsystems.”). 
With respect to claim 4, Balasubramanian teaches wherein an output of the LLM includes characterizations of the quality of writing that exists in the software document (See figures 3, 5-6, 11 (and related text) and paragraphs [0052], [0061]-[0062], [0076], [0084], “The present subject matter discloses a system and method that evaluates a quality of a documentation of an open source software project and creates a quality score for the same. The present subject matter uses machine learning models and natural language processing models for evaluating different sections of the documentation to determine relevance, completeness, and ease of understanding of the documentation. Further, a source code and tech stack information of the documentation is determined. The tech stack information include details, such as size of the source code and number of the application technology stacks, their type and counts. These details are used as major features to compare them against similar tech domain projects to assess the documentation's quality. With the machine learning models, the expected coverage of documentation topics and the depth of explanation is obtained. Similar tech domain open source software projects are pre-selected for their established quality of documentation and are used to train the machine learning models for evaluating the documentation for other open source software projects. The documentation flow is checked for coherence and consistency of technology terminologies used to refer the project's subsystems.”). 
With respect to claim 5, Balasubramanian teaches adding a lucidity metric in the generated report, wherein the lucidity metric indicates a readability of the software document and is based on the output of the LLM (See figures 3, 6-11 (and related text), assess section quality 305, assess document language quality 306, compute document quality score 307, check for consistency and readability 604, compute flow score and terminologies and readability score 605, compute completeness of sections score 606). 
With respect to claim 6, Balasubramanian teaches wherein the LLM is configured to check content of the software document for grammatical and lexical correctness in order to generate the characterizations (See paragraph [0059], “The documentation assessor 107 is used to assess the overall quality of the documentation. It assesses the documentation flow and documentation sections by comparing it against pre-validated or generated documentations trained models with the machine learning algorithms. The documentation assessor 107 also determine the consistency of usage of terminologies and readability using natural language processing techniques. Based on the determination the document assessor 107 may allot terminologies and readability score and a flow score for the documentation. Further, the completeness of expected sections in the documentation is also evaluated based on the system define standard structure model. Based on the evaluation, a completeness of expected sections score may be determined. The above determined terminologies and readability score, the flow score, and the completeness of expected sections scores are saved in the database 115.”).
With respect to claim 7, Balasubramanian teaches wherein the at least some of the software document input into the LLM does not include the identified code blocks (See paragraph [0076], “The project doc ML model builder 105 interacts with many other services and helps in training a ML model for predicting documentation quality of the open source software component. At step 401, details, documentation, and metadata of the software component is fetched from the project repository. To retrieve the details, documentation, and metadata from the project repository, the project doc ML builder connects with the project repository and downloads the documentation. In step 402, the documentation is validated to identify the availability of necessary sections of documentation. Once the documentation is validated, the repository documentation data is annotated to categories of high, low and medium quality documentation. This constitutes the data preparation for model training which is step 403. The annotated data serves as the knowledge base or training data for ML model training. The training data prepared in the step 403 is used to train a neural network model which serves the purpose of classifying a documentation quality to be high, medium or low in the step 404. The neural network model generates a machine learning model which analyzes a software documentation document and provides the quality of documentation as output. The built ML model from step 404 is saved in the storage in step 405 for further usage.”).
With respect to claims 10 and 12-16, the claims are directed to a computer program product that corresponds to the methods recited in claims 1 and 3-7, respectively (see the rejection of claims 1 and 3-7 above, wherein Balasubramanian also teaches such a program in paragraph [0037]).
With respect to claim 19, the claim is directed to a computer system that corresponds to the method recited in claim 1 (see the rejection of claim 1 above, wherein Balasubramanian also teaches such a system in figures 1-2).
Claims 2, 11 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Balasubramanian et al. (US Pub. No. 2022/0276862 – hereinafter Balasubramanian) in view of Polk (US Pat. No. 7,100,150 – IDS 02/29/2024) and further in view of Bhate et al. (US Pub. No. 2010/0146340 – hereinafter Bhate). 
With respect to claim 2, Balasubramanian in view of Polk does not explicitly disclose the claimed limitation; however, in an analogous art, Bhate teaches wherein the code blocks are executed in a sandbox environment (See paragraph [0023], “The illustrative embodiments recognize that the higher the code coverage percentage, the more likely it is that a bug or error in the code will be identified during the unit testing. Consequently, the illustrative embodiments recognize that the higher the code coverage percentage, the better the quality of the test cases and the better the quality of the tested code.”. See paragraphs [0027]-[0028] and figure 7 (and related text), “In programming context, a sandbox is a development area of a data processing system where developers incrementally build and test code. A sandbox build is a collection, or build, of code that is to be tested in a sandbox setup. Developers build and test enhancements, modifications, or other changes to code in a sandbox build. A backing build is a collection, or build, of original program code, from which the incremental changes of the sandbox build are made.”). Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify the combination of Balasubramanian and Polk with Bhate’s teaching, as Bhate would provide an improved data processing system, and in particular, a computer-implemented method for analyzing a software application (see paragraph [0002]).
With respect to claim 11, the claim is directed to a computer program product that corresponds to the method recited in claim 2 (see the rejection of claim 2 above).
With respect to claim 20, the claim is directed to a computer system that corresponds to the method recited in claim 2 (see the rejection of claim 2 above).
Claims 8-9 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Balasubramanian et al. (US Pub. No. 2022/0276862 – hereinafter Balasubramanian) in view of Polk (US Pat. No. 7,100,150 – IDS 02/29/2024) and further in view of Manoharan et al. (US Pub. No. 2025/0110704 – hereinafter Manoharan). 
With respect to claim 8, Balasubramanian in view of Polk does not explicitly disclose the claimed limitation; however, in an analogous art, Manoharan teaches wherein the first validation process is performed in response to a determination that a trigger event has occurred (See paragraph [0034], “The audit engine 136 is responsible for performing/executing an audit function on the generated new script. In some embodiments, the audit function may include validation of the new script and subsequently testing the validated new script in one or more test environments. The audit engine 136 may conduct a syntax validation of the generated new script, for example, by checking for correct syntax, formatting, adherence to coding standards specific to the chosen scripting language, coherence of the different code sections (i.e., configured module templates), integrity and compatibility, and so on. In a continuous integration and continuous development (CI/CD) environment, the audit engine 136 may integrate syntax validation into the CI/CD pipeline to ensure that code quality checks are performed at each stage of development.”). Therefore, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify the combination of Balasubramanian and Polk with Manoharan’s teaching, as Manoharan would provide an audit function to validate developed code. 
With respect to claim 9, Balasubramanian in view of Polk does not explicitly disclose the claimed limitation; however, in an analogous art, Manoharan teaches wherein the trigger event is selected from the group consisting of: a development platform release, a product release, and a manual request (See paragraph [0034], “The audit engine 136 is responsible for performing/executing an audit function on the generated new script. In some embodiments, the audit function may include validation of the new script and subsequently testing the validated new script in one or more test environments. The audit engine 136 may conduct a syntax validation of the generated new script, for example, by checking for correct syntax, formatting, adherence to coding standards specific to the chosen scripting language, coherence of the different code sections (i.e., configured module templates), integrity and compatibility, and so on. In a continuous integration and continuous development (CI/CD) environment, the audit engine 136 may integrate syntax validation into the CI/CD pipeline to ensure that code quality checks are performed at each stage of development.”. Examiner notes: see motivation rationale in claim 8, which similarly applies here).
With respect to claims 17-18, the claims are directed to computer program products that correspond to the methods recited in claims 8-9, respectively (see the rejection of claims 8-9 above).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Finkman et al. (US Pub. No. 2025/0225212) discloses a method for real-time evaluation of code leakage during software code development when the developer is using a code assistant tool, comprising performing code leakage estimation by identifying and processing, using an LLM-based model, the most updated code segments as the segments evolve; and evaluating, using the LLM-based model, the extent to which written code has been inadvertently revealed to one or more code assistant servers by reconstructing the original code from the requests sent to each code assistant server (see abstract).
Gupta et al. (US Pub. No. 2024/0078107) discloses techniques for performing quality-based action(s) regarding engineer-generated documentation associated with code and/or an API. Features are extracted from data associated with the engineer-generated documentation, which includes engineer-generated document(s). (see abstract).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANIBAL RIVERACRUZ whose telephone number is (571) 270-1200. The examiner can normally be reached Monday-Friday, 9:30 AM-6:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hyung S. Sough, can be reached at 571-272-6799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ANIBAL RIVERACRUZ/Primary Examiner, Art Unit 2192