Last updated: May 29, 2026

Application No. 18/388,991

System and Methods for Enabling User Interaction with Scan or Image of Document

Non-Final OA §103

Filed

Nov 13, 2023

Priority

Nov 15, 2022 — provisional 63/425,544

Examiner

NAZAR, AHAMED I

Art Unit

2178

Tech Center

2100 — Computer Architecture & Software

Assignee

Evisort Inc.

OA Round

1 (Non-Final)

This examiner grants 53% of cases after interview

— +32.8% interview lift. A telephonic interview to clarify the technical implementation could significantly improve the outcome.

Based on 383 resolved cases, 2023–2026

Examiner Intelligence

NAZAR, AHAMED I View full profile →

Grants 53% of resolved cases

Career Allowance Rate

204 granted / 383 resolved

-1.7% vs TC avg

Strong +33% interview lift

Without

With

+32.8%

Interview Lift

resolved cases with interview

Typical timeline

4y 1m

Avg Prosecution

12 currently pending

Career history

409

Total Applications

across all art units

Statute-Specific Performance

§101

0.7%

-39.3% vs TC avg

§103

87.0%

+47.0% vs TC avg

§102

10.0%

-30.0% vs TC avg

§112

1.0%

-39.0% vs TC avg

Black line = Tech Center average estimate • Based on career data from 383 resolved cases

Office Action

§103

DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
This communication is responsive to the application filed 11/13/2023.
Claims 1-21 are pending with claims 1, 8, and 15 as independent claims.


Information Disclosure Statement
The information disclosure statement (IDS) submitted on 1/22/2024 was filed after the mailing date of the application on 11/13/2023.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-8, 10-15, and 17-21 are rejected under 35 U.S.C. 103 as being unpatentable over Gopalakrishnan et al. (US 2018/0217973, hereinafter as Gopalakrishnan) in view of Katsura (US 2019/0028607) in view of Radakovic et al (US 2011/0222773, hereinafter as Radakovic).

Claim 1. A method for processing a document, comprising:
performing optical character recognition (OCR) processing on a PDF file or document scan to identify text in the document, and [to identify bounding boxes for words and lines of that text]; Gopalakrishnan discloses in [0040] “The multi-layered OCR document generation module 208 receives the digital image of the input document, converts the digital image into binary form, removes all lines from the digital image, performs a morphological dilation operation on the binary image using a horizontal and a vertical structuring element, identifies and creates one or more text groups based on the dilation operation, performs OCR of each text group (i.e., extracting text information for each group), generates OCR layer for each text group, and combines the multiple OCR layers while generating a multi-layered OCR document.” (emphasis added) examiner note: the OCR module identifies text from the document image. 
overlaying the text output by the OCR process on top of the original document image and making that layer substantially invisible to a user; Gopalakrishnan discloses in [0035] “The multiple OCR layers 112 are then superimposed over the image layer 110 to generate the output OCR document, i.e., a multi-layered OCR document. The multiple layers 112 are invisible for the user and represent a group of text.” And in [0040] “The multi-layered OCR document generation module 208 then combines the OCR layers (with invisible option) with the scanned image to form a multi-layered OCR document. Here, the multi-layered OCR document is the editable and searchable document of a pre-defined format such as PDF.” (emphasis added) examiner note: the text layer may be overlaid over the original or image of document layer such that the text layer may be invisible to a user,  
providing the processed document to the user or a document processing pipeline for further evaluation or analysis. Gopalakrishnan discloses in [0035] “The engine then superimposes the scanned image over the text layer to create an OCR document. The text layer allows a user to select and copy the text from the OCR document.” (emphasis added) examiner note: the created OCR document may be the generated structured document object.
Gopalakrishnan does not explicitly disclose
to identify bounding boxes for words and lines of that text. However, Radakovic, in an analogous art, discloses in [0002] “an Optical Character Recognition (OCR) process involves paragraph detection. Paragraph detection will typically be performed after textual lines in a textual image have been identified by the coordinates of their respective bounding boxes.” And in [0018] “Individual lines and words are defined by line and word bounding boxes, respectively. The bounding boxes themselves are defined by a set of coordinates that are established for each.”
identifying one or more paragraphs in the document by grouping a series of lines together into a paragraph; Radakovic also discloses in [claim 1] “an Optical Character Recognition (OCR) process involves paragraph detection. Paragraph detection will typically be performed after textual lines in a textual image have been identified by the coordinates of their respective bounding boxes. In one implementation, the paragraph detection process classifies all textual lines on the page into one two classes: a "beginning paragraph line" class and a "continuation paragraph line" class. A beginning paragraph line follows a line with a hard break and a continuation paragraph line follows a line with a break that is not a hard break. Individual paragraphs are then identified. Each paragraph includes all lines located between two successive beginning paragraph lines, as well as a first of the two successive beginning paragraph lines.” (emphasis added) examiner note: the paragraph detection component may group text lines including beginning paragraph line and continuous paragraph lines to identify a paragraph in the scanned document.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Gopalakrishnan with the teaching of Radakovic because “Optical character recognition (OCR) is a computer-based translation of an image of text into digital form as machine-editable text, generally in a standard encoding scheme. This process eliminates the need to manually type the document into the computer system.” Radakovic [Background].
Gopalakrishnan does not explicitly disclose
generating and populating a structured document object. However, Katsura, in an analogous art, discloses in [0049-0051] “the OCR processing portion 82 recognizes the characters and numbers in the table. Then, the document file generating portion 83 attaches as an object the data of the table set in ruled lines… The document file generating portion 83 generates a document file 10 in which objects (the text box 10a, the image object 10b, and the table data 10c) are arranged at the same positions as in the document… as shown in FIG. 5, the document file generating portion 83 generates an XML file that includes parts defining the text box 10a, the image object 10b, and the table data 10c respectively.” (emphasis added) examiner note: the generated XML file may be a structured document object as shown in fig. 5.
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Gopalakrishnan with the teaching of Radakovic because “When a paper document (original) is converted into an electronic format, document reading is performed. Image data acquired by reading is sometimes converted into a file in a particular format… aimed at automatically selecting a document file generation process to facilitate editing of a document file, thereby removing the burden on a user.” Katsura [Background and 0019].

Claims 3, 10, and 17. The rejection of the method of claim 1 is incorporated, wherein identifying one or more paragraphs in the document further comprises:
Gopalakrishnan does not explicitly disclose but Radakovic does
identifying spacing between lines and determining breaks between sections or paragraphs; Radakovic discloses in [0002] “A beginning paragraph line follows a line with a hard break and a continuation paragraph line follows a line with a break that is not a hard break. Individual paragraphs are then identified.” (emphasis added).
identifying one or more enumerators; Radakovic further discloses in [0045-0046] “The classification process for classifying each textual line as a beginning paragraph line or a continuation paragraph line may be accomplished by examining some or all of the features listed below, each of which are more likely indicative that the line is a beginning paragraph line or continuation paragraph line… both may use the Primary Line Feature Set, which is defined by the following features that characterize individual lines:… Does the previous line begin with a bullet symbol… Does the current line beginning with a bullet symbol… Does the next line begin with a bullet symbol... Does the previous line begin with a capital letter... Does the current line begin with a capital letter.” (emphasis added) examiner note: the bullet symbol and/or capital letters may be utilized, as enumerators, to identify paragraphs on the scanned document, and
using the determined breaks and/or enumerators to identify paragraphs in the document. Radakovic also discloses in [0002] “A beginning paragraph line follows a line with a hard break and a continuation paragraph line follows a line with a break that is not a hard break. Individual paragraphs are then identified.” (emphasis added).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Gopalakrishnan with the teaching of Radakovic because “Optical character recognition (OCR) is a computer-based translation of an image of text into digital form as machine-editable text, generally in a standard encoding scheme. This process eliminates the need to manually type the document into the computer system.” Radakovic [Background].

Claims 4, 11, and 18. The rejection of the method of claim 1 is incorporated, further comprising
Gopalakrishnan does not explicitly disclose detecting headers and/or footers based on position and/or contents of text. However, Radakovic discloses in [0027] “The header and footer are text fragments that do not belong to the text stream of the paragraph should be excluded when detecting "wrapping" paragraph (i.e. paragraphs that span across two or more pages. Information about text fragments that interrupt the flow of text flow (e.g., headers, footnotes, image captions, etc.) is contained within the information made available to the paragraph component from other components of the OCR engine.” (emphasis added).
Accordingly, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Gopalakrishnan with the teaching of Radakovic because “Optical character recognition (OCR) is a computer-based translation of an image of text into digital form as machine-editable text, generally in a standard encoding scheme. This process eliminates the need to manually type the document into the computer system.” Radakovic [Background].

Claims 5, 12, and 19. The rejection of the method of claim 1 is incorporated, further comprising identifying one or more elements for further processing and analysis using one or more of positional and semantic analysis or models. Gopalakrishnan discloses in [0047-0069] “the digital image 300 includes an address field 302, a code field 304, a description field 306, a price field 308, a date field 310, a customer number field 312, an invoice number field 314, and a total price field 316… FIG. 3B illustrates an exemplary binary image 320 generated from the digital image 300… FIG. 3C illustrates an exemplary intermediate dilated image 330 generated from the binary image 320… The intermediate dilated image 330 is generated by performing morphological dilation operation on the binary image 320 of FIG. 3B… FIG. 3F illustrates various text groups (360a,360b,360c, 360d, 360e, 360f, 360g, 360h, and 360i) 360a-360i (hereinafter collectively referred to as text groups 360) formed as a result of dilation operation in accordance with an embodiment of the present disclosure. These groups 360 include text content arranged in columns and/or rows format. The text content of the digital image 300 is segmented into multiple text groups 360 based on alignment of text in the image 300… FIG. 3G illustrates an exemplary image layer 365 and exemplary first through ninth OCR layers (370a, 370b, 370c . . . 370i), i.e., 370a-370i generated based on the text groups 360a-360i.” (emphasis added) examiner note: the document elements such as an address field 302, code field 304, etc. may be identified and the further processing may be the generation of binary image 320, the performing of morphological dilation of the binary image 320, the grouping of the text content as being arranged in columns and/or rows format to create OCR layers 370.

Claims 6, 13, and 20. The rejection of the method of claim 5 is incorporated, wherein the one or more elements identified include clause titles and tables. Gopalakrishnan discloses in [0040, 0047, and 0069] “The multi-layered OCR document generation module 208 receives the digital image of the input document, converts the digital image into binary form, removes all lines from the digital image, performs a morphological dilation operation on the binary image using a horizontal and a vertical structuring element, identifies and creates one or more text groups based on the dilation operation, performs OCR of each text group (i.e., extracting text information for each group), generates OCR layer for each text group, and combines the multiple OCR layers while generating a multi-layered OCR document… the digital image 300 includes an address field 302, a code field 304, a description field 306, a price field 308, a date field 310, a customer number field 312, an invoice number field 314, and a total price field 316… An OCR layer 370a is generated corresponding to a text group 360a by first recognizing characters of that text group using an OCR algorithm and extract those characters from the text group 360a. Similarly, other OCR layers 370b-370i are generated.” (emphasis added) examiner note: identified text groups may correspond title element 360h and returning table element values 360e as shown in fig. 3F.  
 
Claims 7, 14, and 21. The rejection of the method of claim 1 is incorporated, wherein providing the processed document to the user or a document processing pipeline for further evaluation or analysis further comprises generating a display of the processed document to enable a user to interact with the document by selecting an element of the document, annotating an element, or performing another action. Gopalakrishnan discloses in [0070] “FIG. 3H illustrates an exemplary multi-layered OCR file 380 as generated in accordance with an embodiment of the present disclosure. The multi-layered OCR file 380 enables a user to select and copy text corresponding to each OCR layer 360 independently therefrom, without having to worry about selection of other undesired text.” (emphasis added) examiner note: the generated OCR file would enable the user to interact with text of the original document based on the layered OCR text layer by performing copy action, for example.  

Claim 8. The claim is directed towards a system to implement the steps of the method of claim 1, therefore, is similarly rejected as claim 1. The system further comprising:
one or more electronic processors configured to execute a set of computer-executable instructions; and a non-transitory computer-readable medium including the set of computer-executable instructions, Gopalakrishnan discloses in [0044 and 0084] “system 220 including the scanning device 202 and a computing device 230… one or more computing devices having at least one processor configured to or programmed to execute software instructions stored on a computer readable tangible, non-transitory medium or also referred to as a processor-readable medium.” (emphasis added).

Claim 15. The claim is directed towards a non-transitory computer readable medium containing a set of computer-executable instructions to implement the steps of the method of claim 1, therefore, is similarly rejected as claim 1.


Claims 2, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Gopalakrishnan, Radakovic, and Katsura as applied to claim 1 above, and further in view of Kim et al. (US 2024/0330357, filed 7/9/2021, hereinafter as Kim).

Claims 2, 9, and 16. The rejection of the method of claim 1 is incorporated, wherein overlaying the text output by the OCR process on top of the original document image and making that layer substantially invisible to the user further comprises:
converting one or more pages into an image representation; Gopalakrishnan discloses in [0005] “The method includes receiving a document for scanning by a user, the document includes text. A scanned image of the document is created and the scanned image is converted into a binary image.” (emphasis added).
computing and apply a scaling factor between the image representation and the document scan; and assembling the text into a layer and overlaying on the PDF or document scan and making the overlay substantially invisible to the user. Gopalakrishnan discloses in [0004] “the combined OCR layers are superimposed as invisible text layers over the scanned image to create the multi-layered OCR document, the multi-layered OCR document facilitates selection of a text group corresponding to the OCR layer.” (emphasis added) examiner note: the superimposing of the text layer over the scanned document may indicate that the text layer may be generated with similar size as the scanned document such that the text layer would appear invisible to the user.
Gopalakrishnan does not explicitly disclose
identifying and applying a font type and font size to the text. However, Kim, in an analogous art, discloses in [0107] “The font library (DB) may store font data regarding various fonts. The electronic apparatus or the font library (DB) may perform font classification on the input handwritten image to identify a font (e.g., Font 3) that is similar to the input handwritten image in the font library (DB). For example, the electronic apparatus may analyze a handwritten image on a line including the replacement text “1” to analyze font characteristics of the handwritten image, such as, curvature, inclination, shape, position, etc. of strokes in order to identify a font (e.g., Font 3) similar to the handwritten image in the font library (DB).” (emphasis added) examiner note:


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. See PTO-892.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AHAMED I NAZAR whose telephone number is (571)270-3174. The examiner can normally be reached 10 am to 7 pm Mon-Fri.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong can be reached at 571-272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/AHAMED I NAZAR/Examiner, Art Unit 2178                                                                                                                                                                                                        12/23/2025

/STEPHEN S HONG/Supervisory Patent Examiner, Art Unit 2178

Read full office action

Prosecution Timeline

Nov 13, 2023

Application Filed

Dec 31, 2025

Non-Final Rejection mailed — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/899,130

Patent 12619796

VIRTUAL ENVIRONMENT FOR LARGE-SCALE CAPITAL PROJECTS

3y 8m to grant Granted May 05, 2026

17/761,532

Patent 12564342

METHODS, SYSTEMS, AND DEVICES FOR THE DIAGNOSIS OF BEHAVIORAL DISORDERS, DEVELOPMENTAL DELAYS, AND NEUROLOGIC IMPAIRMENTS

3y 11m to grant Granted Mar 03, 2026

17/566,782

Patent 12548333

DYNAMIC NETWORK QUANTIZATION FOR EFFICIENT VIDEO INFERENCE

4y 1m to grant Granted Feb 10, 2026

18/383,839

Patent 12549503

INFORMATION INTERACTION METHOD AND APPARATUS, AND ELECTRONIC DEVICE

2y 3m to grant Granted Feb 10, 2026

17/771,649

Patent 12539042

Multi-Modal Imaging System and Method Therefor

3y 9m to grant Granted Feb 03, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Strategy Recommendation AI-generated — please review before filing

Get a prosecution strategy drawn from examiner precedents, rejection analysis, and claim mapping.

Typically takes 5-10 seconds — AI-generated, attorney review required before filing

Prosecution Projections

1-2

Expected OA Rounds

53%

Grant Probability

86%

With Interview (+32.8%)

4y 1m (~1y 7m remaining)

Median Time to Grant

Low

PTA Risk

Based on 383 resolved cases by this examiner. Grant probability derived from career allowance rate.