DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-3, 7-12, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Borges et al., WO 2021221614 A1 (Borges), and further in view of Melen, 6,151,423 (Melen).
Regarding claim 1, Borges teaches a method for training a machine learning model for document rotation detection (training a machine learning (ML) orientation classifier for determining an orientation of an electronic document) ([0014]), comprising:
rotating each document image in a first set of document images by a plurality of rotation angles to obtain a first set of rotated document images (the apparatus may electronically rotate the electronic document by various angles of rotation clockwise and/or counter clockwise (such as 0, 90, 180, and 270) to generate rotated images for analysis by the ML orientation classifier) ([0015]),
wherein each document image in the first set of document images has a known orientation (wherein the ML classifier may be trained on a training corpus of images labeled with orientations) (Fig. 2; [0022-0023] and [0035]);
associating a rotation classification label to each rotated document image in the first set of rotated document images (wherein the ML classifier may be trained on a training corpus of images labeled with orientations) (Fig. 2; [0022-0023] and [0035]);
for each document image in a second set of document images (wherein the input can be an electronic document 301, that has an unknown orientation) (Fig. 3; [0039]):
rotating the respective document image by a plurality of rotation angles (wherein the angles of rotation can be in a first direction including a first set of angles comprising 0, 90, 180, and 270) (Figs. 4 and 5; [0021] and [0044]),
performing a character recognition analysis at each rotation angle of the plurality of rotation angles (wherein the ML orientation classifier 230 can identify text objects that are oriented one way or another of each rotation angle of 0, 90, 180, and 270) (Fig. 2; [0036-0037]),
generating a confidence score based on the character recognition analyses (based on the distinction of objects, such as text, in the electronic document and the knowledge of the orientations of the labelled documents obtained during ML training, the ML orientation classifier can output probabilities that an input electronic document is in each of a plurality of orientations corresponding to the labels) (Fig. 2; [0038]),
assigning the confidence score to the respective document image (output probabilities that an input electronic document is in each of a plurality of orientations corresponding to the labels) (Fig. 2; [0038]), and
associating a rotation classification label to the respective document image (associating a final orientation classification of the electronic document) (Fig. 5; [0029]) based on the character recognition analyses (text object analysis) (Fig. 2; [0036-0038]), the rotation classification label being analogous to the rotation classification labels associated with the first set of rotated document images (wherein associating the final orientation classification can be one of the four rotation angles (such as 0, 90, 180, and 270)) (Fig. 5; [0029] and [0032]); and
training a machine learning model to detect document rotation based on a combination of the first set of rotated document images having the associated rotation classification labels (wherein the ML classifier may be trained on a training corpus of images labeled with orientations) (Fig. 2; [0022-0023] and [0035]) and the second set of document images having the confidence scores (based on the distinction of objects, such as text, in the electronic document and the knowledge of the orientations of the labelled documents obtained during ML training, the ML orientation classifier can output probabilities that an input electronic document is in each of a plurality of orientations corresponding to the labels) (Fig. 2; [0038]) and the associated rotation classification labels (wherein each orientation, of the set of orientations (0, 90, 180, and 270 degrees), is associated with a respective probability that the orientation is correct) ([0038]).
However, Borges does not explicitly teach “optical” character recognition.
Melen teaches the correct orientation for a document scanned by an OCR system is determined from the confidence factors associated with multiple character images identified in the document (Abstract); and wherein the orientation is obtained by the OCR system along with a confidence factor corresponding to the orientation (Figs. 6 and 7; col. 6, lines 8-40).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Borges to include “optical” character recognition with a confidence score since it can provide more accurate document orientation (Melen; col. 5, lines 45-47).
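For illustration only, and not as a characterization of what Borges or Melen disclose, the first two limitations of claim 1 (rotating images of known orientation by 0, 90, 180, and 270 degrees and labeling each rotated copy) can be sketched as follows. The grid-of-integers `Image` type and the function names `rotate_cw` and `build_labeled_set` are hypothetical stand-ins chosen for this sketch.

```python
from typing import List, Tuple

Image = List[List[int]]  # hypothetical stand-in for a document image
ANGLES = (0, 90, 180, 270)


def rotate_cw(image: Image, angle: int) -> Image:
    """Rotate a row-major grid clockwise by a multiple of 90 degrees."""
    if angle % 90 != 0:
        raise ValueError("angle must be a multiple of 90")
    out = [row[:] for row in image]
    for _ in range((angle // 90) % 4):
        # One clockwise quarter turn: reverse the row order, then transpose.
        out = [list(col) for col in zip(*out[::-1])]
    return out


def build_labeled_set(upright_images: List[Image]) -> List[Tuple[Image, int]]:
    """Pair every rotated copy with the rotation angle that produced it."""
    return [(rotate_cw(img, a), a) for img in upright_images for a in ANGLES]
```

Because the source images are assumed to be upright, the applied angle serves directly as the rotation classification label; no recognition step is needed for this first set.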
Regarding claim 2, Borges teaches wherein generating the confidence score based on the character recognition analyses (based on the distinction of objects, such as text, in the electronic document and the knowledge of the orientations of the labelled documents obtained during ML training, the ML orientation classifier can output probabilities that an input electronic document is in each of a plurality of orientations corresponding to the labels) (Fig. 2; [0038]) comprises: recording a character recognition analysis confidence score at each rotation angle of the plurality of rotation angles (wherein each orientation, of the set of orientations (0, 90, 180, and 270 degrees), is associated with a respective probability that the orientation is correct) ([0038]); and comparing the character recognition analysis confidence scores for each rotation angle of the plurality of rotation angles to identify a highest confidence score (wherein the scores are based on the text object distinctions along with a plurality of confidence scores; including a highest score) (Fig. 3; [0041]), wherein the confidence score has the highest confidence score (the highest confidence score being the score used for orientation determination) (Fig. 3; [0041]).
However, Borges does not explicitly teach “optical” character recognition.
Melen teaches the correct orientation for a document scanned by an OCR system is determined from the confidence factors associated with multiple character images identified in the document (Abstract); and wherein the orientation is obtained by the OCR system along with a confidence factor corresponding to the orientation (Figs. 6 and 7; col. 6, lines 8-40).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Borges to include “optical” character recognition with a confidence score since it can provide more accurate document orientation (Melen; col. 5, lines 45-47).
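As a minimal sketch of the comparison recited in claim 2, assuming the per-angle confidence scores have already been recorded in a dictionary keyed by rotation angle, the highest score and its angle can be selected as shown below. The function name `pick_orientation` and the sample scores are hypothetical; the sketch does not reproduce any scoring routine from Borges or Melen.

```python
from typing import Dict, Tuple


def pick_orientation(scores: Dict[int, float]) -> Tuple[int, float]:
    """Return the (angle, score) pair with the highest confidence score."""
    if not scores:
        raise ValueError("at least one rotation angle must be scored")
    # max() keeps the first angle encountered when two scores tie.
    angle = max(scores, key=scores.get)
    return angle, scores[angle]


# Hypothetical recorded scores for one document image.
recorded = {0: 0.20, 90: 0.90, 180: 0.40, 270: 0.10}
best_angle, best_score = pick_orientation(recorded)
```

The returned score is the value that would be assigned to the document image, and the returned angle corresponds to its rotation classification label.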
Regarding claim 3, Borges teaches wherein the plurality of rotation angles comprise 0, 90, 180, and 270 degrees (wherein the angles of rotation can be in a first direction including a first set of angles comprising 0, 90, 180, and 270) (Figs. 4 and 5; [0021] and [0044]).
Regarding claim 7, Borges teaches further comprising: using character recognition to detect regions of text in the document images (identifying text objects in the documents) ([0036]); cropping portions of the document images that include one or more detected regions of text to generate one or more text image patches (wherein detected regions of identified text objects are generated) ([0035-0037]); adding the one or more text image patches to a training dataset (wherein the ML training may include text object training) (Fig. 2; [0035-0036]); and training the machine learning model on the training dataset comprising the one or more text image patches (wherein the ML training includes text objects) ([0035-0037]).
Regarding claim 8, Borges teaches wherein assigning the generated confidence score (based on this distinction of objects in the electronic document and the knowledge of the orientations of the labelled documents 212A-D obtained during ML training 220, the ML orientation classifier 230 may output probabilities that an input electronic document is in each of a plurality of orientations corresponding to the labels) (Fig. 2; [0038]) comprises: analyzing the document image using character recognition to detect one or more text regions (detecting text object regions) ([0035-0037]); extracting one or more cropped image patches that include the detected one or more text regions from the document image; performing a character recognition process on image patches at different rotation angles (wherein each orientation, of the set of orientations (0, 90, 180, and 270 degrees), is associated with a respective probability that the orientation is correct) ([0038]); and assigning the generated confidence score based on the extracted one or more image patches at different rotation angles (based on this distinction of objects in the electronic document and the knowledge of the orientations of the labelled documents 212A-D obtained during ML training 220, the ML orientation classifier 230 may output probabilities that an input electronic document is in each of a plurality of orientations corresponding to the labels) (Fig. 2; [0038]).
However, Borges does not explicitly teach “optical” character recognition or “cropped image patches”.
Melen teaches the correct orientation for a document scanned by an OCR system is determined from the confidence factors associated with multiple character images identified in the document (Abstract); wherein the orientation is obtained by the OCR system along with a confidence factor corresponding to the orientation (Figs. 6 and 7; col. 6, lines 8-40); and wherein the text is extracted/cropped (processing by the character recognition module implements conventional routines for segmenting the scanned document as necessary, and for extracting plural character images residing within the scanned document, such as those for the various alphanumeric characters on a page) (col. 3, line 63 to col. 4, line 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Borges to include “optical” character recognition with a confidence score and extracting characters since it can provide more accurate document orientation (Melen; col. 5, lines 45-47).
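For illustration only, the patch-extraction step discussed for claims 7 and 8 can be sketched as below, assuming that text regions have already been detected and are supplied as bounding boxes. The `Image` and `Box` types and the function names are hypothetical, and the sketch is not the segmentation routine described in Melen.

```python
from typing import List, Tuple

Image = List[List[int]]  # hypothetical stand-in for a document image
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def crop_patch(image: Image, box: Box) -> Image:
    """Return the sub-grid of the image covered by one detected text region."""
    top, left, bottom, right = box
    if not (0 <= top < bottom <= len(image)):
        raise ValueError("box rows fall outside the image")
    if not (0 <= left < right <= len(image[0])):
        raise ValueError("box columns fall outside the image")
    return [row[left:right] for row in image[top:bottom]]


def crop_text_patches(image: Image, boxes: List[Box]) -> List[Image]:
    """Crop one patch per detected text region."""
    return [crop_patch(image, box) for box in boxes]
```

Each returned patch could then be rotated and scored independently, or added to a training dataset as a text image patch.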
Regarding claim 9, Borges teaches further comprising: providing a document to the trained machine learning model (providing a document to the ML orientation classifier) ([0016]); predicting a document rotation angle for the provided document (determining the final orientation classification to indicate an orientation of the electronic document) ([0016]); and rotating the provided document to a known rotation angle based on the predicted document rotation angle (the apparatus may determine a corrective angle of rotation to apply to the electronic document based on the final orientation classification; wherein the apparatus may correct the orientation of electronic document based on the corrective angle of rotation) ([0016]).
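The relationship between a predicted rotation angle and the rotation needed to restore a known orientation can be sketched as follows. This is a minimal sketch that assumes the predicted angle is the clockwise rotation of the document away from upright; the function name `corrective_angle` is hypothetical and is not taken from Borges.

```python
def corrective_angle(predicted_angle: int) -> int:
    """Clockwise rotation, in degrees, that undoes a predicted clockwise rotation."""
    if predicted_angle % 90 != 0:
        raise ValueError("predicted angle must be a multiple of 90")
    # Rotating by the additive inverse modulo 360 returns the page to 0 degrees.
    return (-predicted_angle) % 360
```

Under this assumption, a document predicted to be rotated 90 degrees clockwise is corrected by a further 270 degree clockwise rotation, and a document predicted to be upright is left unchanged.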
Regarding claim 10, see the rejection made to claim 1, as well as prior art Borges for a processing system (computing device 100) (Fig. 1; [0017-0018]), comprising: a memory (memory 110) (Fig. 1; [0019]) comprising computer-executable instructions (wherein memory 110 may have stored thereon machine-readable instructions) (Fig. 1; [0019]); and a processor (processor 102) (Fig. 1; [0018-0019]) configured to execute the computer-executable instructions (wherein the processor executes the machine-readable instructions) (Fig. 1; [0019]) and cause the processing system (wherein the processor 102 can control the operations of the apparatus 100) (Fig. 1; [0018]), for they teach all the limitations within this claim.
Regarding claim 11, see the rejection made to claim 2, as well as prior art Borges for a processing system (computing device 100) (Fig. 1; [0017-0018]), comprising: a memory (memory 110) (Fig. 1; [0019]) comprising computer-executable instructions (wherein memory 110 may have stored thereon machine-readable instructions) (Fig. 1; [0019]); and a processor (processor 102) (Fig. 1; [0018-0019]) configured to execute the computer-executable instructions (wherein the processor executes the machine-readable instructions) (Fig. 1; [0019]) and cause the processing system (wherein the processor 102 can control the operations of the apparatus 100) (Fig. 1; [0018]), for they teach all the limitations within this claim.
Regarding claim 12, see the rejection made to claim 3, as well as prior art Borges for a processing system (computing device 100) (Fig. 1; [0017-0018]), comprising: a memory (memory 110) (Fig. 1; [0019]) comprising computer-executable instructions (wherein memory 110 may have stored thereon machine-readable instructions) (Fig. 1; [0019]); and a processor (processor 102) (Fig. 1; [0018-0019]) configured to execute the computer-executable instructions (wherein the processor executes the machine-readable instructions) (Fig. 1; [0019]) and cause the processing system (wherein the processor 102 can control the operations of the apparatus 100) (Fig. 1; [0018]), for they teach all the limitations within this claim.
Regarding claim 16, see the rejection made to claim 7, as well as prior art Borges for a processing system (computing device 100) (Fig. 1; [0017-0018]), comprising: a memory (memory 110) (Fig. 1; [0019]) comprising computer-executable instructions (wherein memory 110 may have stored thereon machine-readable instructions) (Fig. 1; [0019]); and a processor (processor 102) (Fig. 1; [0018-0019]) configured to execute the computer-executable instructions (wherein the processor executes the machine-readable instructions) (Fig. 1; [0019]) and cause the processing system (wherein the processor 102 can control the operations of the apparatus 100) (Fig. 1; [0018]), for they teach all the limitations within this claim.
Regarding claim 17, see the rejection made to claim 8, as well as prior art Borges for a processing system (computing device 100) (Fig. 1; [0017-0018]), comprising: a memory (memory 110) (Fig. 1; [0019]) comprising computer-executable instructions (wherein memory 110 may have stored thereon machine-readable instructions) (Fig. 1; [0019]); and a processor (processor 102) (Fig. 1; [0018-0019]) configured to execute the computer-executable instructions (wherein the processor executes the machine-readable instructions) (Fig. 1; [0019]) and cause the processing system (wherein the processor 102 can control the operations of the apparatus 100) (Fig. 1; [0018]), for they teach all the limitations within this claim.
Regarding claim 18, see the rejection made to claim 9, as well as prior art Borges for a processing system (computing device 100) (Fig. 1; [0017-0018]), comprising: a memory (memory 110) (Fig. 1; [0019]) comprising computer-executable instructions (wherein memory 110 may have stored thereon machine-readable instructions) (Fig. 1; [0019]); and a processor (processor 102) (Fig. 1; [0018-0019]) configured to execute the computer-executable instructions (wherein the processor executes the machine-readable instructions) (Fig. 1; [0019]) and cause the processing system (wherein the processor 102 can control the operations of the apparatus 100) (Fig. 1; [0018]), for they teach all the limitations within this claim.
Regarding claim 19, Borges teaches a method for correcting a rotated document (the apparatus may correct the orientation of the electronic document) ([0016]) comprising:
providing a document image to a machine learning model (wherein an electronic document can be provided to a machine learning (ML) orientation classifier) (Fig. 3; [0039]) trained on training data (wherein the ML classifier may be trained on a training corpus of images labeled with orientations) (Fig. 2; [0022-0023] and [0035]) including:
a first set of rotated document images (the apparatus may electronically rotate the electronic document by various angles of rotation clockwise and/or counter clockwise (such as 0, 90, 180, and 270) to generate rotated images for analysis by the ML orientation classifier) ([0015]), each rotated document image of the first set of rotated document images being associated with a known orientation label (wherein the ML classifier may be trained on a training corpus of images labeled with orientations) (Fig. 2; [0022-0023] and [0035]), and
a second set of document images associated with an estimated orientation label and assigned a character recognition process (based on the distinction of objects, such as text, in the electronic document and the knowledge of the orientations of the labelled documents obtained during ML training, the ML orientation classifier can output probabilities that an input electronic document is in each of a plurality of orientations corresponding to the labels) (Fig. 2; [0038]),
wherein the character recognition process is performed on a rotated version of each document in the second set of document images (wherein each orientation, of the set of orientations (0, 90, 180, and 270 degrees), is associated with a respective probability that the orientation is correct) ([0038]);
predicting a document rotation angle for the provided document image (determining the final orientation classification to indicate an orientation of the electronic document) ([0016]); and
rotating the provided document image to a known rotation angle based on the predicted document rotation angle (the apparatus may determine a corrective angle of rotation to apply to the electronic document based on the final orientation classification; wherein the apparatus may correct the orientation of electronic document based on the corrective angle of rotation) ([0016]).
However, Borges does not explicitly teach “optical” character recognition or “uncertainty weighting label based on an optical character recognition”.
Melen teaches the correct orientation for a document scanned by an OCR system is determined from the confidence factors (uncertainty weighting label) associated with multiple character images identified in the document (Abstract); and wherein the orientation is obtained by the OCR system along with a confidence factor (uncertainty weighting label) corresponding to the orientation (Figs. 6 and 7; col. 6, lines 8-40).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Borges to include “optical” character recognition with a confidence score (uncertainty weighting label) since it can provide more accurate document orientation (Melen; col. 5, lines 45-47).
Regarding claim 20, Borges teaches wherein the character recognition process is performed on each document in the second set of document images at angles comprising 0, 90, 180, and 270 degrees (wherein each orientation, of the set of orientations (0, 90, 180, and 270 degrees), is associated with a respective probability that the orientation is correct) ([0038]).
However, Borges does not explicitly teach “optical” character recognition.
Melen teaches the correct orientation for a document scanned by an OCR system is determined from the confidence factors associated with multiple character images identified in the document (Abstract); and wherein the orientation is obtained by the OCR system along with a confidence factor corresponding to the orientation (Figs. 6 and 7; col. 6, lines 8-40).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Borges to include “optical” character recognition with a confidence score since it can provide more accurate document orientation (Melen; col. 5, lines 45-47).
Claim(s) 4 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Borges et al., WO 2021221614 A1 (Borges), Melen, 6,151,423 (Melen), and further in view of Berryman et al., US 2006/0161780 A1 (Berryman).
Regarding claim 4, Borges teaches an apparatus may include a processor that may be caused to access an electronic document and electronically rotate the electronic document by a plurality of angles of rotation to generate a plurality of rotated images (Abstract). Melen teaches the correct orientation for a document scanned by an OCR system is determined from the confidence factors associated with multiple character images identified in the document (Abstract).
However, neither explicitly teaches “adding text to one or more portions of an empty document field within a document image to generate the first set of document images”.
Berryman teaches an apparatus for adding signature information to electronic documents (Abstract); determining the orientation of the electronic document ([0002]); and adding text to one or more portions of an empty document field within a document image to generate the first set of document images (wherein signature data is added as text into the signature data field template, which is also in text format, to generate an electronic document(s)) ([0019]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of prior art references to include adding text since it adds a significant advancement in the field of electronic document management (Berryman; [0038]).
Regarding claim 13, see the rejection made to claim 4, as well as prior art Borges for a processing system (computing device 100) (Fig. 1; [0017-0018]), comprising: a memory (memory 110) (Fig. 1; [0019]) comprising computer-executable instructions (wherein memory 110 may have stored thereon machine-readable instructions) (Fig. 1; [0019]); and a processor (processor 102) (Fig. 1; [0018-0019]) configured to execute the computer-executable instructions (wherein the processor executes the machine-readable instructions) (Fig. 1; [0019]) and cause the processing system (wherein the processor 102 can control the operations of the apparatus 100) (Fig. 1; [0018]), for they teach all the limitations within this claim.
Claim(s) 5, 6, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Borges et al., WO 2021221614 A1 (Borges), Melen, 6,151,423 (Melen), and further in view of Gao et al., CN 112001394 A (Gao).
Regarding claim 5, Borges teaches an apparatus may include a processor that may be caused to access an electronic document and electronically rotate the electronic document by a plurality of angles of rotation to generate a plurality of rotated images (Abstract); and wherein training the machine learning model on the combination of the first set of rotated document images (wherein the ML classifier may be trained on a training corpus of images labeled with orientations) (Fig. 2; [0022-0023] and [0035]) and the second set of document images (based on the distinction of objects, such as text, in the electronic document and the knowledge of the orientations of the labelled documents obtained during ML training, the ML orientation classifier can output probabilities that an input electronic document is in each of a plurality of orientations corresponding to the labels) (Fig. 2; [0038]). Melen teaches the correct orientation for a document scanned by an OCR system is determined from the confidence factors associated with multiple character images identified in the document (Abstract).
However, neither explicitly teaches “iteratively determining a loss using an uncertainty-aware loss function that weights loss terms based on magnitudes of the confidence scores”.
Gao teaches constructing and training the OCR identification neural network (bottom of page 9 to top of page 10); and wherein iteratively determining a loss using an uncertainty-aware loss function that weights loss terms based on magnitudes of the confidence scores (iterative loss function based on the confidence level) (bottom of page 9 to top of page 10).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of prior art references to include a loss function since it helps with optimization (Gao; bottom of page 9 to top of page 10).
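For illustration only, one generic way to weight loss terms by confidence score magnitude is a confidence-weighted cross-entropy, sketched below. This is an assumed formulation chosen for clarity; it is not the loss function disclosed in Gao, and the function name and arguments are hypothetical.

```python
import math
from typing import Sequence


def weighted_cross_entropy(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    confidences: Sequence[float],
) -> float:
    """Average per-sample cross-entropy, weighted by each sample's confidence."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    weight_sum = 0.0
    for p, y, w in zip(probs, labels, confidences):
        # A low-confidence label contributes proportionally less to the loss.
        total += w * -math.log(max(p[y], eps))
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else 0.0
```

In an iterative training loop, this value would be recomputed on each batch, with images of known orientation given a weight of 1.0 and images carrying estimated labels given their recorded confidence scores.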
Regarding claim 6, Borges teaches further comprising assigning a confidence score to each document image in the first set of rotated document images (based on this distinction of objects in the electronic document and the knowledge of the orientations of the labelled documents 212A-D obtained during ML training 220, the ML orientation classifier 230 may output probabilities that an input electronic document is in each of a plurality of orientations corresponding to the labels) (Fig. 2; [0038]).
Regarding claim 14, see the rejection made to claim 5, as well as prior art Borges for a processing system (computing device 100) (Fig. 1; [0017-0018]), comprising: a memory (memory 110) (Fig. 1; [0019]) comprising computer-executable instructions (wherein memory 110 may have stored thereon machine-readable instructions) (Fig. 1; [0019]); and a processor (processor 102) (Fig. 1; [0018-0019]) configured to execute the computer-executable instructions (wherein the processor executes the machine-readable instructions) (Fig. 1; [0019]) and cause the processing system (wherein the processor 102 can control the operations of the apparatus 100) (Fig. 1; [0018]), for they teach all the limitations within this claim.
Regarding claim 15, see the rejection made to claim 6, as well as prior art Borges for a processing system (computing device 100) (Fig. 1; [0017-0018]), comprising: a memory (memory 110) (Fig. 1; [0019]) comprising computer-executable instructions (wherein memory 110 may have stored thereon machine-readable instructions) (Fig. 1; [0019]); and a processor (processor 102) (Fig. 1; [0018-0019]) configured to execute the computer-executable instructions (wherein the processor executes the machine-readable instructions) (Fig. 1; [0019]) and cause the processing system (wherein the processor 102 can control the operations of the apparatus 100) (Fig. 1; [0018]), for they teach all the limitations within this claim.
Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL J VANCHY JR whose telephone number is (571)270-1193. The examiner can normally be reached Monday - Friday 9am - 5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Emily Terrell can be reached at (571) 270-3717. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/MICHAEL J VANCHY JR/
Primary Examiner, Art Unit 2666
Michael.Vanchy@uspto.gov