DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant’s claim for foreign priority under 35 U.S.C. 119(a)-(d). The certified copy has been filed in parent Application No. RU-2020122468, filed on 11/03/2021.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 10/05/2023 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Status
Claims 1-7 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-7 of U.S. Patent No. 11,816,910 B2.
Claim(s) 1-7 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Bulatov et al., “On optimal stopping strategies for text recognition in a video stream as an application of a monotone sequential decision model” (hereinafter “Bulatov”).
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-7 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-7 of U.S. Patent No. 11,816,910 B2.
Regarding claims 1 and 6-7, although these claims are not identical to claims 1 and 6-7 of U.S. Patent No. 11,816,910 B2, they are not patentably distinct from claims 1 and 6-7 of U.S. Patent No. 11,816,910 B2, because claims 1 and 6-7 of the instant application are broader than, and fully encompassed by, claims 1 and 6-7 of U.S. Patent No. 11,816,910 B2.
U.S. Patent No. 11,816,910 B2, claim 1:
A method comprising using at least one hardware processor to:
until a determination to stop processing is made, for each of a plurality of image frames in a video stream, receive the image frame, generate a text-recognition result from the image frame, wherein the text-recognition result comprises a vector of class estimations for each of one or more characters, combine the text-recognition result with an accumulated text-recognition result, estimate a distance between the accumulated text-recognition result and a next accumulated text-recognition result based on an approximate model of the next accumulated text-recognition result, wherein the distance between the accumulated text-recognition result and the next accumulated text-recognition result is estimated as
[equation reproduced in the record as an image: media_image1.png, greyscale]
wherein Δ̂n is the estimated distance, wherein n is a current number of image frames for which text-recognition results have been combined with the accumulated text-recognition result, wherein δ is an external parameter, wherein Sn is a number of vectors of class estimations in the accumulated text-recognition result, wherein K is a number of classes represented in each vector of class estimations in the accumulated text-recognition result, and wherein Δijk is a contribution to the estimated distance by a class estimation for a k-th class to a j-th component of the accumulated text-recognition result from the vector of class estimations in the text-recognition result generated from an i-th image frame, and determine whether or not to stop the processing based on the estimated distance; and, after stopping the processing, output a character string based on the accumulated text-recognition result.

Application No. 18/377,206 (US 2024/0029465 A1), claims 1 and 6-7:
A method comprising using at least one hardware processor to:
[Claim 6: A system comprising: at least one hardware processor; and one or more software modules that, when executed by the at least one hardware processor ]
[Claim 7: A non-transitory computer-readable medium having instructions stored therein, wherein the instructions, when executed by a processor, cause the processor to:]
until a determination to stop processing is made, for each of a plurality of image frames in a video stream, receive the image frame, generate a text-recognition result from the image frame, wherein the text-recognition result comprises a vector of class estimations for each of one or more characters, combine the text-recognition result with an accumulated text-recognition result, estimate a distance between the accumulated text-recognition result and a future next accumulated text-recognition result, which would be generated from an image frame that has not yet been received, based on an approximate model of the future next accumulated text-recognition result, and determine whether or not to stop the processing based on the estimated distance; and, after stopping the processing, output a character string based on the accumulated text-recognition result.
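For illustration only, the character-level accumulation and stop-distance estimate recited in the claims above can be sketched as follows. The averaging combiner and the absolute-difference contribution used for Δijk are hypothetical stand-ins chosen for the sketch, not the patented formula (which appears in the record only as an image):

```python
# Illustrative sketch only: each frame yields, per character position, a vector
# of class estimations (e.g. probabilities over K character classes), which is
# folded into a running accumulated result. The averaging combiner and the
# |next - current| contribution below are assumptions, not the claimed formula.

def combine(accumulated, frame_result, n):
    """Running average of class-estimation vectors (S positions x K classes);
    n is the number of frames already combined into `accumulated`."""
    if accumulated is None:
        return [row[:] for row in frame_result]
    return [[(n * a + f) / (n + 1) for a, f in zip(arow, frow)]
            for arow, frow in zip(accumulated, frame_result)]

def estimate_delta(accumulated, observations, delta=0.5):
    """Estimate the change a modelled next frame would cause: replay each
    previous observation as the hypothetical next frame and sum the
    per-position, per-class contributions |next - current| (a stand-in for
    the claimed delta_ijk), smoothed by the external parameter delta."""
    n = len(observations)
    total = 0.0
    for x in observations:                        # i = 1..n
        nxt = combine(accumulated, x, n)          # modelled next accumulated result
        for arow, nrow in zip(accumulated, nxt):  # j = 1..Sn
            total += sum(abs(nv - av) for av, nv in zip(arow, nrow))  # k = 1..K
    return total / (n + delta)
```

Under this sketch, identical frames drive the estimate to zero, so processing stops immediately under any non-negative threshold, while disagreeing frames keep the estimate positive.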
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1-7 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Bulatov et al., “On optimal stopping strategies for text recognition in a video stream as an application of a monotone sequential decision model” (hereinafter “Bulatov”).
The Examiner notes that the Bulatov et al. NPL, “On optimal stopping strategies for text recognition in a video stream as an application of a monotone sequential decision model,” was published on 07/23/2019, before the effective filing date of the application (07/07/2020), making the Bulatov et al. NPL prior art under 102(a)(1). The inventorship of the reference NPL is not identical to the inventorship of the current application (nor a subset of the inventors of the application). It is not clear what inventor Nikita Razumnyi’s contribution to the invention of the reference is, or how that contribution is not part of the invention of the current application. Applicant may consider filing an affidavit or declaration of attribution under 37 CFR 1.130(a). Also, the 35 U.S.C. 102(b)(2)(C) exception is not applicable to rejections under 102(a)(1).
Regarding claim 1, Bulatov discloses a method comprising using at least one hardware processor (Abstract: “The paper describes the problem of stopping the text field recognition process in a video stream, which is a novel problem, particularly relevant to real-time mobile document recognition systems.”; 1. Introduction: “The stopping problem is particularly important in relation to real-time computer vision systems working on a mobile device”; a person of ordinary skill in the art would understand that a “mobile device” is interpreted as comprising a “hardware processor”) to:
until a determination to stop processing is made, for each of a plurality of image frames in a video stream, (1. Introduction: “In summary, the goal of this paper is to explore and describe a decision-theoretic framework for finding the optimal stopping rule for the video stream text field recognition process”)
receive the image frame, generate a text-recognition result from the image frame, wherein the text-recognition result comprises a vector of class estimations for each of one or more characters, (2.1. Problem Statement: “Consider the task of text field recognition in a video stream. Let X be the set of all text field recognition results (i.e. strings over some fixed alphabet), with a defined metric function ρ : X×X → [0,+∞). A text field with correct value X∗ ∈ X is recognized such that a sequence of random recognition results X = (X1, X2, . . .) is observed one result at a time, … Recognition results integrator as a function of several recognition results which produces a single integrated result R : X+ → X (here by X+ we mean the set of all non-empty sequences of elements from X). At any moment n the observations X1 = x1, . . . , Xn = xn are obtained and the integrated result Rn = R(x1, . . . , xn) can be produced.”) combine the text-recognition result with an accumulated text-recognition result, (4.1 Evaluated integrator and distance metrics: “define a metric ρ and an integrator function R for the set of possible text field recognition results X, i.e. on the set of strings. … ROVER (Recognizer Output Voting Error Reduction) method [16], which is used for merging text field recognition results produced with different recognition algorithms [25,27] and for accumulating text recognition result in a video stream [6]”)
estimate a distance between the accumulated text-recognition result and a future next accumulated text-recognition result, which would be generated from an image frame that has not yet been received, based on an approximate model of the future next accumulated text-recognition result, ( 3.2 Estimation of the expected distance: “an estimation has to be provided for the expected distance between integrated results Δn def= E(ρ(Rn, Rn+1)) given current observations … propose to estimate Δn by modelling the integrated recognition result on the next stage assuming that the new observation will be close to the ones already obtained … estimating the next integrated recognition result (or the expected distance between it and the current result) might depend on the nature of the integrator function R”) and
determine whether or not to stop the processing based on the estimated distance; after stopping the processing, output a character string based on the accumulated text-recognition result. (Figs. 4-5; Table 1; 1. Introduction: “the text field is recognized on multiple frames such that to obtain the recognition result for next frame … continue the recognition process in hope that the result would improve, or to stop the process and output the currently accumulated result.”; 4.3 Evaluation of stopping rules: “a performance profile can be constructed which would graphically show the change of average number of integrated observations and the corresponding average distance from the obtained result (at stopping time) to the correct one, … Finally, the stopping rule NB, constructed in this paper, estimates at stage n the expected distance Δn to the next integrated result and stops when the estimation becomes lower or equal to a constant threshold.”; 6. Conclusion: “proposed an original stopping method which treats the text field recognition in a video stream as a monotone stopping rule problem and which relies on approximating the optimal stopping rule by estimating the distance to the next integrated recognition result.”)
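The stopping procedure mapped above can be illustrated with a minimal sketch. It assumes a per-position majority-vote integrator in place of ROVER, the Levenshtein distance as the metric ρ, and an identity recognizer; all three are hypothetical stand-ins, not Bulatov’s actual implementation:

```python
# Illustrative sketch of the stopping rule: per-frame string results are merged
# by an integrator R, and processing stops once the estimated distance to the
# next integrated result falls to a threshold. The integrator, metric, and
# recognizer below are assumptions standing in for those in the reference.
from collections import Counter

def integrate(results):
    """Stand-in integrator R: per-position majority vote over the strings."""
    width = max(len(r) for r in results)
    return "".join(Counter(r[j] for r in results if j < len(r)).most_common(1)[0][0]
                   for j in range(width))

def levenshtein(a, b):
    """Edit distance, used as the metric rho between recognition results."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[len(b)]

def estimate_delta(observations, delta=0.5):
    """Estimate the expected change in the integrated result by modelling the
    next frame with each previous observation and averaging (delta smooths)."""
    r_n = integrate(observations)
    return sum(levenshtein(r_n, integrate(observations + [x]))
               for x in observations) / (len(observations) + delta)

def recognize_stream(frames, recognize=lambda f: f, threshold=0.0):
    """Accumulate per-frame results until the estimate reaches the threshold,
    then output the accumulated (integrated) result."""
    observations = []
    for frame in frames:
        observations.append(recognize(frame))
        if len(observations) > 1 and estimate_delta(observations) <= threshold:
            break
    return integrate(observations)
```

For example, `recognize_stream(["he1lo", "hello", "hell0", "hello"])` settles on `"hello"`: the early disagreements keep the estimate positive, and it reaches zero once further frames can no longer change the vote.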
Regarding claim 2, Bulatov discloses wherein estimating the distance between the accumulated text-recognition result and the future next accumulated text-recognition result comprises modeling the future next accumulated text-recognition result by using previous text-recognition results as candidates for the future next text-recognition result. (Fig. 2; 3.2 Estimation of the expected distance: “we propose to estimate Δn by modelling the integrated recognition result on the next stage assuming that the new observation will be close to the ones already obtained:
Δ̂n = (1/(n + δ)) · Σ i=1..n ρ(Rn, R(x1, …, xn, xi))   (19)
where δ is an external parameter. In general the selection of the method for estimating the next integrated recognition result (or the expected distance between it and the current result) might depend on the nature of the integrator function R”)
Regarding claim 3, Bulatov discloses wherein estimating the distance between the accumulated text-recognition result and the future next accumulated text-recognition result further comprises, for each of the previous text-recognition results, calculating a distance between the accumulated text-recognition result and a combination of the accumulated text-recognition result with the previous text-recognition result. (Fig. 2; Section 3.1: “a fixed X∗ at stage n using triangle inequality, we can obtain the relationship between the distance from the current result to the ideal, the expected distance to the result on next stage and the expected distance from the next result to the ideal: … equation (15) … to estimate the expected distance between the current integrated recognition result Rn (which is known at stage n) and the unknown next result Rn+1,”; 3.2 Estimation of the expected distance: “we propose to estimate Δn by modelling the integrated recognition result on the next stage assuming that the new observation will be close to the ones already obtained:
Δ̂n = (1/(n + δ)) · Σ i=1..n ρ(Rn, R(x1, …, xn, xi))   (19)
where δ is an external parameter. In general the selection of the method for estimating the next integrated recognition result (or the expected distance between it and the current result) might depend on the nature of the integrator function R”; this shows that the “current integrated recognition result” is interpreted as the “previous text-recognition results”)
Regarding claim 4, Bulatov discloses wherein calculating a distance between the accumulated text-recognition result and the combination of the accumulated text-recognition result with the previous text-recognition result comprises aligning the accumulated text-recognition result with the previous text-recognition result based on a previous alignment of the accumulated text-recognition result with the previous text-recognition result. (Fig. 2; 3.2 Estimation of the expected distance: “we propose to estimate Δn by modelling the integrated recognition result on the next stage assuming that the new observation will be close to the ones already obtained:
Δ̂n = (1/(n + δ)) · Σ i=1..n ρ(Rn, R(x1, …, xn, xi))   (19)
where δ is an external parameter. In general the selection of the method for estimating the next integrated recognition result (or the expected distance between it and the current result) might depend on the nature of the integrator function R”, which shows that the “current integrated recognition result” is interpreted as the “previous text-recognition results”; 4.1 Evaluated integrator and distance metrics: “In order to apply the model presented in Sect. 3, we have to define a metric ρ and an integrator function R for the set of possible text field recognition results X, … The algorithm consists of two modules: the alignment module performs an optimal wrapping of each incoming string to a word transition network, then the voting module selects the best result by traversing the network and choosing the best result on each stage.”)
Regarding claim 5, Bulatov discloses wherein the at least one hardware processor is comprised in a mobile device, and wherein the image frames are received in real time or near-real time as the image frames are captured by a camera of the mobile device. (Abstract: “The paper describes the problem of stopping the text field recognition process in a video stream, which is a novel problem, particularly relevant to real-time mobile document recognition systems.”; 1. Introduction: “Modern mobile devices are equipped with high-quality cameras and decent computing power which allows them to be used for camera-based document analysis tasks.”)
Regarding claim 6, Bulatov discloses a system comprising: at least one hardware processor; and one or more software modules that, when executed by the at least one hardware processor, (Abstract: “The paper describes the problem of stopping the text field recognition process in a video stream, which is a novel problem, particularly relevant to real-time mobile document recognition systems.”; 1. Introduction: “The stopping problem is particularly important in relation to real-time computer vision systems working on a mobile device”; a person of ordinary skill in the art would understand that a “mobile device” includes a hardware processor and memory)
until a determination to stop processing is made, for each of a plurality of image frames in a video stream, (1. Introduction: “In summary, the goal of this paper is to explore and describe a decision-theoretic framework for finding the optimal stopping rule for the video stream text field recognition process”)
receive the image frame, generate a text-recognition result from the image frame, wherein the text-recognition result comprises a vector of class estimations for each of one or more characters, (2.1. Problem Statement: “Consider the task of text field recognition in a video stream. Let X be the set of all text field recognition results (i.e. strings over some fixed alphabet), with a defined metric function ρ : X×X → [0,+∞). A text field with correct value X∗ ∈ X is recognized such that a sequence of random recognition results X = (X1, X2, . . .) is observed one result at a time, … Recognition results integrator as a function of several recognition results which produces a single integrated result R : X+ → X (here by X+ we mean the set of all non-empty sequences of elements from X). At any moment n the observations X1 = x1, . . . , Xn = xn are obtained and the integrated result Rn = R(x1, . . . , xn) can be produced.”) combine the text-recognition result with an accumulated text-recognition result, (4.1 Evaluated integrator and distance metrics: “define a metric ρ and an integrator function R for the set of possible text field recognition results X, i.e. on the set of strings. … ROVER (Recognizer Output Voting Error Reduction) method [16], which is used for merging text field recognition results produced with different recognition algorithms [25,27] and for accumulating text recognition result in a video stream [6]”)
estimate a distance between the accumulated text-recognition result and a future next accumulated text-recognition result, which would be generated from an image frame that has not yet been received, based on an approximate model of the future next accumulated text-recognition result, ( 3.2 Estimation of the expected distance: “an estimation has to be provided for the expected distance between integrated results Δn def= E(ρ(Rn, Rn+1)) given current observations … propose to estimate Δn by modelling the integrated recognition result on the next stage assuming that the new observation will be close to the ones already obtained … estimating the next integrated recognition result (or the expected distance between it and the current result) might depend on the nature of the integrator function R”) and
determine whether or not to stop the processing based on the estimated distance; after stopping the processing, output a character string based on the accumulated text-recognition result. (Figs. 4-5; Table 1; 1. Introduction: “the text field is recognized on multiple frames such that to obtain the recognition result for next frame … continue the recognition process in hope that the result would improve, or to stop the process and output the currently accumulated result.”; 4.3 Evaluation of stopping rules: “a performance profile can be constructed which would graphically show the change of average number of integrated observations and the corresponding average distance from the obtained result (at stopping time) to the correct one, … Finally, the stopping rule NB, constructed in this paper, estimates at stage n the expected distance Δn to the next integrated result and stops when the estimation becomes lower or equal to a constant threshold.”; 6. Conclusion: “proposed an original stopping method which treats the text field recognition in a video stream as a monotone stopping rule problem and which relies on approximating the optimal stopping rule by estimating the distance to the next integrated recognition result.”)
Regarding claim 7, Bulatov discloses a non-transitory computer-readable medium having instructions stored therein, wherein the instructions, when executed by a processor, (Abstract: “The paper describes the problem of stopping the text field recognition process in a video stream, which is a novel problem, particularly relevant to real-time mobile document recognition systems.”; 1. Introduction: “The stopping problem is particularly important in relation to real-time computer vision systems working on a mobile device”; a person of ordinary skill in the art would understand that a “mobile device” includes a hardware processor and memory) cause the processor to:
until a determination to stop processing is made, for each of a plurality of image frames in a video stream, (1. Introduction: “In summary, the goal of this paper is to explore and describe a decision-theoretic framework for finding the optimal stopping rule for the video stream text field recognition process”)
receive the image frame, generate a text-recognition result from the image frame, wherein the text-recognition result comprises a vector of class estimations for each of one or more characters, (2.1. Problem Statement: “Consider the task of text field recognition in a video stream. Let X be the set of all text field recognition results (i.e. strings over some fixed alphabet), with a defined metric function ρ : X×X → [0,+∞). A text field with correct value X∗ ∈ X is recognized such that a sequence of random recognition results X = (X1, X2, . . .) is observed one result at a time, … Recognition results integrator as a function of several recognition results which produces a single integrated result R : X+ → X (here by X+ we mean the set of all non-empty sequences of elements from X). At any moment n the observations X1 = x1, . . . , Xn = xn are obtained and the integrated result Rn = R(x1, . . . , xn) can be produced.”) combine the text-recognition result with an accumulated text-recognition result, (4.1 Evaluated integrator and distance metrics: “define a metric ρ and an integrator function R for the set of possible text field recognition results X, i.e. on the set of strings. … ROVER (Recognizer Output Voting Error Reduction) method [16], which is used for merging text field recognition results produced with different recognition algorithms [25,27] and for accumulating text recognition result in a video stream [6]”)
estimate a distance between the accumulated text-recognition result and a future next accumulated text-recognition result, which would be generated from an image frame that has not yet been received, based on an approximate model of the future next accumulated text-recognition result, ( 3.2 Estimation of the expected distance: “an estimation has to be provided for the expected distance between integrated results Δn def= E(ρ(Rn, Rn+1)) given current observations … propose to estimate Δn by modelling the integrated recognition result on the next stage assuming that the new observation will be close to the ones already obtained … estimating the next integrated recognition result (or the expected distance between it and the current result) might depend on the nature of the integrator function R”) and
determine whether or not to stop the processing based on the estimated distance; after stopping the processing, output a character string based on the accumulated text-recognition result. (Figs. 4-5; Table 1; 1. Introduction: “the text field is recognized on multiple frames such that to obtain the recognition result for next frame … continue the recognition process in hope that the result would improve, or to stop the process and output the currently accumulated result.”; 4.3 Evaluation of stopping rules: “a performance profile can be constructed which would graphically show the change of average number of integrated observations and the corresponding average distance from the obtained result (at stopping time) to the correct one, … Finally, the stopping rule NB, constructed in this paper, estimates at stage n the expected distance Δn to the next integrated result and stops when the estimation becomes lower or equal to a constant threshold.”; 6. Conclusion: “proposed an original stopping method which treats the text field recognition in a video stream as a monotone stopping rule problem and which relies on approximating the optimal stopping rule by estimating the distance to the next integrated recognition result.”)
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Prasad et al. (U.S. 20100246961 A1), “MULTI-FRAME VIDEOTEXT RECOGNITION”, teaches exploiting multiple frame instances of videotext to mitigate challenges posed by varying characteristics of videotext across frames, improving OCR of text in video streams as measured, for example, by word error rate (WER).
Isaev (U.S. 20170116494 A1), “VIDEO CAPTURE IN DATA CAPTURE SCENARIO”, relates to computer systems, and more particularly to facilitating data capture in video streams. It also teaches a lower-cost alternative platform for capturing data from physical documents using mobile devices (e.g., smart phones, tablet computers, etc.). Data may be captured from data fields on physical documents (forms, questionnaires, financial documents, etc.) using mobile devices with built-in cameras, processed using OCR, and either stored locally or sent to remote databases, all within an application executing on the mobile device.
Gokturk et al. (U.S. 20060251339 A1), “System And Method For Enabling The Use Of Captured Images Through Recognition”, teaches a system and method for enabling the use of captured images. It also teaches programmatic analysis of digitally captured images using, among other advancements, image recognition, with image files carrying data and information that enable, among other features, indexing of the contents of images based on analysis of the images. Additionally, images may be made searchable based on recognition information of objects contained in the images.
Wang et al. (U.S. 20010012400 A1), “PAGE ANALYSIS SYSTEM”, teaches a page analysis system for analyzing image data of a document page by utilizing a block selection technique, and particularly such a system in which blocks of image data are classified based on characteristics of the image data. For example, blocks of image data may be classified as text data, titles, half-tone image data, line drawings, tables, vertical lines, or horizontal lines.
Lin et al. (U.S. 20150254507 A1), “Image-Based Character Recognition”, teaches systems and methods that may overcome one or more deficiencies experienced in conventional approaches to recognizing text in an image. It also teaches approaches that enable a device to process an image to recognize and locate text in the image, and to provide the recognized text to an application executing on the device for performing a function (e.g., calling a number, opening an internet browser, etc.) associated with the recognized text.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Duy A Tran whose telephone number is (571)272-4887. The examiner can normally be reached Monday-Friday 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ONEAL R MISTRY can be reached at (313)-446-4912. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/DUY TRAN/Examiner, Art Unit 2674
/ONEAL R MISTRY/Supervisory Patent Examiner, Art Unit 2674