Last updated: May 29, 2026
Application No. 18/600,621
TEXT RECOGNITION METHOD AND DEVICE, AND ELECTRONIC DEVICE

Non-Final OA §103 §112
Filed
Mar 08, 2024
Priority
Mar 10, 2023 — CN 202310259454.X
Examiner
HAUSMANN, MICHELLE M
Art Unit
2671
Tech Center
2600 — Communications
Assignee
Lenovo (Beijing) Limited
OA Round
1 (Non-Final)
Interview Optional

Examiner has a relatively high allowance rate (76%) and a +21.3% interview lift. A written response may suffice.
Based on 870 resolved cases, 2023–2026
Examiner Intelligence

HAUSMANN, MICHELLE M
Grants 76% — above average
Career Allowance Rate
663 granted / 870 resolved
+14.2% vs TC avg
Interview Lift: +21.3% (strong) for resolved cases with an interview versus without
Typical timeline
3y 0m
Avg Prosecution
22 currently pending
Career history
895
Total Applications
across all art units
Statute-Specific Performance

§101: 1.1% (-38.9% vs TC avg)
§103: 94.8% (+54.8% vs TC avg)
§102: 0.6% (-39.4% vs TC avg)
§112: 0.9% (-39.1% vs TC avg)
Based on career data from 870 resolved cases
Office Action

§103 §112
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “a text image acquisition module, the text image acquisition module being configured to… a contextual information acquisition module, the contextual information acquisition module being configured to… a text filtering condition acquisition module, the text filtering condition acquisition module being configured to… a text recognition model, the text recognition model being used to… a to-be-recognized text acquisition module, the to-be-recognized text acquisition module being configured to… an output module, the output module being configured to…” in claim 9.
Claim limitations “a text image acquisition module, the text image acquisition module being configured to… a contextual information acquisition module, the contextual information acquisition module being configured to… a text filtering condition acquisition module, the text filtering condition acquisition module being configured to… a text recognition model, the text recognition model being used to… a to-be-recognized text acquisition module, the to-be-recognized text acquisition module being configured to… an output module, the output module being configured to…” invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph (see the 35 U.S.C. 112(b) rejection below for a full description of the relevant portion of the specification).
Applicant may:
(a) Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;
(b) Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(c) Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a) Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or
(b) Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 9 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: “a text image acquisition module, the text image acquisition module being configured to… a contextual information acquisition module, the contextual information acquisition module being configured to… a text filtering condition acquisition module, the text filtering condition acquisition module being configured to… a text recognition model, the text recognition model being used to… a to-be-recognized text acquisition module, the to-be-recognized text acquisition module being configured to… an output module, the output module being configured to…” in claim 9. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph. The publication of the specification indicates: “It should be noted that the various modules, units, etc. in the above device embodiments can be stored in the memory of the electronic device as program modules, and the processor of the electronic device can execute the program modules stored in the memory to implement the corresponding functions. For the function implemented by each program module and its combination, as well as the technical effects achieved, reference can be made to the description of the corresponding parts of the foregoing method embodiments, which will not be repeated here” ([0122]). 
Therefore, the steps are implemented purely in software, and no specific hardware is disclosed to carry out the functions of the claims.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 4, 7, 9, 10, 13, and 16 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hong et al. (Machine Translation: CN111669312A).

Regarding claims 1, 9, and 10, Hong et al. disclose a text recognition method comprising, a text recognition device comprising, and electronic device comprising: a communication device; an output device; a storage device to store a program for implementing a text recognition method; and a processing device, the processing device being configured to load and execute the program stored in the storage device to implement the text recognition method (As shown in FIG10, the electronic device 30 may include: a processor 300, a memory 301, a bus 302 and a communication interface 303. The processor 300, the communication interface 303 and the memory 301 are connected through the bus 302. The memory 301 stores a computer program that can run on the processor 300. When the processor 300 runs the computer program, it executes the message interaction method provided in any of the foregoing embodiments of this application, [0228]), the text recognition method includes: obtaining a text image and contextual information of an interactive environment in which the electronic device is currently located, the text image being obtained by collecting images of a to-be-recognized text (receiving a target image sent by a first terminal, [0019], [0040], The first terminal and the second terminal can be any electronic device with information interaction and display functions, including but not limited to mobile phones, tablets, [0041], “Based on the type of the target image, determine the recognition template corresponding to the type”, [0050], “For example, in some implementations, the target image can be a document, such as an ID card, bank card, or express delivery slip. Correspondingly, the recognition template can be a document recognition template, which records the location information (i.e., valid text location information) of valid text in the corresponding document image. 
Recognizing the valid text information in the target image based on the recognition template may include”, [0053], the type of the target image can also be automatically identified by a second terminal or social application, the type of the target image can be determined based on the texture, color, background, feature image, and other feature information in the target image, [0093]); obtaining a text filtering condition for the to-be-recognized text based on the contextual information (Identify the valid text information in the target image, [0044], The text information contained in the target image can be divided into valid text information and invalid text information, “The valid text information can refer to text information that meets the user's identification or usage requirements, while text information that does not meet the user's identification or usage requirements can be called invalid text information. For example, in some instances, watermarks and background text in images can be regarded as invalid text information”, [0046], “It should be noted that in practical applications, the above-mentioned valid and invalid text information can be defined and distinguished according to actual needs, and the specific content of this application embodiment is not limited”, [0047], In implementation, recognition conditions can be set according to user identification and usage needs, and then valid text information that meets user identification and usage needs can be identified based on these conditions, [0048], “Based on the type of the target image, determine the recognition template corresponding to the type”, [0050], The aforementioned recognition template can be regarded as a concretization of recognition conditions. 
It may include the location or region information of valid text information in the image of this type, or it may include the specified semantic conditions or keyword matching conditions corresponding to the image of this type, [0052]); performing text recognition on the text image to obtain a corresponding text recognition result (recognition of text information in the image can be achieved using existing OCR (Optical Character Recognition) algorithms, [0045], Optical character recognition is performed on each of the at least one valid text block to obtain at least one valid text string, [0056]); obtaining the to-be-recognized text included in the text image based on the text filtering condition and the text recognition result (“The message interaction method provided in the first aspect of this application, after receiving a target image sent by a first terminal and displaying the target image in the message interaction interface, can further identify valid text information in the target image, and then display the valid text information in text format in the message interaction interface, and/or copy the valid text information to the clipboard, thereby facilitating direct copying and /or pasting by the user without requiring the user to view and enter each character. This allows for convenient and efficient extraction of valid text information, and is particularly suitable for quickly and conveniently obtaining valid text information such as bank card numbers and ID card numbers in instant messaging applications”, [0019], After receiving the target image sent by the first terminal, the second terminal first needs to display the target image in the message interaction interface. 
Since text recognition technology still has a certain error rate, displaying the target image allows users to easily compare the subsequently recognized valid text information with the information in the target image, so as to correct errors in a timely manner and ensure the correctness of the output valid text information, [0043], “For example, it can be to first identify all text information in the target image, and then filter out valid text information that meets the specified semantic conditions or keyword matching conditions (specified semantic conditions or keyword matching conditions are a type of recognition condition) from all text information; or it can be to first determine the recognition area or position containing valid text information in the target image (the recognition area or position is also a type of recognition condition), and then identify the valid text information in a targeted manner based on the recognition area; this application embodiment does not limit its specific implementation method”, [0048], Based on the recognition template, identify the valid text information in the target image, [0051], Based on the type of the target image, identify and extract the document image from the target image, [0054], Based on the valid text location information recorded in the document recognition template, at least one valid text block is extracted from the document image, [0055], The at least one valid text string is concatenated to obtain the valid text information in the target image, [0057], Then, the results of each attribute recognition, i.e., the valid text string, and the attribute name are concatenated together to obtain the extracted valid text information, [0069], The third method is to simultaneously display the valid text information in text format on the message interaction interface and copy the valid text information to the clipboard, [0098]); and outputting the to-be-recognized text (display the valid text information in text 
format, [0019], The third method is to simultaneously display the valid text information in text format on the message interaction interface and copy the valid text information to the clipboard. The purpose and effect of this is that users can either directly paste the valid text information to where they need it, or edit the valid text information before copying and pasting it to where they need it, thereby meeting the diverse usage needs of users, [0098], “The valid text information is displayed in text format in the display box of the message interaction interface, wherein the display position of the valid text information is adjacent to the display position of the target image”, [0105]). With respect to claim 9 in particular, modules are described (The steps of the method disclosed in the embodiments of this application can be directly reflected as being executed by a hardware decoding processor, or being executed by a combination of hardware and software modules in the decoding processor, [0232]), a text recognition model (This step can be implemented using any text recognition technology provided by existing technology. For example, the recognition of text information in the image can be achieved using existing OCR (Optical Character Recognition) algorithms, engines, scripts or plugins, [0045]), and output module which is interpreted as a display is also described (electronic device with information interaction and display functions, including but not limited to mobile phones, tablets, laptops, desktop computers, virtual reality devices, augmented reality devices, etc., [0041]). 

The above sections are cited from different embodiments of the invention. It would have been obvious at the time of filing to one of ordinary skill in the art to combine the embodiments as this was known at the time of filing, the combinations would have predictable results, and as each embodiment has particular beneficial effects: “The purpose and effect of doing so is to make it convenient for users to directly paste the valid text information to the place where it is needed” ([0097]); “The purpose and effect of this is that users can either directly paste the valid text information to where they need it, or edit the valid text information before copying and pasting it to where they need it, thereby meeting the diverse usage needs of users.” ([0098]); “Of course, only standard format images, such as ID card images, can be identified using recognition templates to improve recognition efficiency and make the output effective text information easier to read. For non-standard format images, text recognition can be performed on the entire image to extract effective text information, or the user can select a certain area and then identify the effective text information within that area” ([0103]); “This implementation method can promptly notify users that the social application has the function of extracting valid text information from images, making it convenient for users to use this function and thus improving the user experience” ([0163]); “This embodiment of the application does not limit this. By automatically displaying a third prompt message after the target image is detected as being added, the user can be proactively prompted to recognize valid text information, which can improve both operational efficiency and user experience” ([0168]), allowing the customization of the invention according to the preferred application.

Regarding claims 4 and 13, Hong et al. disclose the method and device of claims 1 and 10. Hong et al. further indicate obtaining the to-be-recognized text included in the text image based on the text filtering condition and the contextual information includes: comparing a plurality of texts included in the text recognition result with the text filtering condition respectively, and determining a plurality of candidate texts included in the text image that meet the text filtering condition (Identify the valid text information in the target image, [0044], The text information contained in the target image can be divided into valid text information and invalid text information, “The valid text information can refer to text information that meets the user's identification or usage requirements, while text information that does not meet the user's identification or usage requirements can be called invalid text information. For example, in some instances, watermarks and background text in images can be regarded as invalid text information”, [0046], “It should be noted that in practical applications, the above-mentioned valid and invalid text information can be defined and distinguished according to actual needs, and the specific content of this application embodiment is not limited”, [0047], In implementation, recognition conditions can be set according to user identification and usage needs, and then valid text information that meets user identification and usage needs can be identified based on these conditions, [0048], “Based on the type of the target image, determine the recognition template corresponding to the type”, [0050], The aforementioned recognition template can be regarded as a concretization of recognition conditions. 
It may include the location or region information of valid text information in the image of this type, or it may include the specified semantic conditions or keyword matching conditions corresponding to the image of this type, [0052]); outputting the plurality of candidate texts (compare the subsequently recognized valid text information, [0043], pre-set corresponding recognition templates for different types of images so that they can directly recognize the valid text information in the target image based on the recognition template, [0052], display the valid text information, [0098]); and in response to a selection operation on the plurality of candidate texts, obtaining a selected to-be-recognized text (correct errors in a timely manner and ensure the correctness of the output valid text information, [0043], The third method is to simultaneously display the valid text information in text format on the message interaction interface and copy the valid text information to the clipboard. The purpose and effect of this is that users can either directly paste the valid text information to where they need it, or edit the valid text information before copying and pasting it to where they need it, thereby meeting the diverse usage needs of users, [0098], The picture type selection control can have a next-level selection menu, allowing the user to select the front or back of the ID card, [0190]).

Regarding claims 7 and 16, Hong et al. disclose the method and device of claims 1 and 10. Hong et al. further indicate obtaining the text filtering condition for the to-be-recognized text includes: extracting keywords from the contextual information to obtain at least one keyword in the interactive environment; and using the at least one keyword to obtain the text filtering condition for the to-be-recognized text (For example, it can be to first identify all text information in the target image, and then filter out valid text information that meets the specified semantic conditions or keyword matching conditions (specified semantic conditions or keyword matching conditions are a type of recognition condition) from all text information; or it can be to first determine the recognition area or position containing valid text information in the target image (the recognition area or position is also a type of recognition condition), and then identify the valid text information in a targeted manner based on the recognition area, [0048], The aforementioned recognition template can be regarded as a concretization of recognition conditions. It may include the location or region information of valid text information in the image of this type, or it may include the specified semantic conditions or keyword matching conditions corresponding to the image of this type, [0052], For example, in some implementations, the target image can be a document, such as an ID card, bank card, or express delivery slip. Correspondingly, the recognition template can be a document recognition template, which records the location information (i.e., valid text location information) of valid text in the corresponding document image. Recognizing the valid text information in the target image based on the recognition template may include, [0053]).
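The keyword-based limitation of claims 7 and 16 (extract keywords from the contextual information, then use them as the text filtering condition) can be sketched as below. This is an assumption-laden illustration rather than the method of the application or of Hong et al.: the stop-word list, the substring matching rule, and the function names `extract_keywords` and `filter_by_keywords` are all hypothetical.

```python
import re

# Assumed stop words; a real system would likely use a fuller list or a
# language model to pick out the salient terms.
STOPWORDS = frozenset(
    {"please", "send", "the", "for", "me", "a", "an", "of", "to", "and", "is", "what", "your"}
)


def extract_keywords(context: str) -> list[str]:
    """Return distinct non-stop-word tokens (longer than 2 chars) in first-seen order."""
    keywords: list[str] = []
    for word in re.findall(r"[a-z0-9]+", context.lower()):
        if word not in STOPWORDS and len(word) > 2 and word not in keywords:
            keywords.append(word)
    return keywords


def filter_by_keywords(recognized_lines: list[str], keywords: list[str]) -> list[str]:
    """Keep recognized lines that contain at least one keyword (case-insensitive)."""
    kept = []
    for line in recognized_lines:
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            kept.append(line)
    return kept


if __name__ == "__main__":
    context = "Please send me the tracking number for the parcel"
    ocr_lines = ["ACME Express", "Tracking Number: 1Z999", "Thank you for shipping"]
    keywords = extract_keywords(context)
    print(keywords)
    print(filter_by_keywords(ocr_lines, keywords))
```

The distinction at issue in the rejection is where the keywords come from: in this sketch they are derived from the conversation context at run time, whereas the cited passages of Hong et al. describe keyword matching conditions recorded in a recognition template preset per image type.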

Claim(s) 2-3 and 11-12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hong et al. (Machine Translation: CN111669312A) as applied to claim 1 above, further in view of Parker et al. (US 12475699 B1).

Regarding claims 2 and 11, Hong et al. disclose the method and device of claims 1 and 10. Hong et al. further indicate displaying the to-be-recognized text in a text input area in the interactive environment (The message interaction method provided in the first aspect of this application, after receiving a target image sent by a first terminal and displaying the target image in the message interaction interface, can further identify valid text information in the target image, and then display the valid text information in text format in the message interaction interface, [0019], user's selection operation of the image type selection control, [0090], valid text information is displayed in text format in the input box, [0195]) and outputting a text recognition window in the display area of the text image and displaying the to-be-recognized text in the text recognition window (The valid text information is displayed in text format in the display box of the message interaction interface, wherein the display position of the valid text information is adjacent to the display position of the target image, [0105], The target image sent from the first terminal is displayed in the display frame on the second terminal. Through this embodiment, valid text information can be displayed in a position adjacent to the display position of the target image, [0106]).

Hong et al. do not disclose that outputting the to-be-recognized text includes at least one of: enlarging a to-be-recognized text area and displaying the enlarged to-be-recognized text in a display area of the text image.

Parker et al. teach outputting the to-be-recognized text includes at least one of: enlarging a to-be-recognized text area and displaying the enlarged to-be-recognized text in a display area of the text image (“In some examples, the image(s) 142 can be presented “directly adjacent” to the field(s) 144 populated with the text element(s) such that no user interface elements are presented between the image 142 and the field 144. In some examples, the text element in the field 144 is larger than other text elements in the user interface 200D, 200E, so as to improve the readability of the text element, to highlight or feature the text element, and/or to otherwise make the text element salient in the user interface 200D, 200E. In some examples, this may involve enlarging a size of the text element and/or enlarging a size of the field 144 in the user interface 200D, 200E, such as by “magnifying” the text element and/or the field 144”, col. 26, lines 13-64) adjusting a display state of the to-be-recognized text in the text image (cause the image(s) 142 to be presented at block 322 (e.g., as a pop-up, a modal, etc.) adjacent to the corresponding field 144, col. 26, lines 13-64); and outputting a text recognition window in the display area of the text image and displaying the to-be-recognized text in the text recognition window (“For example, the document recognizer component 130 and/or a particular document type-specific recognizer 140 may have determined coordinates (e.g., pixel coordinates) of a bounding box around the recognized text element “$2,543.00” in the document 118 (e.g., the 1099-INT form), and the bounding box coordinates may have been utilized to generate the first image 142(1) of the first portion of the document 118 that include the recognized text element “$2,543.00,” such as by “zooming in” on the coordinates of the bounding box. 
In this example, the data validation component 132 may have retrieved the first image 142(1) from the data store 114 and caused the first image 142(1) to be presented via the user interface 200D in response to an interaction with the first interactive element 146(1). In this manner, as the user 102 reviews the text elements that have been populated in the fields 144 of the electronic form 212, the user 102 is able to quickly validate the text elements (e.g., by confirming that the text elements correspond to the source text in the images 142) without averting their gaze from the user interface 200D to validate that the text that has been recognized from the document image data 116 representing the document 118. Even in the case where the user 102 manually entered the text into the fields 144, the availability of the images 142 adjacent to the respective fields is a convenient way of validating that the user 102 has entered the text correctly”, col. 19, line 50 - col. 20, line 17).
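The bounding-box "zooming in" that the examiner cites from Parker et al. can be pictured with the following minimal sketch. It is not Parker's implementation: the image is modeled as a plain list of pixel rows, the box convention (left, top, right, bottom with exclusive right and bottom edges), the padding value, and the function names `crop_with_padding` and `zoom_nearest` are assumptions made for the example.

```python
def crop_with_padding(image, box, pad=1):
    """Crop `image` (a list of equal-length rows) to `box`, expanded by `pad` pixels.

    `box` is (left, top, right, bottom) with right and bottom exclusive.
    The expanded box is clamped to the image bounds.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = box
    left = max(0, left - pad)
    top = max(0, top - pad)
    right = min(width, right + pad)
    bottom = min(height, bottom + pad)
    return [row[left:right] for row in image[top:bottom]]


def zoom_nearest(image, factor):
    """Enlarge `image` by an integer factor using nearest-neighbor replication."""
    zoomed = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Repeat the widened row `factor` times, copying so rows stay independent.
        zoomed.extend(list(wide_row) for _ in range(factor))
    return zoomed


if __name__ == "__main__":
    # 5 rows x 6 columns; each pixel value encodes its row and column.
    image = [[r * 10 + c for c in range(6)] for r in range(5)]
    crop = crop_with_padding(image, (2, 1, 4, 3), pad=1)
    for row in zoom_nearest(crop, 2):
        print(row)
```

Cropping to the recognized text's box and enlarging the crop is one way to obtain an image of the relevant portion of the document for display next to the populated field.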

The limitation “at least one of: enlarging a to-be-recognized text area and displaying the enlarged to-be-recognized text in a display area of the text image; outputting a text recognition window in the display area of the text image and displaying the to-be-recognized text in the text recognition window; displaying the to-be-recognized text in a text input area in the interactive environment; and adjusting a display state of the to-be-recognized text in the text image” is interpreted in the conjunctive, in accordance with Superguide Corp. v. DirecTV Enter., Inc., 358 F.3d 870, 875 (Fed. Cir. 2004), in which the Federal Circuit held that the plain meaning of “at least one of A, B, and C” is: at least one of A, at least one of B, and at least one of C.
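The practical difference between the conjunctive reading applied here and a disjunctive reading can be shown with a small logic sketch. The option labels and function names below are hypothetical shorthand for the four listed alternatives; the sketch illustrates only the two readings and does not take a position on which one is correct for this claim.

```python
# Hypothetical shorthand labels for the four alternatives listed in the claim.
OPTIONS = (
    "enlarge_text_area",
    "recognition_window",
    "text_input_area",
    "adjust_display_state",
)


def met_conjunctive(disclosed: set[str]) -> bool:
    """Conjunctive reading: every listed alternative must be present."""
    return all(option in disclosed for option in OPTIONS)


def met_disjunctive(disclosed: set[str]) -> bool:
    """Disjunctive reading: any single listed alternative is enough."""
    return any(option in disclosed for option in OPTIONS)


if __name__ == "__main__":
    only_input_area = {"text_input_area"}
    print(met_conjunctive(only_input_area), met_disjunctive(only_input_area))
    print(met_conjunctive(set(OPTIONS)), met_disjunctive(set(OPTIONS)))
```

Under the conjunctive reading the examiner must map all four alternatives to the prior art, which is why the rejection cites both Hong et al. and Parker et al. for the individual alternatives.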

Hong et al. and Parker et al. are in the same art of OCR (Hong et al., [0045]; Parker et al., col. 24, line 40 - col. 25, line 2). The combination of Parker et al. with Hong et al. will enable enlarging a portion of the image. It would have been obvious at the time of filing to combine the enlarging of Parker et al. with the invention of Hong et al. as this was known at the time of filing, the combination would have predictable results, and as Parker et al. state “This “side-by-side” data validation feature mitigates the so-called “tennis spectator problem” where a user is forced to avert their eyes from a computer screen in order to look at the text of a paper copy of a document, and subsequently redirect their eyes back to the computer screen in order to validate that the text of the paper copy of the document has been entered correctly into a field(s) (e.g., a field(s) of an electronic form) that is being displayed on the computer screen. In other words, by causing display of a user interface that presents a digital image of a relevant portion of a document adjacent to a corresponding field where text from that portion has been recognized and automatically populated into the corresponding field, a user is able to quickly and easily confirm that the text has been recognized and entered into the field correctly” (col. 2, line 25 - col. 3, line 5) and “In some examples, the text element in the field 144 is larger than other text elements in the user interface 200D, 200E, so as to improve the readability of the text element, to highlight or feature the text element, and/or to otherwise make the text element salient in the user interface 200D, 200E. In some examples, this may involve enlarging a size of the text element and/or enlarging a size of the field 144 in the user interface 200D, 200E, such as by “magnifying” the text element and/or the field 144” (col. 26, lines 13-64), indicating an improvement to ease of use in commercial applications for customers.

Regarding claims 3 and 12, Hong et al. and Parker et al. disclose the method and device of claims 2 and 11. Hong et al. further indicate displaying the to-be-recognized text in the text input area in the interactive environment includes: writing the obtained to-be-recognized text into the text input area in the interactive environment and displaying a to-be-recognized file in the text input area; or, outputting copy prompt information for a to-be-recognized file (The preset trigger operation can be a long press, double-click, drag, etc., which can be flexibly set by those skilled in the art according to actual needs. This application embodiment does not limit it. The system displays the first prompt message based on the user's preset trigger operation. It can trigger the step of recognizing valid text information in the target image through human-computer interaction according to the user's actual needs, which has high operability and meets the diverse needs of users, [0115], Therefore, it can be determined that the message interaction context related to the target image meets the preset prompt conditions, thereby triggering the display of the first prompt information, [0116]); in response to an input triggering operation on the text input area in the interactive environment, writing the copied to-be-recognized file into the text input area (user B long-presses the bank card image, and a first prompt message pops up to ask user A whether to perform valid text recognition on the target image. After user A confirms that recognition is needed, a further image type selection control pops up. If user A selects the image type as bank card, the recognition of valid text information in the bank card image is triggered, and then displayed in text format next to the bank card image. 
The user is prompted that double-clicking can edit and long-pressing can copy, so that the user can perform editing, copying and pasting operations, [0118], The text patch extraction subunit is used to extract at least one valid text patch from the document image based on the valid text location information recorded in the document recognition template, [0141], The system displays a third prompt message based on the user's preset trigger operation. It can trigger the step of recognizing valid text information in the target image through human-computer interaction according to the user's actual needs, which has high operability and meets the diverse needs of users, [0167]), and displaying the to-be-recognized text in the text input area (display the valid text information in text format, [0019], The third method is to simultaneously display the valid text information in text format on the message interaction interface and copy the valid text information to the clipboard. The purpose and effect of this is that users can either directly paste the valid text information to where they need it, or edit the valid text information before copying and pasting it to where they need it, thereby meeting the diverse usage needs of users, [0098], “The valid text information is displayed in text format in the display box of the message interaction interface, wherein the display position of the valid text information is adjacent to the display position of the target image”, [0105]).

Claims 5-6 and 14-15 are rejected under 35 U.S.C. 103 as being unpatentable over Hong et al. (Machine Translation: CN111669312A) as applied to claims 1 and 10 above, and further in view of Nepomniachtchi et al. (US 20190278986 A1).

Regarding claims 5 and 14, Hong et al. disclose the method and device of claims 1 and 10. Hong et al. do not disclose obtaining the text filtering condition for the to-be-recognized text includes: analyzing the contextual information to obtain at least one prediction content for the to-be-recognized text, and a confidence level for each prediction content; determining a first prediction content in the at least one prediction content whose confidence level is greater than a preset threshold; and using the first prediction content to obtain the text filtering condition for the to-be-recognized text.

Nepomniachtchi et al. teach obtaining the text filtering condition for the to-be-recognized text includes: analyzing the contextual information to obtain at least one prediction content for the to-be-recognized text, and a confidence level for each prediction content (A matching algorithm is executed on the bi-tonal image of the document in an attempt to find a matching template (step 4210). According to an embodiment, one or more computing devices can include a template data store that can be used to store templates of the layouts of various types of documents. Various matching techniques can be used to match a template to a document image, the positions of frames/boxes found on image and/or other such landmarks, can be cross-correlated with landmark information associated a template to compute the matching confidence score, [0383]); determining a first prediction content in the at least one prediction content whose confidence level is greater than a preset threshold (If the confidence score exceeds a predetermined threshold, [0383]); and using the first prediction content to obtain the text filtering condition for the to-be-recognized text (If the confidence score exceeds a predetermined threshold, the template is considered to be a match and can be selected for use in extracting information from the mobile image, [0383], If a matching template is found, data can be extracted from the image of the document using the template (step 4220). The template can provide the location of various data within the document, such as the document's author(s), the document's publication date, the names of any corporate, governmental, or educational entities associated with the document, an amount due, an account holder name, an account number, a payment due date, etc. In some embodiments, various OCR techniques can be used to read text content from the locations specified by the template, [0385]).
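The template-selection step quoted above (compute a matching confidence score, then accept the template only if the score exceeds a predetermined threshold) can be sketched as follows. This is a minimal illustration, not the reference's implementation: a simple overlap ratio between landmark sets stands in for the cross-correlation described in [0383], and the `Template` type, function names, and the 0.8 threshold are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple

# A landmark is reduced here to an (x, y) position; this is an assumption for the sketch.
Landmark = Tuple[int, int]


@dataclass(frozen=True)
class Template:
    name: str
    landmarks: FrozenSet[Landmark]


def match_score(found: FrozenSet[Landmark], template: Template) -> float:
    """Overlap ratio in [0, 1]; a stand-in for the cross-correlation in the reference."""
    union = found | template.landmarks
    if not union:
        return 0.0
    return len(found & template.landmarks) / len(union)


def select_template(
    found: FrozenSet[Landmark],
    templates: Sequence[Template],
    threshold: float = 0.8,  # hypothetical "predetermined threshold"
) -> Optional[Template]:
    """Return the best-scoring template only if its score exceeds the threshold."""
    best = max(templates, key=lambda t: match_score(found, t), default=None)
    if best is not None and match_score(found, best) > threshold:
        return best
    return None


check = Template("check", frozenset({(0, 0), (10, 0), (0, 5), (10, 5)}))
invoice = Template("invoice", frozenset({(0, 0), (20, 0), (0, 30)}))
full_match = select_template(check.landmarks, [check, invoice])
weak_match = select_template(frozenset({(0, 0)}), [check, invoice])
```

In this sketch a document whose landmarks coincide with the `check` template is matched, while a document sharing only one landmark falls below the threshold and yields no template.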

The limitation “analyzing the contextual information to obtain at least one prediction content for the to-be-recognized text, and a confidence level for each prediction content” is interpreted in the conjunctive, in accordance with SuperGuide Corp. v. DirecTV Enter., Inc., 358 F.3d 870, 875 (Fed. Cir. 2004), in which the Federal Circuit held that the plain meaning of “at least one of A, B, and C” is: at least one of A, at least one of B, and at least one of C.

Hong et al. and Nepomniachtchi et al. are in the same art of OCR (Hong et al., [0045]; Nepomniachtchi et al., [0385]). The combination of Nepomniachtchi et al. with Hong et al. will enable using a confidence level for each prediction content. It would have been obvious at the time of filing to combine the confidence of Nepomniachtchi et al. with the invention of Hong et al. as this was known at the time of filing, the combination would have predictable results, and as Nepomniachtchi et al. indicate “Since the location of various data elements is known, ambiguities regarding the type of data found can be eliminated. That is, use of the template enables the system to distinguish among data elements which have a similar data type,” ([0385]) thereby increasing the accuracy of the combination of inventions.

Regarding claims 6 and 15, Hong et al. and Nepomniachtchi et al. disclose the method and device of claims 5 and 14. Nepomniachtchi et al. further teach analyzing the contextual information to obtain at least one prediction content for the to-be-recognized text, and the confidence level for each prediction content includes: obtaining a text recognition prediction model; and processing the contextual information to obtain the at least one prediction content for the to-be-recognized text, and the confidence level of each prediction content based on the text recognition prediction model (“A matching algorithm is executed on the bi-tonal image of the document in an attempt to find a matching template (step 4210). According to an embodiment, one or more computing devices can include a template data store that can be used to store templates of the layouts of various types of documents. Various matching techniques can be used to match a template to a document image. For example, optical character recognition can be used to identify and read text content from the image. The types of data identified and the positions of the data on the document can be used to identify a matching template. According to another embodiment, a document can include a unique symbol or identifier that can be matched to a particular document template. In yet other embodiments, the image of the document can be processed to identify “landmarks” on the image that may correspond to labels and/or data. In some embodiments, these landmarks can include, but are not limited to: positions of horizontal and/or vertical lines on the document, the position and/or size of boxes and/or frames on the document, and/or the location of pre-printed text. The position of these landmarks on the document may be used to identify a template from the plurality of templates in the template data store. 
According to some embodiments, a cross-correlation matching technique can be used to match a template to an image of a document. In some embodiments, the positions of frames/boxes found on image and/or other such landmarks, can be cross-correlated with landmark information associated a template to compute the matching confidence score. If the confidence score exceeds a predetermined threshold, the template is considered to be a match and can be selected for use in extracting information from the mobile image”, [0383], If a matching template is found, data can be extracted from the image of the document using the template (step 4220). The template can provide the location of various data within the document, such as the document's author(s), the document's publication date, the names of any corporate, governmental, or educational entities associated with the document, an amount due, an account holder name, an account number, a payment due date, etc. In some embodiments, various OCR techniques can be used to read text content from the locations specified by the template, [0385]).

Claims 8 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Hong et al. (Machine Translation: CN111669312A) and Nepomniachtchi et al. (US 20190278986 A1) as applied to claims 5 and 14 above, and further in view of Zalmanson et al. (US 20240005084 A1).

Regarding claims 8 and 17, Hong et al. and Nepomniachtchi et al. disclose the method and device of claims 5 and 14. Hong et al. and Nepomniachtchi et al. do not explicitly disclose determining that the confidence level of the at least one prediction content is less than or equal to the preset threshold, and performing text recognition on the text image to obtain the corresponding text recognition result; and obtaining the to-be-recognized text based on the text recognition result.

Zalmanson et al. teach determining that the confidence level of the at least one prediction content is less than or equal to the preset threshold, and performing text recognition on the text image to obtain the corresponding text recognition result; and obtaining the to-be-recognized text based on the text recognition result (In some embodiments, techniques described herein may be utilized in conjunction with other technical processes such as optical character recognition (OCR). For example, if OCR is used to extract items from an image of a document, the items in the document may also be predicted using the dynamic, iterative machine learning process described herein based on historical documents of the type for which the items are known, and the predicted items may be used to enhance the OCR-based item extraction process. For instance, if the OCR data (e.g., data extracted using OCR) for a given item matches a predicted item, then the confidence for that extraction may be increased, and the OCR data may be used. If OCR data for a given item is close to the predicted item but off by a small amount (e.g., fewer than a threshold number of characters are different), then the predicted item data may be used instead of the OCR data. If the OCR data is significantly different than a predicted item (e.g., more than a threshold number of characters are different), then additional processing may be performed to confirm that the OCR data is accurate, such as comparing the OCR data to other predicted items for the document, re-performing OCR with different parameters, prompting the user to capture a new image of the document, presenting the OCR data to the user for manual confirmation, and/or the like, [0023]).
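The three-way handling described in the quoted passage of [0023] (exact match, small difference, large difference) can be sketched as a small decision function. This is an illustrative reading only: the character-difference measure, the function names, the action labels, and the default threshold of 2 are assumptions, and the passage does not say what happens when the difference equals the threshold exactly, so the sketch routes that case to review.

```python
from typing import Tuple


def char_difference(a: str, b: str) -> int:
    """Count differing positions plus any length mismatch (a simple assumed measure)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def resolve_item(ocr_text: str, predicted: str, max_diff: int = 2) -> Tuple[str, str]:
    """Return (value to use, action label) for one extracted item.

    max_diff is a hypothetical threshold on the number of differing characters.
    """
    diff = char_difference(ocr_text, predicted)
    if diff == 0:
        # OCR agrees with the prediction: keep the OCR data.
        return ocr_text, "use_ocr"
    if diff < max_diff:
        # Close but not identical: prefer the predicted item.
        return predicted, "use_prediction"
    # Significantly different (or exactly at the threshold, by assumption):
    # keep the OCR data but flag it for additional processing or confirmation.
    return ocr_text, "needs_review"


exact = resolve_item("1234", "1234")
near = resolve_item("1284", "1234")
far = resolve_item("9999", "1234")
```

The `needs_review` branch corresponds to the follow-up options the passage lists, such as re-performing OCR with different parameters or asking the user to confirm the data.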

Hong et al. and Zalmanson et al. are in the same art of OCR (Hong et al., [0045]; Zalmanson et al., [0023]). The combination of Zalmanson et al. with Hong et al. and Nepomniachtchi et al. will enable handling the case in which the confidence level is less than or equal to the threshold. It would have been obvious at the time of filing to add the condition of Zalmanson et al. to the invention of Hong et al. and Nepomniachtchi et al. as this was known at the time of filing, the combination would have predictable results, and as Zalmanson et al. indicate “Embodiments of the present disclosure provide multiple improvements over conventional techniques for electronic document creation. For example, by utilizing the dynamic, iterative machine learning techniques described herein to recommend items for inclusion in a document, embodiments of the present disclosure allow for a significant reduction in time, repetition, and computing resource utilization (e.g., utilization of resources such as processing, memory, display, input device, etc.) that would otherwise be associated with the manual creation of the document through interaction with a user interface. Furthermore, by augmenting outputs from a machine learning model with co-occurrence measures dynamically based on each successively selected item for recommendation in an iterative loop, techniques described herein improve upon machine learning models by adding an additional real-time, dynamic component. By utilizing input and feedback from the user to iteratively improve item recommendations and the machine learning model (e.g., by updating recommendation as additional user input is received and/or by re-training the machine learning based on user feedback), techniques described herein provide a continuously-improving feedback loop. 
Additionally, the dynamic, iterative machine learning techniques described herein allow for providing real-time recommendations of items for inclusion in a document as the document is being created in a manner that could not be practically performed in the human mind.” ([0025]) thus providing a computing benefit to combining the inventions.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: US 11176621 B1 (Embodiments also provide new capabilities that address technical challenges and limitations of optical character recognition (OCR), which may be inaccurate or fail to correctly identify imaged data for various reasons (such as the OCR algorithm itself or various imaging or environmental variables) by utilizing OCR on a limited or initial basis or as a trigger to identify data from other electronic sources to utilize to populate sections of an electronic tax return rather than use less reliable OCR results for that other data, dt20, The image processing system 160 transforms images 128 of tax documents 129 acquired by a camera or image acquisition component 126 (generally, camera 126) of the computing device 120 into corresponding electronic tax data 161., dt25, tax document imaging, dt29, The fallback program or procedure 171 may involve generating interview screens 152 or instructions for the user to acquire a new image 128 of the tax document 129, reorient the tax document 129 or the mobile communication device 120, enter new data, correct data to determine whether the new image processing results 161, data or corrected data contains the pre-determined field data 312, dt38, receives and processes images 128 of tax documents 129, dt73, imaged tax documents 129 may include a mortgage interest statement for 2016 for a house purchased in 2016, dt74, tax return preparation application 150 generates interactive interview screens 152 that are presented to the taxpayer via a display 123, dt126, “In a single or multiple embodiments, when electronic tax data determined from image processing does not include the first set of electronic tax data for the plurality of pre-determined fields, the second input requested by the intermediate computer may request the subject taxpayer to reactivate the image acquisition component and acquire another image of the tax 
document, or to do so after repositioning the tax document” bs31, “In one embodiment, during preparation of an electronic tax return by a subject taxpayer, the intermediate computer determines whether the electronic tax data of the subject taxpayer includes electronic tax data of a first set of at least three pre-determined fields of the electronic tax form and determines whether this set of electronic tax data for at least three pre-determined fields is included in the data store. The at least three pre-determined fields may be selected to provide a high level of confidence of uniquely identifying the subject taxpayer, e.g., for Form W-2, whether the electronic tax data includes Social Security Number (SSN), Employer Identification Number (EIN), and Box 1 (Wages, tips, other compensation). The pre-determined fields may also be selected based on location-based criteria or where they appear within a tax form, which may vary for different tax forms of different sources such as employers and financial institutions. For example, the three-pre-determined fields may include adjacent or contiguous pre-determined fields of the electronic tax form. In certain embodiments, only the sections of the tax document image corresponding to the pre-determined fields are processed, which can reduce the amount of image processing and computing resources required.”, bs32, acquisition of a second image of a second document, second image processing and population, and so on utilizing the electronic data importation rules, bs33, OCR-triggered electronic import when certain rules or criteria are satisfied, dt35 processing a portion of the image, e.g., OCR on only the certain fields, or other fields can be masked, thus reducing the amount of image processing and computing resources utilized”, bs30, Hybrid population is also illustrated in FIG. 
8, which shows an example in which image processing results 161 of data for the pre-determined fields are used to populate corresponding fields in the electronic tax form 154c (shown as group or set 801), whereas other fields of the electronic tax form 154c are populated with data received from the data store 117 (shown as second group or set 802) rather than with image processing results 161. Thus, embodiments utilize imaging processing (e.g., OCR) 160 to at least identify electronic source data 131 in the data store 117 (or identify an electronic source 130 and electronic source data 131 thereof) that can be used for electronic import, which is more reliable or more accurate than image processing, to populate all or a majority or substantial portion of the electronic tax form 154c, dt49); US 20220222284 A1 (In other words, pattern matching of key images (logo/icons) with those in database would be made and correlation co-efficient beyond a threshold value determines document category. Same approach would be made to assess similarity among the keywords in the input document with those stored in database. Full-Page OCR followed by natural language processing (NLP) is performed by the system 100 to find a good match of the File/image with those in the Database based on the key-word searches. Search is page-wise until it gets a match (beyond threshold value) and deter document category, [0058], Search is carried out by the system 100 page-wise until it gets a match (beyond threshold value) and determine document category, [0060]).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M ENTEZARI HAUSMANN whose telephone number is (571)270-5084. The examiner can normally be reached 10-7 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent M Rudolph, can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MICHELLE M ENTEZARI HAUSMANN/
Primary Examiner, Art Unit 2671
Prosecution Timeline

Mar 08, 2024: Application Filed
Apr 23, 2026: Non-Final Rejection mailed (§103, §112; current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/267,598 | Patent 12638400 | Method for monitoring and/or controlling phase separation in chemical processes and samples | 2y 11m to grant | Granted May 26, 2026
18/348,495 | Patent 12639803 | SYSTEMS AND METHODS FOR MATERIAL ACCRETION DETECTION AND REMOVAL | 2y 10m to grant | Granted May 26, 2026
18/136,006 | Patent 12629121 | METHOD OF DETERMINING VESSEL FLUID FLOW VELOCITY | 3y 1m to grant | Granted May 19, 2026
18/034,833 | Patent 12626375 | HOMOGRAPHY MATRIX GENERATION APPARATUS, CONTROL METHOD, AND COMPUTER-READABLE MEDIUM | 3y 0m to grant | Granted May 12, 2026
18/179,635 | Patent 12620252 | INFORMATION SOURCE DETECTION USING UNIQUE WATERMARKS | 3y 2m to grant | Granted May 05, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 76%
With Interview (+21.3%): 98%
Median Time to Grant: 3y 0m (~9m remaining)
PTA Risk: Low
Based on 870 resolved cases by this examiner. Grant probability derived from career allowance rate.
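The arithmetic behind these projections can be checked with a short sketch. The page does not state its formula; the sketch assumes the grant probability is simply granted cases divided by resolved cases (663 of 870, per the examiner profile on this page) and that the interview lift is added as percentage points and capped at 100%. The function names are hypothetical.

```python
def allowance_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage."""
    return 100.0 * granted / resolved


def with_interview(base_pct: float, lift_points: float) -> float:
    """Assumed model: add the lift in percentage points, capped at 100%."""
    return min(100.0, base_pct + lift_points)


base = allowance_rate(663, 870)      # examiner's granted / resolved counts
boosted = with_interview(base, 21.3)  # lift figure shown on the page
```

Under these assumptions the rounded values agree with the displayed 76% and 98%, but the agreement does not confirm that the page computes them this way.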