DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 04/26/2024 has been considered by the examiner.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 1-3, 5-15, and 17-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-11 and 13-20 of U.S. Patent No. 12014556 B2. Although the claims at issue are not identical, they are not patentably distinct from each other because claims 1-3, 5-15, and 17-20 of the instant application 18647080 are anticipated by claims 1-11 and 13-20 of U.S. Patent No. 12014556 B2, respectively.
Instant Application 18647080
U.S. Patent No. 12014556 B2
1. An image recognition method, performed by an electronic device, comprising: acquiring a target image, the target image being an image of a certificate to be recognized; performing image feature extraction on the target image to obtain an image feature of the target image; using an image classification network to determine an image type of the target image and a certificate direction of the target image, wherein the certificate direction refers to a front direction of the certificate to be recognized in the target image; and determining, by using the image classification network, a text direction of the target text according to the text area image; performing direction adjustment on the text area image according to the certificate direction and the text direction to obtain an adjusted text area image; and performing text recognition on the adjusted text area image to obtain a text content of the target text.
1. An image recognition method, performed by an electronic device, comprising: acquiring a target image, the target image being an image of a certificate to be recognized; performing text area recognition on the target image to obtain a text area image of a target text corresponding to the certificate to be recognized: comprising: performing image feature extraction on the target image to obtain an image feature of the target image; and performing image type recognition on the target image according to the image feature by using an image classification network, and determining an image type of the target mage and a certificate direction of the target image, wherein the certificate direction refers to a front direction of the certificate to be recognized in the target image; determining, by using the image classification network, a text direction of the target text according to the text area image, wherein the certificate direction and the text direction are both included in an output layer of the image classification network; performing direction adjustment on the text area image by using the text direction and the certificate direction at the same time to obtain an adjusted text area image; and performing text recognition on the adjusted text area image to obtain a text content of the target text.
2. The image recognition method of claim 1, further comprising: when the image type of the target image is a preset certificate type, performing text area segmentation on the target image according to the image feature by using a region-based detection network to obtain the text area image of the target text corresponding to the certificate to be recognized.
2. The image recognition method of claim 1, wherein the performing text area recognition on the target image to obtain a text area image of a target text corresponding to the certificate to be recognized comprises: when the image type of the target image is a preset certificate type, performing text area segmentation on the target image according to the image feature by using a region-based detection network to obtain the text area image of the target text corresponding to the certificate to be recognized.
3. The image recognition method of claim 2, wherein the performing image feature extraction on the target image to obtain an image feature of the target image comprises: performing image segmentation processing on the target image to obtain an image segment group, the image segment group comprising a plurality of image segments; performing multi-scale feature extraction on the image segment group by using a group convolutional network to obtain a plurality of image segment feature groups in different sizes, wherein image segment features in each image feature group are in a same size; and performing feature fusion processing based on the image segment feature groups in different sizes to obtain the image feature of the target image.
3. The image recognition method of claim 2, wherein the performing image feature extraction on the target image to obtain an image feature of the target image comprises: performing image segmentation processing on the target image to obtain an image segment group, the image segment group comprising a plurality of image segments; performing multi-scale feature extraction on the image segment group by using a group convolutional network to obtain a plurality of image segment feature groups in different sizes, wherein image segment features in each image feature group are in a same size; and performing feature fusion processing based on the image segment feature groups in different sizes to obtain the image feature of the target image.
4. The image recognition method of claim 1, wherein the certificate direction and the text direction are determined using different channels at an output layer of the image classification network.
13. The method according to claim 1, wherein the certificate direction and the text direction are determined using different channels at the output layer of the image classification network.
5. The image recognition method of claim 2, wherein the performing text area segmentation on the target image according to the image feature by using a region-based detection network to obtain the text area image of the target text corresponding to the certificate to be recognized comprises: determining a text area location feature according to the image feature by using a region-based detection network; determining text area location feature points in the target image according to the text area location feature; and segmenting the target image according to the text area location feature points to obtain the text area image, the text area image being an image comprising the target text.
4. The image recognition method of claim 2, wherein the performing text area segmentation on the target image according to the image feature by using a region-based detection network to obtain the text area image of the target text corresponding to the certificate to be recognized comprises: determining a text area location feature according to the image feature by using a region-based detection network; determining text area location feature points in the target image according to the text area location feature; and segmenting the target image according to the text area location feature points to obtain the text area image, the text area image being an image comprising the target text.
6. The image recognition method of claim 5, wherein after the determining a text area location feature according to the image feature by using the region-based detection network, the method further comprises: determining a direction feature of the target text in the text area image according to the image feature by using the region-based detection network; the determining the text direction of the target text according to the text area image comprises: determining the text direction of the target text according to the direction feature of the text area image by using the region-based detection network.
5. The image recognition method of claim 4, wherein after the determining a text area location feature according to the image feature by using the region-based detection network, the method further comprises: determining a direction feature of the target text in the text area image according to the image feature by using the region-based detection network; the determining the text direction of the target text according to the text area image comprises: determining the text direction of the target text according to the direction feature of the text area image by using the region-based detection network.
7. The image recognition method of claim 6, wherein the region-based detection network comprises a multi-channel output layer, and the determining the text direction of the target text according to the direction feature of the text area image by using the region-based detection network comprises: determining direction prediction values of pixels in the text area image according to the direction feature in the multi-channel output layer; performing statistics on the direction prediction values of the pixels in the text area image to obtain a global direction value of the text area image; and determining the text direction of the target text according to the global direction value.
6. The image recognition method of claim 5, wherein the region-based detection network comprises a multi-channel output layer, and the determining the text direction of the target text according to the direction feature of the text area image by using the region-based detection network comprises: determining direction prediction values of pixels in the text area image according to the direction feature in the multi-channel output layer; performing statistics on the direction prediction values of the pixels in the text area image to obtain a global direction value of the text area image; and determining the text direction of the target text according to the global direction value.
8. The image recognition method of claim 1, wherein the performing text recognition based on the adjusted text area image to obtain a text content of the target text comprises: performing image segmentation processing on the text area image to obtain a text area image segment; performing feature extraction on the text area image segment by using a group convolutional network to obtain a text area image segment feature; determining a target text feature according to the text area image segment feature; and performing text recognition based on the target text feature by using a bidirectional recurrent network to obtain the text content of the target text.
7. The image recognition method of claim 1, wherein the performing text recognition based on the adjusted text area image to obtain a text content of the target text comprises: performing image segmentation processing on the text area image to obtain a text area image segment; performing feature extraction on the text area image segment by using a group convolutional network to obtain a text area image segment feature; determining a target text feature according to the text area image segment feature; and performing text recognition based on the target text feature by using a bidirectional recurrent network to obtain the text content of the target text.
9. The image recognition method of claim 8, wherein the bidirectional recurrent network comprises a forward layer and a backward layer, and the performing text recognition based on the target text feature by using a bidirectional recurrent network to obtain the text content of the target text comprises: determining a current moment in the bidirectional recurrent network, a forward hidden layer state corresponding to a previous moment of the current moment, and a backward hidden layer state corresponding to a next moment of the current moment; determining a forward hidden layer state of the forward layer at the current moment according to the target text feature and the forward hidden layer state of the forward layer at the previous moment; determining a backward hidden layer state of the backward layer at the current moment according to the target text feature and the backward hidden layer state of the backward layer at the next moment; determining a text semantic vector according to the forward hidden layer state of the forward layer at the current moment and the backward hidden layer state of the backward layer at the current moment; and determining the text content of the target text according to the text semantic vector.
8. The image recognition method of claim 7, wherein the bidirectional recurrent network comprises a forward layer and a backward layer, and the performing text recognition based on the target text feature by using a bidirectional recurrent network to obtain the text content of the target text comprises: determining a current moment in the bidirectional recurrent network, a forward hidden layer state corresponding to a previous moment of the current moment, and a backward hidden layer state corresponding to a next moment of the current moment; determining a forward hidden layer state of the forward layer at the current moment according to the target text feature and the forward hidden layer state of the forward layer at the previous moment; determining a backward hidden layer state of the backward layer at the current moment according to the target text feature and the backward hidden layer state of the backward layer at the next moment; determining a text semantic vector according to the forward hidden layer state of the forward layer at the current moment and the backward hidden layer state of the backward layer at the current moment; and determining the text content of the target text according to the text semantic vector.
10. The image recognition method of claim 1, wherein the text direction comprises a first direction, a second direction, a third direction, and a fourth direction, and the performing direction adjustment on the text area image according to the text direction to obtain an adjusted text area image comprises: when the text direction is the first direction, regarding the text area image as the adjusted text area image; when the text direction is the second direction, rotating the text area image counterclockwise by 90 degrees to obtain the adjusted text area image; when the text direction is the third direction, rotating the text area image counterclockwise by 180 degrees to obtain the adjusted text area image; and when the text direction is the fourth direction, rotating the text area image counterclockwise by 270 degrees to obtain the adjusted text area image.
9. The image recognition method of claim 1, wherein the text direction comprises a first direction, a second direction, a third direction, and a fourth direction, and the performing direction adjustment on the text area image according to the text direction to obtain an adjusted text area image comprises: when the text direction is the first direction, regarding the text area image as the adjusted text area image; when the text direction is the second direction, rotating the text area image counterclockwise by 90 degrees to obtain the adjusted text area image; when the text direction is the third direction, rotating the text area image counterclockwise by 180 degrees to obtain the adjusted text area image; and when the text direction is the fourth direction, rotating the text area image counterclockwise by 270 degrees to obtain the adjusted text area image.
11. The image recognition method of claim 1, wherein before the performing text recognition based on the adjusted text area image to obtain a text content of the target text, the method further comprises: acquiring a text area training sample image and a preset recurrent convolutional network, wherein the text content of the target text is annotated in the text area training sample image, and the preset recurrent convolutional network comprises a group convolutional network, a bidirectional recurrent network, and a connectionist temporal classifier; and training the preset recurrent convolutional network according to the text area training sample image, until the connectionist temporal classifier determines that the preset recurrent convolutional network is converged, to obtain a recurrent convolutional network; and the performing text recognition based on the adjusted text area image to obtain a text content of the target text comprises: performing text recognition based on the adjusted text area image by using the recurrent convolutional network to obtain the text content of the target text.
10. The image recognition method of claim 1, wherein before the performing text recognition based on the adjusted text area image to obtain a text content of the target text, the method further comprises: acquiring a text area training sample image and a preset recurrent convolutional network, wherein the text content of the target text is annotated in the text area training sample image, and the preset recurrent convolutional network comprises a group convolutional network, a bidirectional recurrent network, and a connectionist temporal classifier; and training the preset recurrent convolutional network according to the text area training sample image, until the connectionist temporal classifier determines that the preset recurrent convolutional network is converged, to obtain a recurrent convolutional network; and the performing text recognition based on the adjusted text area image to obtain a text content of the target text comprises: performing text recognition based on the adjusted text area image by using the recurrent convolutional network to obtain the text content of the target text.
12. The image recognition method of claim 1, wherein the determining the text direction of the target text according to the text area image comprises: acquiring a training sample image, a preset image classification network, and a preset region-based detection network, the training sample image being annotated with a certificate direction and a text direction; processing the training sample image by using the preset image classification network to obtain a certificate direction training result; training the preset region-based detection network according to the training sample image and the certificate direction training result, until the preset region-based detection network is converged, to obtain the trained region-based detection network; and determining the text direction of the target text according to the text area image by using the trained region-based detection network.
11. The image recognition method of claim 1, wherein the determining the text direction of the target text according to the text area image comprises: acquiring a training sample image, a preset image classification network, and a preset region-based detection network, the training sample image being annotated with a certificate direction and a text direction; processing the training sample image by using the preset image classification network to obtain a certificate direction training result; training the preset region-based detection network according to the training sample image and the certificate direction training result, until the preset region-based detection network is converged, to obtain the trained region-based detection network; and determining the text direction of the target text according to the text area image by using the trained region-based detection network.
13. An image recognition apparatus, comprising a processor and a memory, and the memory storing a plurality of instructions; the processor loading the instructions from the memory to perform: acquiring a target image, the target image being an image of a certificate to be recognized; performing image feature extraction on the target image to obtain an image feature of the target image; using an image classification network to determine an image type of the target image and a certificate direction of the target image, wherein the certificate direction refers to a front direction of the certificate to be recognized in the target image; and determining, by using the image classification network, a text direction of the target text according to the text area image; performing direction adjustment on the text area image according to the certificate direction and the text direction to obtain an adjusted text area image; and performing text recognition on the adjusted text area image to obtain a text content of the target text.
14. An image recognition apparatus, comprising a processor and a memory, and the memory storing a plurality of instructions; the processor loading the instructions from the memory to perform: acquiring a target image, the target image being an image of a certificate to be recognized; performing text area recognition on the target image to obtain a text area image of a target text corresponding to the certificate to be recognized: comprising: performing image feature extraction on the target image to obtain an image feature of the target image; and performing image type recognition on the target image according to the image feature by using an image classification network, and determining an image type of the target mage and a certificate direction of the target image, wherein the certificate direction refers to a front direction of the certificate to be recognized in the target image; determining, by using the image classification network, a text direction of the target text according to the text area image, wherein the certificate direction and the text direction are both included in an output layer of the image classification network; performing direction adjustment on the text area image by using the text direction and the certificate direction at the same time to obtain an adjusted text area image; and performing text recognition on the adjusted text area image to obtain a text content of the target text.
14. The image recognition apparatus of claim 13, wherein the processor is further configured to perform: when the image type of the target image is a preset certificate type, performing text area segmentation on the target image according to the image feature by using a region-based detection network to obtain the text area image of the target text corresponding to the certificate to be recognized.
15. The image recognition apparatus of claim 14, wherein the performing text area recognition on the target image to obtain a text area image of a target text corresponding to the certificate to be recognized comprises: when the image type of the target image is a preset certificate type, performing text area segmentation on the target image according to the image feature by using a region-based detection network to obtain the text area image of the target text corresponding to the certificate to be recognized.
15. The image recognition apparatus of claim 14, wherein the performing image feature extraction on the target image to obtain an image feature of the target image comprises: performing image segmentation processing on the target image to obtain an image segment group, the image segment group comprising a plurality of image segments; performing multi-scale feature extraction on the image segment group by using a group convolutional network to obtain a plurality of image segment feature groups in different sizes, wherein image segment features in each image feature group are in a same size; and performing feature fusion processing based on the image segment feature groups in different sizes to obtain the image feature of the target image.
16. The image recognition apparatus of claim 15, wherein the performing image feature extraction on the target image to obtain an image feature of the target image comprises: performing image segmentation processing on the target image to obtain an image segment group, the image segment group comprising a plurality of image segments; performing multi-scale feature extraction on the image segment group by using a group convolutional network to obtain a plurality of image segment feature groups in different sizes, wherein image segment features in each image feature group are in a same size; and performing feature fusion processing based on the image segment feature groups in different sizes to obtain the image feature of the target image.
16. The image recognition apparatus of claim 13, wherein the certificate direction and the text direction are determined using different channels at an output layer of the image classification network.
13. The method according to claim 1, wherein the certificate direction and the text direction are determined using different channels at the output layer of the image classification network.
17. The image recognition apparatus of claim 14, wherein the performing text area segmentation on the target image according to the image feature by using a region-based detection network to obtain the text area image of the target text corresponding to the certificate to be recognized comprises: determining a text area location feature according to the image feature by using a region-based detection network; determining text area location feature points in the target image according to the text area location feature; and segmenting the target image according to the text area location feature points to obtain the text area image, the text area image being an image comprising the target text.
17. The image recognition apparatus of claim 15, wherein the performing text area segmentation on the target image according to the image feature by using a region-based detection network to obtain the text area image of the target text corresponding to the certificate to be recognized comprises: determining a text area location feature according to the image feature by using a region-based detection network; determining text area location feature points in the target image according to the text area location feature; and segmenting the target image according to the text area location feature points to obtain the text area image, the text area image being an image comprising the target text.
18. The image recognition apparatus of claim 17, wherein after the determining a text area location feature according to the image feature by using the region-based detection network, the processor is further configured to perform: determining a direction feature of the target text in the text area image according to the image feature by using the region-based detection network; the determining the text direction of the target text according to the text area image comprises: determining the text direction of the target text according to the direction feature of the text area image by using the region-based detection network.
18. The image recognition apparatus of claim 17, wherein after the determining a text area location feature according to the image feature by using the region-based detection network, the processor is further configured to perform: determining a direction feature of the target text in the text area image according to the image feature by using the region-based detection network; the determining the text direction of the target text according to the text area image comprises: determining the text direction of the target text according to the direction feature of the text area image by using the region-based detection network.
19. The image recognition apparatus of claim 18, wherein the region-based detection network comprises a multi-channel output layer, and the determining the text direction of the target text according to the direction feature of the text area image by using the region-based detection network comprises: determining direction prediction values of pixels in the text area image according to the direction feature in the multi-channel output layer; performing statistics on the direction prediction values of the pixels in the text area image to obtain a global direction value of the text area image; and determining the text direction of the target text according to the global direction value.
19. The image recognition apparatus of claim 18, wherein the region-based detection network comprises a multi-channel output layer, and the determining the text direction of the target text according to the direction feature of the text area image by using the region-based detection network comprises: determining direction prediction values of pixels in the text area image according to the direction feature in the multi-channel output layer; performing statistics on the direction prediction values of the pixels in the text area image to obtain a global direction value of the text area image; and determining the text direction of the target text according to the global direction value.
20. A non-transitory computer-readable storage medium storing a plurality of instructions, the instructions being adaptable to be loaded by a processor to perform: acquiring a target image, the target image being an image of a certificate to be recognized; performing image feature extraction on the target image to obtain an image feature of the target image; using an image classification network to determine an image type of the target image and a certificate direction of the target image, wherein the certificate direction refers to a front direction of the certificate to be recognized in the target image; and determining, by using the image classification network, a text direction of the target text according to the text area image; performing direction adjustment on the text area image according to the certificate direction and the text direction to obtain an adjusted text area image; and performing text recognition on the adjusted text area image to obtain a text content of the target text.
20. A non-transitory computer-readable storage medium storing a plurality of instructions, the instructions being adaptable to be loaded by a processor to perform: acquiring a target image, the target image being an image of a certificate to be recognized; performing text area recognition on the target image to obtain a text area image of a target text corresponding to the certificate to be recognized: comprising: performing image feature extraction on the target image to obtain an image feature of the target image; and performing image type recognition on the target image according to the image feature by using an image classification network, and determining an image type of the target mage and a certificate direction of the target image, wherein the certificate direction refers to a front direction of the certificate to be recognized in the target image; determining, by using the image classification network, a text direction of the target text according to the text area image, wherein the certificate direction and the text direction are both included in an output layer of the image classification network; performing direction adjustment on the text area image by using the text direction and the certificate direction at the same time to obtain an adjusted text area image; and performing text recognition on the adjusted text area image to obtain a text content of the target text.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1, 13, and 20 recite the limitation "the text area image" in lines 9, 11, and 10, respectively. There is insufficient antecedent basis for this limitation in these claims.
Claims 1, 13, and 20 recite the limitation "the target text" in lines 8, 10, and 9, respectively. There is insufficient antecedent basis for this limitation in these claims.
Regarding claims 1, 13, and 20, the phrase "front direction" renders the claims indefinite because the term “front direction” is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For examination purposes, the term has been interpreted as the direction of the certificate relative to the image direction, with the contents of the image in the horizontal direction.
Claims 1, 13, and 20 recite the limitation “performing image feature extraction on the target image to obtain an image feature of the target image”. However, the extracted image feature is not associated with any of the remaining text recognition steps in the claims, so the scope of the claimed subject matter is unclear.
Claims 2-12 and 14-19 are also rejected under 35 U.S.C. 112(b) as being dependent upon a rejected base claim.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
Claims 1, 8-11, 13, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Han et al. (CN 110443239 A), hereinafter Han.
-Regarding claim 1, Han discloses an image recognition method (Abstract; FIGS. 1-11; Page 18, 4th-7th paragraphs; Page 19, 2nd paragraph), performed by an electronic device, comprising (Abstract; FIG. 11; FIGS. 1, 4, 7-9): acquiring a target image (FIG. 1, S101; FIG. 7, S201), the target image being an image of a certificate to be recognized (FIGS. 8, 2; Page 6, last paragraph, “certificate photograph … the like … not limited”); performing image feature extraction on the target image to obtain an image feature of the target image (Abstract; FIG. 1, S102; FIG. 4; FIG. 7, S202; FIG. 9, extraction module 320; Page 2, 5th paragraph, “extracting a plurality of image areas …”, last paragraphs, “text boxes”; Page 10, 2nd paragraph, “feature extraction”; Page 12, 6th-7th paragraphs); using an image classification network to determine an image type of the target image (Abstract; Page 11, 6th paragraph, “using the text recognition model for text recognition, capable of identifying various types of character image”) and a certificate direction of the target image, wherein the certificate direction refers to a front direction of the certificate to be recognized in the target image (Abstract; FIG. 1, S103-S104; FIG. 7, S202; FIG. 8; Page 3, 5th paragraph, “a direction identifying module … determining the positive direction of the character image …”; Page 12, 7th paragraph); and determining, by using the image classification network, a text direction of the target text according to the text area image (FIGS. 1-2, 7; Page 2, last paragraph, “direction identifying model … comprising a plurality of text boxes, respectively obtaining the positive direction corresponding to each of the text boxes”; Page 3, 2nd paragraph; Page 7, 4th-5th paragraphs, 8th paragraph, “direction recognition model training can identify the positive direction of each section of text content …”); performing direction adjustment on the text area image according to the certificate direction and the text direction to obtain an adjusted text area image (FIG. 1, S104; FIG. 7, S204-S207; Page 2, last paragraph, “rotate … modifies the positive direction …”; Page 13, 6th-8th paragraphs); and performing text recognition on the adjusted text area image to obtain a text content of the target text (Abstract; FIG. 1, S105; Page 8, 2nd-3rd paragraphs; Page 7, 8th paragraph, “text content in each image area”; Page 11, 7th paragraph).
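For clarity of the mapping above, the sequence of claimed operations can be summarized in the following illustrative sketch; the callables are hypothetical placeholders and do not represent Han's implementation or the applicant's disclosure.

```python
# Illustrative orchestration of the claimed steps (hypothetical callables, not Han's code).
def recognize_certificate(target_image, classify_image, detect_text_area,
                          classify_text_direction, adjust_direction, recognize_text):
    # Image classification network: image type and certificate direction of the target image.
    image_type, certificate_direction = classify_image(target_image)
    # Text area image of the target text corresponding to the certificate to be recognized.
    text_area_image = detect_text_area(target_image)
    # Text direction of the target text, determined from the text area image.
    text_direction = classify_text_direction(text_area_image)
    # Direction adjustment according to the certificate direction and the text direction.
    adjusted_text_area_image = adjust_direction(text_area_image, certificate_direction, text_direction)
    # Text recognition on the adjusted text area image.
    return recognize_text(adjusted_text_area_image)
```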
-Regarding claim 8, Han discloses the method of claim 1. Han further discloses wherein the performing text recognition based on the adjusted text area image to obtain a text content of the target text comprises (Abstract; FIG. 1, S105; Page 8, 2nd-3rd paragraphs; FIG. 7, S207; Page 7, 8th paragraph, “text content in each image area”): performing image segmentation processing on the text area image to obtain a text area image segment (Pages 8-9, steps S11-S13); performing feature extraction on the text area image segment by using a group convolutional network to obtain a text area image segment feature (Page 11, 2nd paragraph, “stacked two-way long-term memory neural network for feature extraction”; FIGS. 3-5); determining a target text feature according to the text area image segment feature (FIGS. 3-5); and performing text recognition based on the target text feature by using a bidirectional recurrent network to obtain the text content of the target text (FIGS. 3-6; Pages 8-9, steps S11-S13; FIG. 1, S105).
-Regarding claim 9, it recites common facts or basic knowledge of a bidirectional recurrent network (the recited facts are inherently present in Han (FIGS. 3-5) or in any bidirectional recurrent network in order to perform the various stages of calculation for text recognition).
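As an illustration of the conventional forward/backward computation referenced above, a minimal sketch using a plain recurrent cell is given below; the weights and shapes are placeholders and do not represent Han's or the applicant's network.

```python
import numpy as np

# Minimal sketch of a bidirectional recurrent pass: the forward layer uses the hidden
# state of the previous moment, the backward layer uses the hidden state of the next
# moment, and the two are combined into a per-moment text semantic vector.
def bidirectional_states(features, W_x, W_h, b):
    T = features.shape[0]
    H = b.shape[0]
    h_fwd = np.zeros((T, H))
    h_bwd = np.zeros((T, H))
    for t in range(T):                       # forward layer: previous moment -> current moment
        prev = h_fwd[t - 1] if t > 0 else np.zeros(H)
        h_fwd[t] = np.tanh(features[t] @ W_x + prev @ W_h + b)
    for t in reversed(range(T)):             # backward layer: next moment -> current moment
        nxt = h_bwd[t + 1] if t + 1 < T else np.zeros(H)
        h_bwd[t] = np.tanh(features[t] @ W_x + nxt @ W_h + b)
    return np.concatenate([h_fwd, h_bwd], axis=1)

# Example usage with random placeholder weights.
rng = np.random.default_rng(0)
semantic_vectors = bidirectional_states(rng.normal(size=(8, 16)),
                                        rng.normal(size=(16, 32)),
                                        rng.normal(size=(32, 32)),
                                        np.zeros(32))
```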
-Regarding claim 10, Han discloses the method of claim 1. Han further discloses wherein the text direction comprises a first direction, a second direction, a third direction, and a fourth direction (FIG. 8; Page 12, 9th-10th paragraphs); when the text direction is the first direction, regarding the text area image as the adjusted text area image; when the text direction is the second direction, rotating the text area image counterclockwise by 90 degrees to obtain the adjusted text area image; when the text direction is the third direction, rotating the text area image counterclockwise by 180 degrees to obtain the adjusted text area image; and when the text direction is the fourth direction, rotating the text area image counterclockwise by 270 degrees to obtain the adjusted text area image (Page 12, 9th paragraph, “0 represents a positive direction … 90 degrees … 180 degrees … 270 degrees”).
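The claimed adjustment amounts to a counterclockwise quarter-turn rotation selected by the text direction; a minimal sketch, assuming the four directions are encoded as 0, 90, 180, and 270 degrees, is:

```python
import numpy as np

# Minimal sketch of the direction adjustment: rotate the text area image
# counterclockwise by 0, 90, 180, or 270 degrees depending on the text direction.
def adjust_text_area(text_area_image: np.ndarray, text_direction: int) -> np.ndarray:
    quarter_turns = {0: 0, 90: 1, 180: 2, 270: 3}   # counterclockwise quarter turns
    if text_direction not in quarter_turns:
        raise ValueError("text_direction must be 0, 90, 180, or 270 degrees")
    return np.rot90(text_area_image, k=quarter_turns[text_direction])
```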
-Regarding claim 11, Han discloses the method of claim 1. Han further discloses acquiring a text area training sample image (Page 8, step S11) and a preset recurrent convolutional network (Page 8, 4th paragraph, “trained model”; FIGS. 3-6), wherein the text content of the target text is annotated in the text area training sample image (Page 8, step S11) and the preset recurrent convolutional network comprises a group convolutional network, a bidirectional recurrent network, and a connectionist temporal classifier (FIGS. 3-6); and training the preset recurrent convolutional network according to the text area training sample image (Page 8, step S12), until the connectionist temporal classifier determines that the preset recurrent convolutional network is converged (Page 10, step S13; FIG. 6; Page 10, 5th paragraph, “CTC loss”), to obtain a recurrent convolutional network (FIGS. 4-5; Page 9, 4th paragraph, “classification model”).
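Training with a connectionist temporal classifier, as recited, amounts to minimizing a CTC loss until convergence; a minimal PyTorch sketch with placeholder tensors (not the applicant's or Han's network) is:

```python
import torch
import torch.nn as nn

# Minimal sketch of one connectionist-temporal-classification training step on placeholder
# outputs; in practice the log-probabilities would come from the recurrent convolutional
# network and the step repeats until the loss converges.
T, N, C = 50, 4, 37                                            # time steps, batch, classes (incl. blank)
log_probs = torch.randn(T, N, C).log_softmax(2).requires_grad_()
targets = torch.randint(1, C, (N, 20), dtype=torch.long)       # annotated text content
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

ctc_loss = nn.CTCLoss(blank=0)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                                # gradient step toward convergence
```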
-Regarding claim 13, Han discloses an image recognition apparatus, comprising a processor and a memory, and the memory storing a plurality of instructions (FIG. 11); the processor loading the instructions from the memory to perform (Abstract; FIGS. 1-11; Page 18, 4th-7th paragraphs; Page 19, 2nd paragraph): acquiring a target image (FIG. 1, S101; FIG. 7, S201), the target image being an image of a certificate to be recognized (FIGS. 8, 2; Page 6, last paragraph, “certificate photograph … the like … not limited”); performing image feature extraction on the target image to obtain an image feature of the target image (Abstract; FIG. 1, S102; FIG. 4; FIG. 7, S202; FIG. 9, extraction module 320; Page 2, 5th paragraph, “extracting a plurality of image areas …”, last paragraphs, “text boxes”; Page 10, 2nd paragraph, “feature extraction”; Page 12, 6th-7th paragraphs); using an image classification network to determine an image type of the target image (Abstract; Page 11, 6th paragraph, “using the text recognition model for text recognition, capable of identifying various types of character image”) and a certificate direction of the target image, wherein the certificate direction refers to a front direction of the certificate to be recognized in the target image (Abstract; FIG. 1, S103-S104; FIG. 7, S202; FIG. 8; Page 3, 5th paragraph, “a direction identifying module … determining the positive direction of the character image …”; Page 12, 7th paragraph); and determining, by using the image classification network, a text direction of the target text according to the text area image (FIGS. 1-2, 7; Page 2, last paragraph, “direction identifying model … comprising a plurality of text boxes, respectively obtaining the positive direction corresponding to each of the text boxes”; Page 3, 2nd paragraph; Page 7, 4th-5th paragraphs, 8th paragraph, “direction recognition model training can identify the positive direction of each section of text content …”); performing direction adjustment on the text area image according to the certificate direction and the text direction to obtain an adjusted text area image (FIG. 1, S104; FIG. 7, S204-S207; Page 2, last paragraph, “rotate … modifies the positive direction …”; Page 13, 6th-8th paragraphs); and performing text recognition on the adjusted text area image to obtain a text content of the target text (Abstract; FIG. 1, S105; Page 8, 2nd-3rd paragraphs; Page 7, 8th paragraph, “text content in each image area”; Page 11, 7th paragraph).
-Regarding claim 20, Han discloses a non-transitory computer-readable storage medium storing a plurality of instructions (FIG. 11; Page 4, 5th paragraph), the instructions being adaptable to be loaded by a processor to perform (Abstract; FIGS. 1-11; Page 18, 4th-7th paragraphs; Page 19, 2nd paragraph): acquiring a target image (FIG. 1, S101; FIG. 7, S201), the target image being an image of a certificate to be recognized (FIGS. 8, 2; Page 6, last paragraph, “certificate photograph … the like … not limited”); performing image feature extraction on the target image to obtain an image feature of the target image (Abstract; FIG. 1, S102; FIG. 4; FIG. 7, S202; FIG. 9, extraction module 320; Page 2, 5th paragraph, “extracting a plurality of image areas …”, last paragraphs, “text boxes”; Page 10, 2nd paragraph, “feature extraction”; Page 12, 6th-7th paragraphs); using an image classification network to determine an image type of the target image (Abstract; Page 11, 6th paragraph, “using the text recognition model for text recognition, capable of identifying various types of character image”) and a certificate direction of the target image, wherein the certificate direction refers to a front direction of the certificate to be recognized in the target image (Abstract; FIG. 1, S103-S104; FIG. 7, S202; FIG. 8; Page 3, 5th paragraph, “a direction identifying module … determining the positive direction of the character image …”; Page 12, 7th paragraph); and determining, by using the image classification network, a text direction of the target text according to the text area image (FIGS. 1-2, 7; Page 2, last paragraph, “direction identifying model … comprising a plurality of text boxes, respectively obtaining the positive direction corresponding to each of the text boxes”; Page 3, 2nd paragraph; Page 7, 4th-5th paragraphs, 8th paragraph, “direction recognition model training can identify the positive direction of each section of text content …”); performing direction adjustment on the text area image according to the certificate direction and the text direction to obtain an adjusted text area image (FIG. 1, S104; FIG. 7, S204-S207; Page 2, last paragraph, “rotate … modifies the positive direction …”; Page 13, 6th-8th paragraphs); and performing text recognition on the adjusted text area image to obtain a text content of the target text (Abstract; FIG. 1, S105; Page 8, 2nd-3rd paragraphs; Page 7, 8th paragraph, “text content in each image area”; Page 11, 7th paragraph).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 2-7 and 14-19 are rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (CN 110443239 A), hereinafter Han, in view of Zhou et al. (2017 CVPR), hereinafter Zhou.
-Regarding claims 2 and 14, Han discloses the method of claim 1 and apparatus of claim 13. Han further discloses performing image feature extraction on the target image to obtain an image feature of the target image (Page 10, 2nd paragraph, “adopt … stacked two-way long-term memory neural network for feature extraction”); and performing text area segmentation on the target image to obtain the text area image of the target text to be recognized (Abstract: ”extracting a plurality of image areas from the characters in the image to be identified”; Page 11, 7th paragraph; Page 12, 7th paragraph).
Han does not disclose a region-based detection network for text area segmentation.
In the same field of endeavor, Zhou teaches a method for text area detection (Zhou: Abstract; Figures 2-6). Zhou further teaches a region-based detection network for text area segmentation (Zhou: Page 5552, 1st Col., 1st paragraph; Figure 2(e), multi-channel FCN; Figure 3).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Han with the teaching of Zhou by using a region-based detection network in order to improve the performance of text area detection in terms of accuracy and speed.
-Regarding claims 3 and 15, Han in view of Zhou discloses the method of claim 2 and apparatus of claim 14.
Han does not disclose performing image segmentation processing on the target image to obtain an image segment group, the image segment group comprising a plurality of image segments; performing multi-scale feature extraction on the image segment group by using a group convolutional network to obtain a plurality of image segment feature groups in different sizes, wherein image segment features in each image feature group are in the same size; and performing feature fusion processing based on the image segment feature groups in different sizes to obtain the image feature of the target image.
In the same field of endeavor, Zhou teaches a method for text area detection (Zhou: Abstract; Figures 2-6). Zhou further teaches wherein the performing image feature extraction on the target image to obtain an image feature of the target image comprises (Zhou: Figure 2): performing image segmentation processing on the target image to obtain an image segment group (Zhou: Figure 3, Feature extractor), the image segment group comprising a plurality of image segments (Zhou: Figure 3, Feature extractor, filter 7x7); performing multi-scale feature extraction on the image segment group by using a group convolutional network to obtain a plurality of image segment feature groups in different sizes (Zhou: Figure 3, Feature extractor, stages 1-4), wherein image segment features in each image feature group are in the same size (Zhou: Figure 3, Feature extractor, stage 1 or 2 or 3 or 4); and performing feature fusion processing based on the image segment feature groups in different sizes to obtain the image feature of the target image (Zhou: Figure 3, Feature-merging branch).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Han with the teaching of Zhou by using a region-based detection network in order to improve the performance of text area detection in terms of accuracy and speed.
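As an illustration of the multi-scale extraction and fusion discussed for claims 3 and 15, a minimal PyTorch sketch with grouped convolutions is given below; the channel counts and layer sizes are hypothetical and do not reproduce the claimed or cited networks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch: grouped convolutions extract feature groups at several scales,
# which are then resized and fused into a single image feature.
class MultiScaleFusion(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.stage1 = nn.Conv2d(channels, channels, 3, stride=1, padding=1, groups=4)
        self.stage2 = nn.Conv2d(channels, channels, 3, stride=2, padding=1, groups=4)
        self.stage3 = nn.Conv2d(channels, channels, 3, stride=2, padding=1, groups=4)
        self.fuse = nn.Conv2d(3 * channels, channels, 1)   # 1x1 fusion of all scales

    def forward(self, x):
        f1 = F.relu(self.stage1(x))       # full resolution
        f2 = F.relu(self.stage2(f1))      # 1/2 resolution
        f3 = F.relu(self.stage3(f2))      # 1/4 resolution
        f2u = F.interpolate(f2, size=f1.shape[-2:], mode="bilinear", align_corners=False)
        f3u = F.interpolate(f3, size=f1.shape[-2:], mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([f1, f2u, f3u], dim=1))

# Example usage on a placeholder feature map.
image_feature = MultiScaleFusion()(torch.randn(1, 32, 64, 256))
```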
-Regarding claims 4 and 16, Han discloses the method of claim 1 and apparatus of claim 13.
Han does not disclose wherein the certificate direction and the text direction are determined using different channels at an output layer of the image classification network.
However, Zhou is an analogous art pertinent to the problem to be solved in this application and teaches a method for text area detection and text detection (Zhou: Abstract; Figures 2-6). Zhou further teaches wherein the certificate direction and the text direction are determined using different channels at an output layer of the image classification network (Zhou: Abstract, “arbitrary orientations”; Figure 2; Figure 3, output layer, QUAD, RBOX, text box, text rotation angle).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Han with the teaching of Zhou by using different channels at an output layer of the image classification network to determine the certificate direction and the text direction in order to improve the performance of text area detection in terms of accuracy and speed.
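A minimal sketch of an output layer whose channels are split between the certificate-direction prediction and the text-direction prediction is given below; the dimensions are hypothetical and the sketch is illustrative only, not the claimed or cited network.

```python
import torch
import torch.nn as nn

# Minimal sketch: one output layer, with separate channel groups for the
# certificate direction and the text direction.
class DirectionHeads(nn.Module):
    def __init__(self, feature_dim: int = 256, num_directions: int = 4):
        super().__init__()
        self.out = nn.Linear(feature_dim, 2 * num_directions)
        self.num_directions = num_directions

    def forward(self, image_feature: torch.Tensor):
        logits = self.out(image_feature)
        cert_logits = logits[..., :self.num_directions]    # channels for the certificate direction
        text_logits = logits[..., self.num_directions:]    # channels for the text direction
        return cert_logits.argmax(-1), text_logits.argmax(-1)

# Example usage on a placeholder image feature.
certificate_direction, text_direction = DirectionHeads()(torch.randn(2, 256))
```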
-Regarding claims 5 and 17, Han discloses the method of claim 2 and apparatus of claim 14.
Han does not disclose a region-based detection network.
However, Zhou is an analogous art pertinent to the problem to be solved in this application and teaches a method for text area detection (Zhou: Abstract; Figures 2-6). Zhou further teaches determining a text area location feature according to the image feature by using a region-based detection network (Zhou: Abstract; Figure 2(e); Figure 3); determining text area location feature points in the target image according to the text area location feature (Zhou: Figure 3); and segmenting the target image according to the text area location feature points to obtain the text area image, the text area image being an image comprising the target text (Zhou: Figure 3, output layer).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Han with the teaching of Zhou by using a region-based detection network in order to improve the performance of text area detection in terms of accuracy and speed.
-Regarding claims 6 and 18, Han discloses the method of claim 5 and apparatus of claim 17.
Han does not disclose determining a direction feature of the target text in the text area image according to the image feature by using the region-based detection network; and determining the text direction of the target text according to the direction feature of the text area image by using the region-based detection network.
However, Zhou is an analogous art pertinent to the problem to be solved in this application and teaches a method for text area detection (Zhou: Abstract; Figures 2-6). Zhou further teaches determining a direction feature of the target text in the text area image according to the image feature by using the region-based detection network (Zhou: Figure 2(e), multi-channel FCN; Figure 3); and determining the text direction of the target text according to the direction feature of the text area image by using the region-based detection network (Zhou: Figure 2(e); Figure 3, output layer).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Han with the teaching of Zhou by using a region-based detection network in order to improve the performance of text area detection in terms of accuracy and speed.
-Regarding claims 7 and 19, Han discloses the method of claim 6 and apparatus of claim 18.
Han does not disclose determining direction prediction values of pixels in the text area image according to the direction feature in the multi-channel output layer; performing statistics on the direction prediction values of the pixels in the text area image to obtain a global direction value of the text area image; and determining the text direction of the target text according to the global direction value.
However, Zhou is an analogous art pertinent to the problem to be solved in this application and teaches a method for text area detection (Zhou: Abstract; Figures 2-6). Zhou further teaches determining direction prediction values of pixels in the text area image according to the direction feature in the multi-channel output layer (Zhou: Figure 3, output layer); performing statistics on the direction prediction values of the pixels in the text area image to obtain a global direction value of the text area image; and determining the text direction of the target text according to the global direction value (Zhou: Figure 3, output layer; Figure 4; Page 5554, 2nd Col., Sections 3.3.2, 3.4.1; Figure 2).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Han with the teaching of Zhou by using a region-based detection network in order to improve the performance of text area detection in terms of accuracy and speed.
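For claims 7 and 19, the recited statistics amount to aggregating per-pixel direction predictions into a single global value; a minimal sketch, assuming a per-pixel direction-class map, is:

```python
import numpy as np

# Minimal sketch of per-pixel direction voting: each pixel of the text area image
# carries a predicted direction class, and the most frequent class is taken as the
# global direction value used to decide the text direction.
def global_text_direction(direction_map: np.ndarray) -> int:
    values, counts = np.unique(direction_map, return_counts=True)
    return int(values[np.argmax(counts)])
```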
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Han et al. (CN 110443239 A), hereinafter Han, in view of Lian et al. (CN 108885699 A), hereinafter Lian, and further in view of Zhou et al. (2017 CVPR), hereinafter Zhou.
-Regarding claim 12, Han discloses the method of claim 1. Han further discloses determining an image type of the target image (Page 11, 6th paragraph, “identify various types of character image”).
Han does not disclose a preset image classification network.
In the same field of endeavor, Lian teaches a character recognition method by determining an image type, correcting the target image, and extracting a text line image from the target image (Lian: Abstract; FIGS. 1-8). Lian further teaches an image classification network for image type recognition (Lian: FIG. 1, step S101; Page 6, 1st paragraph, “based on … deep learning image sample … preset classifier to obtain the target classifier, so as to input the target image into the target classifier, the target classifier can output the image type corresponding to the target image”). Lian also teaches methods for text area detection (Lian: Page 7, S21, “text area detection may include … area detection … any one of the learning detections”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Han with the teaching of Lian by using an image classification network for image type recognition in order to improve the generalization ability of the character image recognition method.
Han in view of Lian does not teach a preset region-based detection network, the training sample image being annotated with a certificate direction and a text direction; processing the training sample image by using the preset image classification network to obtain a certificate direction training result; training the preset region-based detection network according to the training sample image and the certificate direction training result, until the preset region-based detection network is converged, to obtain the trained region-based detection network; and determining the text direction of the target text according to the text area image by using the trained region-based detection network.
However, Zhou is an analogous art pertinent to the problem to be solved in this application and teaches a scene text detection pipeline (Zhou: Abstract; Figures 2-6). Zhou further teaches a preset region-based detection network (Zhou: Figure 2(e), multi-channel FCN; Figure 3), the training sample image being annotated with a certificate direction and a text direction (Zhou: Figure 3; the annotated text direction and image direction are inherently present in the output layer of Figure 3); processing the training sample image by using the preset image classification network to obtain an image direction training result; training the preset region-based detection network according to the training sample image and the image direction training result (Zhou: equation (4)), until the preset region-based detection network is converged, to obtain the trained region-based detection network; and determining the text direction of the target text according to the text area image by using the trained region-based detection network (Zhou: Figures 2-3; Page 5555, section 3.5 “training”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Han in view of Lian with the teaching of Zhou by using the scene text detector in order to improve the performance of text area detection in terms of accuracy and speed.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAO LIU whose telephone number is (571) 272-4539. The examiner can normally be reached Monday-Thursday and alternate Fridays, 8:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jennifer Mehmood, can be reached at (571) 272-2976. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.
For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/XIAO LIU/ Primary Examiner, Art Unit 2664