Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 9, and 17 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 11-12, and 22 of U.S. Patent No. US12153624B2. Although the claims at issue are not identical, they are not patentably distinct from each other because they are obvious variants of each other.
The chart below shows the correspondence between the claims in the current application and the claims in the patent.
Instant Application 18/929,730
U.S. Patent No. US12153624B2
1. A system comprising: a data repository storing: a collection of character recognized documents
22. storing the first document as a character recognized document in the set of character recognized documents,
and a stroke mapping that maps strokes to stroke identifier (ids);
1. mapping each stroke of the plurality of strokes to a corresponding stroke identifier (id)
a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a search user interface executable to receive a first search query that includes a first ideogram character from an ideogram language;
22. The non-transitory computer readable medium of claim 12, wherein the computer readable program code further comprises instructions for
1. receiving a first search phrase for searching a set of character recognized documents, the first search phrase submitted by a user via a user interface and comprising a first set of ideogram characters;
partitioning the first ideogram character into a first plurality of strokes;
1. partitioning a first ideogram character from the first search phrase into a plurality of strokes;
mapping each stroke of the first plurality of strokes to a corresponding stroke identifier (id) according to the stroke mapping to create a first stroke id sequence comprising a plurality of stroke identifiers;
1. mapping each stroke of the plurality of strokes to a corresponding stroke identifier (id) to create a first stroke id sequence comprising a plurality of stroke identifiers;
modifying the first stroke id sequence to create a second stroke id sequence within a specified distance to the first stroke id sequence;
1. modifying the first stroke id sequence to create a second stroke id sequence within a specified edit distance to the first stroke id sequence;
determining that the second stroke id sequence is a valid stroke id sequence that maps to a second ideogram character of the ideogram language;
1. determining that the second stroke id sequence is a valid stroke id sequence that maps to a second ideogram character of the ideogram language;
and a fuzzy search engine executable to: generate a variation of the first search query using the ideogram character variation;
1. create a second search phrase, the second set of ideogram characters comprising a subset of the first set of ideogram characters;
perform a search of the collection of character recognized documents using the first search query and the variation of the first search query to obtain a search result identifying relevant documents from the collection of character recognized documents;
1. performing a search of a set of character recognized documents using the first search phrase and the second search phrase to obtain a result identifying any documents in the set of character recognized documents
and provide the search result to the search user interface.
1. providing the result for display in the user interface.
9. A system comprising: a data repository storing: a collection of character recognized documents;
1. a set of character recognized documents
and a stroke mapping that maps strokes to stroke identifier (ids);
maintaining a set of valid stroke id sequences mapped to a set of valid ideogram characters of an ideogram language;
receive a first phrase extracted from the selected document, the first phrase from the selected document comprising first ideogram character from an ideogram language;
receiving a first search phrase for searching a set of character recognized documents, the first search phrase submitted by a user via a user interface and comprising a first set of ideogram characters;
partitioning a first ideogram character into a first plurality of strokes;
partitioning a first ideogram character from the first search phrase into a plurality of strokes;
mapping each stroke of the first plurality of strokes to a corresponding stroke identifier (id) according to the stroke mapping to create a first stroke id sequence comprising a plurality of stroke identifiers;
mapping each stroke of the plurality of strokes to a corresponding stroke identifier (id) to create a first stroke id sequence comprising a plurality of stroke identifiers;
modifying the first stroke id sequence to create a second stroke id sequence within a specified distance to the first stroke id sequence;
modifying the first stroke id sequence to create a second stroke id sequence within a specified edit distance to the first stroke id sequence;
determining that the second stroke id sequence is a valid stroke id sequence that maps to a second ideogram character of the ideogram language;
determining that the second stroke id sequence is a valid stroke id sequence that maps to a second ideogram character of the ideogram language;
and a fuzzy search engine executable to: generate a variation of the first phrase using the ideogram character variation;
based on the determination that the second stroke id sequence is the valid stroke id sequence, combining the second ideogram character with a second set of ideogram characters to create a second search phrase, the second set of ideogram characters comprising a subset of the first set of ideogram characters;
perform a search of the collection of character recognized documents using the variation of the first phrase to obtain a search result identifying documents from the collection of character recognized documents that are similar to the selected document;
performing a search of a set of character recognized documents using the first search phrase and the second search phrase to obtain a result identifying any documents in the set of character recognized documents that match the first search phrase and any documents in the set of character recognized documents that match the second search phrase;
and provide the search result to the search user interface.
and providing the result for display in the user interface.
17. A system comprising:
and a stroke mapping that maps strokes to stroke identifier (ids);
1. maintaining a set of valid stroke id sequences mapped to a set of valid ideogram characters of an ideogram language;
and a character analyzer is executable to: generate a second set of ideogram characters, wherein generating the second set of ideogram characters comprises: for each of the set of extracted ideogram characters;
receiving a first search phrase for searching a set of character recognized documents, the first search phrase submitted by a user via a user interface and comprising a first set of ideogram characters;
partitioning the extracted ideogram character into a respective plurality of strokes;
partitioning a first ideogram character from the first search phrase into a plurality of strokes;
mapping each stroke of the respective plurality of strokes to a corresponding stroke identifier (id) according to the stroke mapping to create a respective stroke id sequence comprising a plurality of stroke identifiers;
mapping each stroke of the plurality of strokes to a corresponding stroke identifier (id) to create a first stroke id sequence comprising a plurality of stroke identifiers;
modifying the respective stroke id sequence to create a variation of the respective stroke id sequence within a specified distance to the respective stroke id sequence;
modifying the first stroke id sequence to create a second stroke id sequence within a specified edit distance to the first stroke id sequence;
determining that the variation of the respective stroke id sequence is a valid stroke id sequence that maps to a corresponding ideogram character of the ideogram language;
determining that the second stroke id sequence is a valid stroke id sequence that maps to a second ideogram character of the ideogram language;
and adding the variation of the respective stroke id sequence to the second set of ideogram characters; combine the set of extracted ideogram characters and the second set of ideogram characters to generate a plurality of phrases;
based on the determination that the second stroke id sequence is the valid stroke id sequence, combining the second ideogram character with a second set of ideogram characters to create a second search phrase, the second set of ideogram characters comprising a subset of the first set of ideogram characters;
purge incorrect phrases from the plurality of phrases to generate a set of candidate phrases;
11. purging any incorrect phrases from the set of phrases to determine a remaining set of phrases;
select a phrase from the set of candidate phrases;
11. selecting a candidate phrase from the remaining set of phrases;
and add the selected phrase as character recognized content to a character recognized document corresponding to the selected document image.
11. storing the first document as a character recognized document in the set of character recognized documents, wherein storing the first document as the character recognized document comprises storing the candidate phrase as a recognized phrase for the first document.
Each patent claim in the above chart contains all the limitations recited in the corresponding claim of the instant application. In other words, each patent claim is either 1) narrower than or 2) substantially equivalent to the corresponding claim of the instant application. It would have been obvious to a person of ordinary skill in the data processing art before the effective filing date of the claimed invention to omit elements when the remaining elements perform as before. A person of ordinary skill could have arrived at the present claims by omitting the details of the patent claims. See In re Karlson, 136 USPQ 184 (CCPA 1963) (“Omission of element and its function in combination is obvious expedient if remaining elements perform same functions as before.”).
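By way of illustration only, and not as a characterization of the record, the stroke-id sequence technique recited in the charted claims (mapping strokes to stroke ids, modifying the sequence within a specified edit distance, and checking that the modified sequence is valid) may be sketched as follows. The stroke names, stroke mapping, example characters, and function names are all hypothetical.

```python
# Hypothetical stroke mapping: stroke name -> stroke identifier (id).
STROKE_MAP = {"horizontal": 1, "vertical": 2, "left-falling": 3, "right-falling": 4}

# Hypothetical set of valid stroke id sequences mapped to ideogram characters.
VALID_SEQUENCES = {
    (1, 2): "十",     # horizontal, vertical
    (1, 2, 1): "土",  # horizontal, vertical, horizontal
    (3, 4): "人",     # left-falling, right-falling
}

def map_strokes(strokes):
    """Map each stroke to its stroke id to create a stroke id sequence."""
    return tuple(STROKE_MAP[s] for s in strokes)

def edit_distance(a, b):
    """Levenshtein edit distance between two stroke id sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]

def valid_variations(seq, max_distance):
    """Return ideogram characters whose valid stroke id sequences lie within
    the specified edit distance of seq (excluding seq itself)."""
    return [char for cand, char in VALID_SEQUENCES.items()
            if cand != seq and edit_distance(seq, cand) <= max_distance]
```

Under these hypothetical mappings, `valid_variations(map_strokes(["horizontal", "vertical"]), 1)` yields the single variation character 土, since its sequence (1, 2, 1) is one edit away from (1, 2) while (3, 4) is two edits away.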
Regarding claim 1, ‘US12153624 B2‘ discloses the features of claim 1 of the instant application as shown above.
However, ‘US12153624 B2‘ does not recite “a character analyzer executable to identify an ideogram character variation that is a variation of the first ideogram character, wherein identifying the ideogram character variation comprises:”
However, JANG discloses:
a character analyzer executable to identify an ideogram character variation that is a variation of the first ideogram character, wherein identifying the ideogram character variation comprises: (JANG, page 4- In addition, for each Chinese character, in dictionary, has at least one dictionary entry. In dictionary entry, the stroke code sequence of the stroke when storing the ISN of corresponding Chinese character and writing this Chinese character. In order to adapt to the writing style of different user, for some Chinese character a plurality of stroke code sequences can be set, therefore, a plurality of dictionary entries are set. Also can comprise other guide in the dictionary entry, dictionary can adopt multiple mode to realize (for example, database), and all these are not construed as limiting the invention.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of ‘US12153624 B2‘ with the teaching of JANG to improve the recognition of character input by generating candidate characters based on stroke analysis.
Regarding claim 9, ‘US12153624 B2‘ discloses the features of claim 9 of the instant application as shown above.
However, ‘US12153624 B2‘ does not recite “a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a search user interface executable to: receive a document identifier for a selected document;”
However, Henry discloses:
a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a search user interface executable to: receive a document identifier for a selected document; (Henry, Fig. 47 item 4710, “Receive Document Identifier”; [0158] At input information block 110, a user may select the starting documents to be analyzed. In an example, the user may input a patent application and drawings; [0159] For inputs that are in graphical format, such as a TIFF file or PDF file that does not contain metadata, the text and symbol information are converted first using optical character recognition (OCR) and then metadata is captured; [0175] when the user requests analysis of a published application or patent. In such cases, server processor 210 may receive an identifier, such as a patent number or published application number.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of ‘US12153624 B2‘ with the teaching of Henry to allow other systems and methods to identify which document the embodiment was produced from (Henry, [0517]), and also to know what document the image is provided from (Henry, [0534]).
However, ‘US12153624 B2‘ in view of Henry does not clearly disclose: a character analyzer executable to identify an ideogram character variation that is a variation of the first ideogram character, wherein identifying the ideogram character variation comprises:
However, JANG discloses:
a character analyzer executable to identify an ideogram character variation that is a variation of the first ideogram character, wherein identifying the ideogram character variation comprises: (JANG, page 4- In addition, for each Chinese character, in dictionary, has at least one dictionary entry. In dictionary entry, the stroke code sequence of the stroke when storing the ISN of corresponding Chinese character and writing this Chinese character. In order to adapt to the writing style of different user, for some Chinese character a plurality of stroke code sequences can be set, therefore, a plurality of dictionary entries are set. Also can comprise other guide in the dictionary entry, dictionary can adopt multiple mode to realize (for example, database), and all these are not construed as limiting the invention.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of ‘US12153624 B2‘ in view of Henry with the teaching of JANG to improve the recognition of character input by generating candidate characters based on stroke analysis.
Regarding claim 17, ‘US12153624 B2‘ discloses the features of claim 17 of the instant application as shown above.
However, ‘US12153624 B2‘ does not recite “a collection of document images; a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a content extraction user interface executable to: receive a selection of a document image from the collection of document images from a user; an optical character recognition engine executable to: perform optical character recognition on the selected document image to generate a set of extracted ideogram characters according to an ideogram language;”
However, Ide discloses:
a collection of document images; (Ide, page 10, 7th paragraph- The present invention can be similarly applied to text data obtained by character recognition of some document. Alternatively, text data that has been character-recognized in advance; page 5, 6th-8th paragraphs, e.g. 6th paragraph, The storage device 15 includes an image database (image DB) 21…7th paragraph, The image database 21 stores various image data. The search target database 22 stores data used as a search target. In the present embodiment, text data obtained as a result of reading a newspaper page by the scanner unit 14 and character recognition of an article on the page is stored in the search target database 22 as a search target. …8th paragraph, Further, the image related to the article is captured by the scanner unit 14 and stored in the image database 21. At that time, as shown in FIG. 2, the search target database 22 stores link information indicating the storage destination of the image data together with the text data that has been character-recognized.)
an optical character recognition engine executable to: perform optical character recognition on the selected document image to generate a set of extracted ideogram characters according to an ideogram language; (Ide, page 10, 7th paragraph- The present invention can be similarly applied to text data obtained by character recognition of some document. Alternatively, text data that has been character-recognized in advance; page 5, 6th-8th paragraphs, e.g. 6th paragraph, The storage device 15 includes an image database (image DB) 21…7th paragraph, The image database 21 stores various image data. The search target database 22 stores data used as a search target. In the present embodiment, text data obtained as a result of reading a newspaper page by the scanner unit 14 and character recognition of an article on the page is stored in the search target database 22 as a search target. …8th paragraph, Further, the image related to the article is captured by the scanner unit 14 and stored in the image database 21. At that time, as shown in FIG. 2, the search target database 22 stores link information indicating the storage destination of the image data together with the text data that has been character-recognized.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of ‘US12153624 B2‘ with the teaching of Ide to obtain a desired search result with less search omission even when an erroneously recognized character is included in the text data (Ide, abstract).
However, ‘US12153624 B2‘ in view of Ide does not clearly disclose: a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a content extraction user interface executable to: receive a selection of a document image from the collection of document images from a user;
However, Lehoux discloses:
a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a content extraction user interface executable to: receive a selection of a document image from the collection of document images from a user; (Lehoux, [0029]-[0030], this pointer movement (dragging the mouse) selects the data 112 from the input document 110 that is to be processed through optical character recognition processes)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of ‘US12153624 B2‘ in view of Ide with the teaching of Lehoux to recognize characters in the image, and aggregate the recognition results from the different optical character recognition processes to produce a final character recognition result that includes relatively higher confidence recognized characters and relatively lower confidence recognized characters (Lehoux, abstract).
Claims 1, 9, and 17 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 9, and 16 of U.S. Patent No. US11321384B2. Although the claims at issue are not identical, they are not patentably distinct from each other because they are obvious variants of each other.
The chart below shows the correspondence between the claims in the current application and the claims in the patent.
Instant Application 18/929,730
U.S. Patent No. US11321384B2
1. A system comprising: a data repository storing: a collection of character recognized documents
1. a set of character recognized documents in a repository,
and a stroke mapping that maps strokes to stroke identifier (ids);
1. mapping each stroke of the plurality of strokes to a corresponding stroke identifier (id) to create an original stroke id sequence comprising a plurality of stroke identifiers;
a memory storing instructions executable by the processor, the instructions comprising: a search user interface executable to receive a first search query that includes a first ideogram character from an ideogram language;
1. receiving a search phrase for searching a set of character recognized documents in a repository, the search phrase including an original ideogram character;
partitioning the first ideogram character into a first plurality of strokes;
1. partitioning the original ideogram character from the search phrase into a plurality of strokes;
mapping each stroke of the first plurality of strokes to a corresponding stroke identifier (id) according to the stroke mapping to create a first stroke id sequence comprising a plurality of stroke identifiers;
1. mapping each stroke of the plurality of strokes to a corresponding stroke identifier (id) to create an original stroke id sequence comprising a plurality of stroke identifiers;
modifying the first stroke id sequence to create a second stroke id sequence within a specified distance to the first stroke id sequence;
creating a set of candidate stroke id sequences by modifying the original stroke id sequence using a number of edits based on the threshold distance;
determining that the second stroke id sequence is a valid stroke id sequence that maps to a second ideogram character of the ideogram language;
discarding from the set of candidate stroke id sequences any candidate stroke id sequence that does not map to a corresponding ideogram character of an ideogram language to determine a set of remaining candidate stroke id sequences;
and a fuzzy search engine executable to: generate a variation of the first search query using the ideogram character variation;
identifying a set of replacement ideogram characters based on corresponding edit distances of the set of remaining candidate stroke id sequences to the original stroke id sequence;
creating new phrases, creating the new phrases comprising replacing the original ideogram character with each replacement ideogram character from the set of replacement ideogram characters in the search phrase
perform a search of the collection of character recognized documents using the first search query and the variation of the first search query to obtain a search result identifying relevant documents from the collection of character recognized documents;
1. performing a search of the set of character recognized documents using the search phrase and the new phrases to obtain a result identifying any documents in the set of character recognized documents that match the search phrase and any documents in the set of character recognized documents that match the new phrases;
and provide the search result to the search user interface.
1. and presenting the result.
9. A system comprising: a data repository storing: a collection of character recognized documents;
16. a set of character recognized documents in a repository,
and a stroke mapping that maps strokes to stroke identifier (ids);
mapping each stroke of the plurality of strokes to a corresponding stroke identifier (id) to create an original stroke id sequence comprising a plurality of stroke identifiers;
receive a first phrase extracted from the selected document, the first phrase from the selected document comprising first ideogram character from an ideogram language;
receiving a search phrase for searching a set of character recognized documents in a repository, the search phrase including an original ideogram character,
partitioning a first ideogram character into a first plurality of strokes;
partitioning the original ideogram character from the search phrase into a plurality of strokes
mapping each stroke of the first plurality of strokes to a corresponding stroke identifier (id) according to the stroke mapping to create a first stroke id sequence comprising a plurality of stroke identifiers;
mapping each stroke of the plurality of strokes to a corresponding stroke identifier (id) to create an original stroke id sequence comprising a plurality of stroke identifiers;
modifying the first stroke id sequence to create a second stroke id sequence within a specified distance to the first stroke id sequence;
creating a set of candidate stroke id sequences by modifying the original stroke id sequence using a number of edits based on the threshold distance;
determining that the second stroke id sequence is a valid stroke id sequence that maps to a second ideogram character of the ideogram language;
discarding from the set of candidate stroke id sequences any candidate stroke id sequence that does not map to a corresponding ideogram character of an ideogram language to determine a set of remaining candidate stroke id sequences; and
and a fuzzy search engine executable to: generate a variation of the first phrase using the ideogram character variation;
creating new phrases, creating the new phrases comprising replacing the original ideogram character with each replacement ideogram character from the set of replacement ideogram characters in the search phrase;
perform a search of the collection of character recognized documents using the variation of the first phrase to obtain a search result identifying documents from the collection of character recognized documents that are similar to the selected document;
performing a search of a set of character recognized documents using the search phrase and the new phrases to obtain a result identifying any documents in the set of character recognized documents that match the search phrase and any documents in the set of character recognized documents that match the new phrase;
and provide the search result to the search user interface.
and presenting the result.
17. A system comprising: a data repository storing: a collection of document images;
and a stroke mapping that maps strokes to stroke identifier (ids);
9. accessing a result of an optical character recognition (OCR) performed on a document image, the result including an original phrase that includes an original ideogram character recognized from the document image;
16. a set of character recognized documents in a repository,
16. mapping each stroke of the plurality of strokes to a corresponding stroke identifier (id) to create an original stroke id sequence comprising a plurality of stroke identifiers;
an optical character recognition engine executable to: perform optical character recognition on the selected document image to generate a set of extracted ideogram characters according to an ideogram language;
9. accessing a result of an optical character recognition (OCR) performed on a document image, the result including an original phrase that includes an original ideogram character recognized from the document image;
and a character analyzer is executable to: generate a second set of ideogram characters, wherein generating the second set of ideogram characters comprises: for each of the set of extracted ideogram characters;
9. identifying a set of replacement ideogram character
partitioning the extracted ideogram character into a respective plurality of strokes;
9. partitioning the original ideogram character into a plurality of strokes;
mapping each stroke of the respective plurality of strokes to a corresponding stroke identifier (id) according to the stroke mapping to create a respective stroke id sequence comprising a plurality of stroke identifiers;
9. mapping each stroke of the plurality of strokes to a corresponding stroke identifier (id) to create an original stroke id sequence comprising a plurality of stroke identifiers;
modifying the respective stroke id sequence to create a variation of the respective stroke id sequence within a specified distance to the respective stroke id sequence;
9. creating a set of candidate stroke id sequences by modifying the original stroke id sequence using a number of edits based on the threshold distance;
determining that the variation of the respective stroke id sequence is a valid stroke id sequence that maps to a corresponding ideogram character of the ideogram language;
9. discarding from the set of candidate stroke id sequences any candidate stroke id sequence that does not map to a corresponding ideogram character of an ideogram language to determine a set of remaining candidate stroke id sequences;
and adding the variation of the respective stroke id sequence to the second set of ideogram characters; combine the set of extracted ideogram characters and the second set of ideogram characters to generate a plurality of phrases;
9. identifying a set of replacement ideogram characters based on corresponding edit distances of the set of remaining candidate stroke id sequences to the original stroke id sequence, the set of replacement ideogram characters including the candidate ideogram character;
generate a set of new phrases, generating the set of new phrases comprising replacing the original ideogram character with each replacement ideogram character from the set of replacement ideogram characters;
purge incorrect phrases from the plurality of phrases to generate a set of candidate phrases;
9. purge any incorrect phrases from the set of new phrases to create a set of remaining phrases;
select a phrase from the set of candidate phrases;
9. select a replacement phrase from the set of remaining phrases and replace the original phrase with the replacement phrase in a character recognized document;
and add the selected phrase as character recognized content to a character recognized document corresponding to the selected document image.
9. storing the character recognized document including the replacement phrase with the candidate ideogram character in a searchable data repository in association with a document identifier usable to retrieve the document image.
Each patent claim in the above chart contains all the limitations recited in the corresponding claim of the instant application. In other words, each patent claim is either 1) narrower than or 2) substantially equivalent to the corresponding claim of the instant application. It would have been obvious to a person of ordinary skill in the data processing art before the effective filing date of the claimed invention to omit elements when the remaining elements perform as before. A person of ordinary skill could have arrived at the present claims by omitting the details of the patent claims. See In re Karlson, 136 USPQ 184 (CCPA 1963) (“Omission of element and its function in combination is obvious expedient if remaining elements perform same functions as before.”).
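The stroke-id replacement procedure recited in the chart can be illustrated with a minimal sketch. The character names, stroke id sequences, dictionary, and threshold below are invented for illustration; for brevity, the sketch filters a dictionary by edit distance rather than enumerating candidate stroke id sequences and discarding invalid ones, which yields the same set of replacement characters.

```python
# Hypothetical dictionary: ideogram character -> stroke id sequence.
# "A".."D" stand in for ideogram characters; the stroke ids are invented.
STROKE_DICT = {
    "A": (1, 2, 3),
    "B": (1, 2, 4),
    "C": (1, 5, 3),
    "D": (1, 2, 3, 6),
}

def edit_distance(a, b):
    """Levenshtein distance between two stroke id sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def replacement_characters(original_char, threshold=1):
    """Return characters whose stroke id sequences lie within the
    threshold edit distance of the original character's sequence."""
    original_seq = STROKE_DICT[original_char]
    return sorted(
        ch for ch, seq in STROKE_DICT.items()
        if ch != original_char
        and edit_distance(original_seq, seq) <= threshold
    )
```

With this toy dictionary, each of "B", "C", and "D" is one edit (a substitution, a substitution, an insertion) away from "A", so all three are returned as replacements for "A".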
Regarding claim 1, ‘US11321384B2’ discloses the features of claim 1 of the instant application, as shown above.
However, ‘US11321384B2’ does not recite “a character analyzer executable to identify an ideogram character variation that is a variation of the first ideogram character, wherein identifying the ideogram character variation comprises:”
However, JANG discloses:
a character analyzer executable to identify an ideogram character variation that is a variation of the first ideogram character, wherein identifying the ideogram character variation comprises: (JANG, page 4- In addition, for each Chinese character, in dictionary, has at least one dictionary entry. In dictionary entry, the stroke code sequence of the stroke when storing the ISN of corresponding Chinese character and writing this Chinese character .In order to adapt to the writing style of different user, for some Chinese character a plurality of stroke code sequences can be set, therefore, a plurality of dictionary entries are set. Also can comprise other guide in the dictionary entry, dictionary can adopt multiple mode to realize (for example, database), and all these are not construed as limiting the invention.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of ‘US11321384B2’ with the teaching of JANG to improve the recognition of character input by generating candidate characters based on stroke analysis.
Regarding claim 9, ‘US11321384B2’ discloses the features of claim 9 of the instant application, as shown above.
However, ‘US11321384B2’ does not recite “a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a search user interface executable to: receive a document identifier for a selected document; a character analyzer executable to identify an ideogram character variation that is a variation of the first ideogram character, wherein identifying the ideogram character variation comprises:”
However, Henry discloses:
a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a search user interface executable to: receive a document identifier for a selected document; (Henry, Fig. 47 item 4710, “Receive Document Identifier”; [0158] At input information block 110, a user may select the starting documents to be analyzed. In an example, the user may input a patent application and drawings; [0159] For inputs that are in graphical format, such as a TIFF file or PDF file that does not contain metadata, the text and symbol information are converted first using optical character recognition (OCR) and then metadata is captured; [0175] when the user requests analysis of a published application or patent. In such cases, server processor 210 may receive an identifier, such as a patent number or published application number.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of ‘US11321384B2’ with the teaching of Henry to allow other systems and methods to identify which document the embodiment was produced from (Henry, [0517]) and also to know what document the image is provided from (Henry, [0534]).
However, ‘US11321384B2’ in view of Henry does not clearly disclose:
a character analyzer executable to identify an ideogram character variation that is a variation of the first ideogram character, wherein identifying the ideogram character variation comprises:
However, JANG discloses:
a character analyzer executable to identify an ideogram character variation that is a variation of the first ideogram character, wherein identifying the ideogram character variation comprises: (JANG, page 4- In addition, for each Chinese character, in dictionary, has at least one dictionary entry. In dictionary entry, the stroke code sequence of the stroke when storing the ISN of corresponding Chinese character and writing this Chinese character .In order to adapt to the writing style of different user, for some Chinese character a plurality of stroke code sequences can be set, therefore, a plurality of dictionary entries are set. Also can comprise other guide in the dictionary entry, dictionary can adopt multiple mode to realize (for example, database), and all these are not construed as limiting the invention.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of ‘US11321384B2’ in view of Henry with the teaching of JANG to improve the recognition of character input by generating candidate characters based on stroke analysis.
Regarding claim 17, ‘US11321384B2’ discloses the features of claim 17 of the instant application, as shown above.
However, ‘US11321384B2’ does not clearly disclose: “a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a content extraction user interface executable to: receive a selection of a document image from the collection of document images from a user;”
However, Lehoux discloses:
a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a content extraction user interface executable to: receive a selection of a document image from the collection of document images from a user; (Lehoux, [0029]-[0030] , this pointer movement (dragging the mouse) selects the data 112 from the input document 110 that is to be processed through optical character recognition processes)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of ‘US11321384B2’ with the teaching of Lehoux to recognize characters in the image, and to aggregate the recognition results from the different optical character recognition processes to produce a final character recognition result that includes relatively higher confidence recognized characters and relatively lower confidence recognized characters (Lehoux, abstract).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-3 and 5-7 are rejected under 35 U.S.C. 103 as being unpatentable over JANG (CN1156741C) in view of Ide (JP2011065597).
Regarding claim 1, JANG discloses: A system comprising: a data repository storing: and a stroke mapping that maps strokes to stroke identifier (ids); (JANG page 4-The stroke code sequence is that the stroke code that utilizes current stroke and current stroke hand-written stroke before to identify constitutes. Stroke code (corresponding to “stroke identifier (ids)”) sequence …in being stored in the dictionary of computer memory … In addition, for each Chinese character, in dictionary, has at least one dictionary entry. In dictionary entry, the stroke code sequence of the stroke when storing the ISN of corresponding Chinese character and writing this Chinese character .In order to adapt to the writing style of different user, for some Chinese character a plurality of stroke code sequences can be set, therefore, a plurality of dictionary entries are set. Also can comprise other guide in the dictionary entry, dictionary can adopt multiple mode to realize (for example, database), and all these are not construed as limiting the invention.
a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a search user interface executable to receive a first search query that includes a first ideogram character from an ideogram language; (JANG, page 2- Chinese handwriting identifying method, it is characterized in that may further comprise the steps: the stroke identification step, the motion of starting to write at every turn and lifting pen between the pen when detecting the user by stroke handwritten Chinese character/phrase, with a Motion Recognition of pen is at least one stroke code, utilize each in this at least one stroke code, the stroke code sequence with previous formation constitutes at least one new stroke code sequence respectively; Searching step, in being stored in the dictionary/dictionary of computer memory, corresponding at least one the dictionary/dictionary entry of at least one the new stroke code sequence that is constituted in retrieval and the stroke identification step, thus obtain at least one candidate Chinese character/phrase; (Note: Examiner interprets that Chinese character corresponds to “ideogram character from an ideogram language”))
a character analyzer executable to identify an ideogram character variation that is a variation of the first ideogram character, wherein identifying the ideogram character variation comprises: (JANG, page 4- In addition, for each Chinese character, in dictionary, has at least one dictionary entry. In dictionary entry, the stroke code sequence of the stroke when storing the ISN of corresponding Chinese character and writing this Chinese character .In order to adapt to the writing style of different user, for some Chinese character a plurality of stroke code sequences can be set, therefore, a plurality of dictionary entries are set. Also can comprise other guide in the dictionary entry, dictionary can adopt multiple mode to realize (for example, database), and all these are not construed as limiting the invention.)
partitioning the first ideogram character into a first plurality of strokes; (JANG, page 7, utilizes the recognition system of scribbling of PalmPilot computing machine hand-written stroke is identified as scribbles letter or special symbol. Step 52 will identify scribbles letter and special symbol is converted at least one stroke code; page 2- Chinese handwriting identifying method, it is characterized in that may further comprise the steps: the stroke identification step, the motion of starting to write at every turn and lifting pen between the pen when detecting the user by stroke handwritten Chinese character/phrase, with a Motion Recognition of pen is at least one stroke code, utilize each in this at least one stroke code, the stroke code sequence with previous formation constitutes at least one new stroke code sequence respectively;)
mapping each stroke of the first plurality of strokes to a corresponding stroke identifier (id) according to the stroke mapping to create a first stroke id sequence comprising a plurality of stroke identifiers; (JANG, page 2- Chinese handwriting identifying method, it is characterized in that may further comprise the steps: the stroke identification step, the motion of starting to write at every turn and lifting pen between the pen when detecting the user by stroke handwritten Chinese character/phrase, with a Motion Recognition of pen is at least one stroke code, utilize each in this at least one stroke code, the stroke code sequence with previous formation constitutes at least one new stroke code sequence respectively; Searching step, in being stored in the dictionary/dictionary of computer memory, corresponding at least one the dictionary/dictionary entry of at least one the new stroke code sequence that is constituted in retrieval and the stroke identification step, thus obtain at least one candidate Chinese character/phrase;)
modifying the first stroke id sequence to create a second stroke id sequence within a specified distance to the first stroke id sequence; (JANG, page 5- the difference of the number of times that the contained various stroke codes of the stroke code sequence in the dictionary entry and the stroke code sequence of being discerned occur in two stroke code sequences respectively and less than predetermined threshold (corresponding to “within a specified distance”) . This is to occur indivedual mistakes when allowing the user writing Chinese character. This search condition can be expressed as: i=1I(xi-sji),θ is a threshold value …At this moment, if threshold value θ=1, though then the user has wrongly write a stroke, step 13 also can retrieve and " hand " corresponding dictionary entry, with " hand " as candidate Chinese character. Adopt above-mentioned three kinds of search methods, can dynamically revise (corresponding to “modifying”) stroke code sequence in the corresponding dictionary entry according to user's writing style. For example in last example, though " 1,1; 1,15 " is wrong stroke code sequence, this is user's a writing style, therefore, can in dictionary, increase by one and contain the dictionary entry of stroke code sequence " 1,1; 1; 15 " accordingly with " hand ", perhaps will be original with " hand " corresponding dictionary entry in the stroke code sequence change " 1,1; 1,15 " into.)
determining that the second stroke id sequence is a valid stroke id sequence that maps to a second ideogram character of the ideogram language; (JANG, Step 14 utilizes the Hanzi internal code that obtains to show corresponding Chinese character. Step 15 judges whether the user continues to write stroke, i.e. the next stroke of institute's writing of Chinese characters. With this step be accordingly, the user checks the candidate Chinese character that demonstrates, to find whether to have shown the Chinese character of writing. If the user finds not show the Chinese character of being write, promptly continue to write next stroke. If the judged result of step 15 is a "Yes", then forward step 17 to. Step 17 is similar with step 11, and difference only is to detect the next stroke that the user writes. If the judged result of step 15 is a "No", then step 16 is selected the user from shown Chinese character a Chinese character as a result of be input in the computing machine or other devices in.)
and provide the search result to the search user interface. (JANG, page 2, Searching step, in being stored in the dictionary/dictionary of computer memory, corresponding at least one the dictionary/dictionary entry of at least one the new stroke code sequence that is constituted in retrieval and the stroke identification step, thus obtain at least one candidate Chinese character/phrase; Step display in the user writing stroke, dynamically shows described at least one candidate Chinese character/phrase; Transfer step continues to write next stroke if judge the user, then transfers to above-mentioned stroke identification step; Recognition result generates step, has selected at least one shown candidate Chinese character/phrase one if judge the user, then with a selected Chinese character/phrase as handwritten Chinese character/phrase recognition result.)
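JANG's quoted search condition compares, stroke code by stroke code, the occurrence counts in the recognized sequence against a dictionary entry, accepting the entry when the summed difference falls below a threshold θ. The sketch below shows one plausible reading of that condition (the machine translation is ambiguous, and the stroke code sequences used in the test values are invented):

```python
# One plausible reading of JANG's tolerance test: sum, over all stroke
# codes, the absolute difference in occurrence counts between the
# recognized stroke code sequence and a dictionary entry, and accept the
# entry when that sum is below the threshold theta.
from collections import Counter

def within_threshold(recognized, entry, theta):
    """True if the count-difference sum is strictly below theta."""
    rc, ec = Counter(recognized), Counter(entry)
    diff = sum(abs(rc[c] - ec[c]) for c in set(rc) | set(ec))
    return diff < theta
```

Under this reading, one wrongly written stroke changes two counts (one code gains an occurrence, another loses one), so a tolerance of a single wrong stroke corresponds to a summed difference of 2.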
However, JANG does not clearly disclose:
a collection of character recognized documents; and a fuzzy search engine executable to: generate a variation of the first search query using the ideogram character variation; perform a search of the collection of character recognized documents using the first search query and the variation of the first search query to obtain a search result identifying relevant documents from the collection of character recognized documents;
However, Ide discloses:
a collection of character recognized documents; (Ide, page 10, 7th paragraph- The present invention can be similarly applied to text data obtained by character recognition of some document. Alternatively, text data that has been character-recognized in advance)
and a fuzzy search engine executable to: generate a variation of the first search query using the ideogram character variation; (Ide, page 7, 4th paragraph- 6th paragraph, e.g. see page 7, 5th paragraph- the CPU searches the search target database 22 based on the search word stored in the input buffer 25, and searches for article data that matches the search word from the text data stored in the search target database 22… In this case, all relevant article data is extracted from the text data; See also e.g. page 7, 5th paragraph, line 2- the CPU 11 determines each character constituting the input search word. Is replaced with the corresponding misrecognized character in the misrecognition probability database 23 to generate a search word with a high probability of misrecognition , and text data is generated using the generated new search word (corresponding to “a variation of the first search query”). Search again.)
perform a search of the collection of character recognized documents using the first search query and the variation of the first search query to obtain a search result identifying relevant documents from the collection of character recognized documents; (See Ide, page 7, 4th paragraph- 6th paragraph, e.g. see page 7, 5th paragraph- the CPU searches the search target database 22 based on the search word stored in the input buffer 25, and searches for article data that matches the search word from the text data stored in the search target database 22… In this case, all relevant article data is extracted from the text data; See also e.g. page 7, 5th paragraph, line 2- the CPU 11 determines each character constituting the input search word. Is replaced with the corresponding misrecognized character in the misrecognition probability database 23 to generate a search word with a high probability of misrecognition, and text data is generated using the generated new search word (corresponding to “the variation of the first search query"). Search again (step S13); see also page 8, 4th paragraph, line 6- and displays them in a predetermined format on the display unit 12 so that the user can confirm them. Output (step S16).)
and provide the search result to the search user interface.
(See Ide, page 8, 4th paragraph- CPU 11 reads these search results from the work area of the memory 16, develops them in the output buffer 26, and displays them in a predetermined format on the display unit 12 so that the user can confirm them. Output (step S16).)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG with the teaching of Ide to obtain a desired search result with less search omission even when an erroneously recognized character is included in the text data (Ide, abstract).
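Ide's query-expansion strategy, as cited above (substituting characters of the search word with their likely misrecognitions and searching with the original query plus every variant), can be sketched as follows. The confusion table and document texts are invented examples, not data from Ide:

```python
# Sketch of Ide-style fuzzy search: expand the query into single-character
# substitution variants using an OCR confusion table, then match documents
# against the original query and every variant.
CONFUSIONS = {"己": ["已", "巳"]}  # hypothetical misrecognition table

def query_variants(query):
    """All single-substitution variants of the query."""
    variants = set()
    for i, ch in enumerate(query):
        for sub in CONFUSIONS.get(ch, []):
            variants.add(query[:i] + sub + query[i + 1:])
    return variants

def fuzzy_search(query, documents):
    """Return documents containing the query or any of its variants."""
    terms = {query} | query_variants(query)
    return [doc for doc in documents if any(t in doc for t in terms)]
```

A query containing 己 thus also retrieves documents where OCR produced the visually similar 已 or 巳, which is the "less search omission" effect Ide's abstract describes.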
Regarding claim 2, JANG in view of Ide discloses all of the features with respect to claim 1 as outlined above. Claim 2 further recites: wherein the strokes in the stroke mapping are ordered, wherein the ordering of the strokes in the stroke mapping creates a determinable partitioning of ideogram characters into the strokes of the stroke mapping. (JANG, page 8- in order to adapt to the writing style of different user, also can define the another kind of order of strokes of " state", in dictionary, comprise another dictionary entry: Stroke code sequence: 3,8,1,3,1,1,6,1…As shown in Figure 9, it is perpendicular that the user at first writes the first stroke, and then step 51 (first step of step 17) is identified as this stroke and scribbles letter " i "; Step 52 (second step of step 17) will be scribbled letter " i " and will be converted to stroke code 3;)
Regarding claim 3, JANG in view of Ide discloses all of the features with respect to claim 2 as outlined above. Claim 3 further recites: wherein partitioning the first ideogram character into a plurality of strokes comprises determining whether each stroke from the stroke mapping is found in the first ideogram character until either the first ideogram character is covered by strokes from the stroke mapping or all the strokes in the stroke mapping have been processed. (JANG, page 8- in order to adapt to the writing style of different user, also can define the another kind of order of strokes of " state", in dictionary, comprise another dictionary entry: Stroke code sequence: 3,8,1,3,1,1,6,1…As shown in Figure 9, it is perpendicular that the user at first writes the first stroke, and then step 51 (first step of step 17) is identified as this stroke and scribbles letter " i "; Step 52 (second step of step 17) will be scribbled letter " i " and will be converted to stroke code 3; page 8, that the user at first writes the first stroke, and then step 51 (first step of step 17) is identified as this stroke and scribbles letter " i "; Step 52 (second step of step 17) will be scribbled letter " i " and will be converted to stroke code 3; Step 12 forms stroke code sequence " 3 "; Step 13 retrieve some candidate Chinese characters " Bu Shen mouth ... " Step 14 shows these candidate Chinese characters. Because the Chinese character " state " that does not have the user writing in shown Chinese character, so the judged result of step 15 is a "No", process proceeds to step 17. 
Then, the user writes second cross break, and then step 51 (first step of step 17) is identified as this stroke and scribbles letter " t "; Step 52 (second step of step 17) will be scribbled letter " t " and will be converted to three stroke codes 8,9,10; So step 12 forms three stroke code sequences " 3,8 ", " 3,9 ", " 3,10 "; Step 13 corresponding to three stroke code sequences retrieve three groups of candidate Chinese characters " mouthful seeing ... ", " superfluous writing ... ", " Shen Gang ... " Step 14 shows some Chinese character in above-mentioned three groups of candidate Chinese characters. Because the Chinese character " state " that does not have the user writing in shown Chinese character, so the judged result of step 15 is a "No", process proceeds to step 17. Similarly, the user continues to write the 3rd, the 4th ...After having write the 5th, have in the shown candidate Chinese character " state ". Therefore, the judged result in the step 15 is a "Yes", and process proceeds to step 16; The user can be input to " state " in the computing machine by clicking shown " state " word on the screen.)
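The incremental retrieval JANG describes above (after each stroke, display every character whose stored stroke code sequence begins with the codes written so far) can be sketched as follows; the characters' stroke id sequences here are invented for illustration:

```python
# Sketch of JANG-style incremental candidate retrieval: list every
# dictionary character whose stroke id sequence starts with the stroke
# codes entered so far. The stroke ids below are invented.
STROKES = {
    "口": (2, 5, 1),
    "中": (2, 5, 1, 3),
    "日": (2, 5, 1, 1),
}

def candidates_for_prefix(prefix, dictionary):
    """Return characters whose stroke id sequence begins with `prefix`."""
    p = tuple(prefix)
    return sorted(ch for ch, seq in dictionary.items() if seq[:len(p)] == p)
```

As more strokes are written the prefix lengthens and the candidate set narrows, matching the claim 3 behavior of processing strokes until the character is covered or the inventory is exhausted.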
Regarding claim 5, JANG in view of Ide discloses all of the features with respect to claim 1 as outlined above. JANG does not clearly disclose:
wherein the search result references the relevant documents and wherein the search user interface is executable to store the search result.
However, Ide discloses:
wherein the search result references the relevant documents and wherein the search user interface is executable to store the search result. (See Ide, page 10, 3rd paragraph- Data that matches either the search word or each search candidate is extracted from the text data stored in 22 as a search result, and stored in a work area (not shown) of the memory 16; page 7, 4th paragraph- 6th paragraph, e.g. see page 7, 5th paragraph- the CPU searches the search target database 22 based on the search word stored in the input buffer 25, and searches for article data that matches the search word from the text data stored in the search target database 22… In this case, all relevant article data is extracted from the text data; page 8, If an image is included in the extracted article, the corresponding image data is read from the image database 21 based on the link information added to the text data of the article and displayed together with the text data. It will be.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG with the teaching of Ide to obtain a desired search result with less search omission even when an erroneously recognized character is included in the text data (Ide, abstract).
Regarding claim 6, JANG in view of Ide discloses all of the features with respect to claim 1 as outlined above. JANG does not clearly disclose:
wherein the search result references the relevant documents and wherein the search user interface is executable to provide the search result to a user.
However, Ide discloses:
wherein the search result references the relevant documents and wherein the search user interface is executable to provide the search result to a user. (Ide, page 7, 4th paragraph- 6th paragraph, e.g. see page 7, 5th paragraph- the CPU searches the search target database 22 based on the search word stored in the input buffer 25, and searches for article data that matches the search word from the text data stored in the search target database 22… In this case, all relevant article data is extracted from the text data; page 8, 4th paragraph, line 6- and displays them in a predetermined format on the display unit 12 so that the user can confirm them. Output (step S16); page 8, If an image is included in the extracted article, the corresponding image data is read from the image database 21 based on the link information added to the text data of the article and displayed together with the text data. It will be.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG with the teaching of Ide to obtain a desired search result with less search omission even when an erroneously recognized character is included in the text data (Ide, abstract).
Regarding claim 7, JANG in view of Ide discloses all of the features with respect to claim 6 as outlined above. JANG does not clearly disclose:
wherein the data repository further stores document images corresponding to the character recognized documents in the collection of character recognized documents, and wherein providing the search result to the user comprises presenting the document images corresponding to the relevant documents to the user.
However, Ide discloses:
wherein the data repository further stores document images corresponding to the character recognized documents in the collection of character recognized documents, (Ide, page 10- The present invention can be similarly applied to text data obtained by character recognition of some document. Alternatively, text data that has been character-recognized in advance may be acquired from the outside via a recording medium or a communication medium; page 8, Here, a predetermined number (100) of articles related to “election violation” is extracted from the text data of the search target database 22 and displayed on the screen of the display unit 12. If an image is included in the extracted article, the corresponding image data is read from the image database 21 based on the link information added to the text data of the article and displayed together with the text data; page 5- The image database 21 stores various image data. The search target database 22 stores data used as a search target.)
and wherein providing the search result to the user comprises presenting the document images corresponding to the relevant documents to the user. ( Ide, page 8, Here, a predetermined number (100) of articles related to “election violation” is extracted from the text data of the search target database 22 and displayed on the screen of the display unit 12. If an image is included in the extracted article, the corresponding image data is read from the image database 21 based on the link information added to the text data of the article and displayed together with the text data; page 7, 4th paragraph- 6th paragraph, e.g. see page 7, 5th paragraph- the CPU searches the search target database 22 based on the search word stored in the input buffer 25, and searches for article data that matches the search word from the text data stored in the search target database 22… In this case, all relevant article data is extracted from the text data; page 8, 4th paragraph, line 6- and displays them in a predetermined format on the display unit 12 so that the user can confirm them. Output (step S16).)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG with the teaching of Ide to obtain a desired search result with less search omission even when an erroneously recognized character is included in the text data (Ide, abstract).
Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over JANG (CN1156741C) in view of Ide (JP2011065597) in further view of Yang (“Comparison of shape-based and stroke-based methods for segmenting handwritten Chinese characters”, hereinafter Yang).
Regarding claim 4, JANG in view of Ide discloses all of the features with respect to claim 1 as outlined above. JANG in view of Ide does not clearly disclose: wherein the strokes in the stroke mapping are ordered based on size and encapsulation.
However, Yang discloses:
wherein the strokes in the stroke mapping are ordered based on size and encapsulation. (Yang, page 4, section 2.2.2 Merge Stroke-bounding Boxes Step1. With the information of the stroke types assigned (horizontal, vertical, up-left-slanting, up-right-slanting) and the stroke lengths (long stroke, short stroke), we can classify the strokes into eight types. According to the coordinates of their top-left points, all stroke-bounding boxes are sorted by the X-coordinates in non-decreasing order. If the X-coordinates become the same, then the sorting process changes to sort the Y coordinates, also in ascending order. Step2. Merge two boxes using relative positions of their up-left point and bottom-right point. First, only merge those boxes corresponding to short strokes. Second, merge those boxes corresponding to the slanting-strokes. Third, if two boxes are overlapped with each other more than half of the width of either box and the width of the merged box is less than K (determined by experience), then they will be merged. Fourth, similar to the third step except that the overlapped area is one third of the width of either box. A series of bounding-boxes of Chinese characters are created.)
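For illustration only, the sort-and-merge procedure Yang describes can be sketched in Python; the function names, the box representation, and the value of K are hypothetical, and the overlap test is a simplified reading of Yang's third and fourth merge rules:

```python
# Illustrative sketch of the sort-and-merge steps Yang describes
# (section 2.2.2). A box is an (x1, y1, x2, y2) tuple; K is the
# experience-determined cap on merged-box width (value hypothetical).

K = 40

def sort_boxes(boxes):
    """Step 1: sort by top-left X coordinate, non-decreasing,
    breaking ties by the Y coordinate, also ascending."""
    return sorted(boxes, key=lambda b: (b[0], b[1]))

def h_overlap(a, b):
    """Horizontal overlap between two boxes (0 when disjoint)."""
    return max(0, min(a[2], b[2]) - max(a[0], b[0]))

def try_merge(a, b, frac):
    """Step 2 (third/fourth rules): merge when the boxes overlap by
    more than `frac` of the width of either box and the merged box
    stays narrower than K; return None when they should not merge."""
    wa, wb = a[2] - a[0], b[2] - b[0]
    merged = (min(a[0], b[0]), min(a[1], b[1]),
              max(a[2], b[2]), max(a[3], b[3]))
    if h_overlap(a, b) > frac * min(wa, wb) and merged[2] - merged[0] < K:
        return merged
    return None

boxes = sort_boxes([(30, 0, 50, 20), (0, 0, 10, 10), (4, 2, 18, 12)])
m = try_merge(boxes[0], boxes[1], 0.5)  # m == (0, 0, 18, 12)
```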
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG in view of Ide with the teaching of Yang to extract the Chinese characters correctly and efficiently and also to solve the over-splitting and overlapping problems and to find the best way to segment the characters, (Yang, page 1, section 1).
Claims 8 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over JANG (CN1156741C) in view of Ide (JP2011065597), further in view of Okamoto (US 2015/0199582 A1), and further in view of Lehoux (US 2016/0313881 A1).
Regarding claim 8, JANG in view of Ide discloses all of the features with respect to claim 1 as outlined above. Claim 8 further recites: and wherein the character analyzer is further executable to: generate a set of extracted ideogram character stroke id sequences from the set of extracted ideogram characters using the stroke mapping; (JANG, page 2- Chinese handwriting identifying method, it is characterized in that may further comprise the steps: the stroke identification step, the motion of starting to write at every turn and lifting pen between the pen when detecting the user by stroke handwritten Chinese character/phrase, with a Motion Recognition of pen is at least one stroke code, utilize each in this at least one stroke code, the stroke code sequence with previous formation constitutes at least one new stroke code sequence respectively; Searching step, in being stored in the dictionary/dictionary of computer memory, corresponding at least one the dictionary/dictionary entry of at least one the new stroke code sequence that is constituted in retrieval and the stroke identification step, thus obtain at least one candidate Chinese character/phrase;)
select a third stroke id sequence, the third stroke id sequence selected from the set of extracted ideogram character stroke id sequences; (JANG, page 5, For example, for Chinese character " life ", suppose that corresponding dictionary entry is k clauses and subclauses in the dictionary, the stroke code sequence in these clauses and subclauses is " 5,1,1,3,1 " (the stroke code is referring to table shown in Figure 7), Sk1=3 then, Sk3=1, Sk5=1.If the user is by the order of strokes writing of Chinese characters " life " of " 5,1,3,1,1 ", x1=3 then, x3=1, x5=1.So therefore X=S retrieves k dictionary entry, corresponding " life " is as candidate Chinese character.)
modify the third stroke id sequence to create a fourth stroke id sequence within the specified distance to the third stroke id sequence; (JANG, page 5- the difference of the number of times that the contained various stroke codes of the stroke code sequence in the dictionary entry and the stroke code sequence of being discerned occur in two stroke code sequences respectively and less than predetermined threshold (corresponding to “within the specified distance”). This is to occur individual mistakes when allowing the user writing Chinese character. This search condition can be expressed as: Σi=1..I |xi − sji| < θ, where θ is a threshold value …At this moment, if threshold value θ=1, though then the user has wrongly write a stroke, step 13 also can retrieve and " hand " corresponding dictionary entry, with " hand " as candidate Chinese character. Adopt above-mentioned three kinds of search methods, can dynamically revise (corresponding to “modify”) stroke code sequence in the corresponding dictionary entry according to user's writing style. For example in last example, though " 1,1; 1,15 " is wrong stroke code sequence, this is user's a writing style, therefore, can in dictionary, increase by one and contain the dictionary entry of stroke code sequence " 1,1; 1; 15 " accordingly with " hand ", perhaps will be original with " hand " corresponding dictionary entry in the stroke code sequence change " 1,1; 1,15 " into.)
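For illustration only, JANG's relaxed search condition (the occurrence counts of each stroke code in the two sequences may differ, summed over all codes, by less than a threshold θ) can be sketched in Python; the identifiers are hypothetical and not drawn from JANG:

```python
from collections import Counter

def within_threshold(written, entry, theta):
    """For every stroke code, take the absolute difference between its
    occurrence count in the written sequence and in the dictionary
    entry; the entry remains a candidate when the sum of those
    differences is below the threshold theta."""
    cw, ce = Counter(written), Counter(entry)
    return sum(abs(cw[c] - ce[c]) for c in cw.keys() | ce.keys()) < theta

# JANG's example: the dictionary entry holds stroke codes 5,1,1,3,1
# but the user writes them in the order 5,1,3,1,1 (same counts).
assert within_threshold([5, 1, 3, 1, 1], [5, 1, 1, 3, 1], theta=1)
# One wrongly written stroke (a 2 instead of a 1) still matches
# under a looser threshold.
assert within_threshold([5, 1, 1, 3, 2], [5, 1, 1, 3, 1], theta=3)
```

Because the test compares per-code occurrence counts rather than positions, stroke-order variations cost nothing, matching JANG's tolerance for differing writing styles.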
determine that the fourth stroke id sequence corresponds to a fourth ideogram character; (JANG, page 5, Step 14 utilizes the Hanzi internal code that obtains to show corresponding Chinese character. Step 15 judges whether the user continues to write stroke, i.e. the next stroke of institute's writing of Chinese characters. With this step be accordingly, the user checks the candidate Chinese character that demonstrates, to find whether to have shown the Chinese character of writing. If the user finds not show the Chinese character of being write, promptly continue to write next stroke. If the judged result of step 15 is a "Yes", then forward step 17 to. Step 17 is similar with step 11, and difference only is to detect the next stroke that the user writes. If the judged result of step 15 is a "No", then step 16 is selected the user from shown Chinese character a Chinese character as a result of be input in the computing machine or other devices in.)
combine a plurality of ideogram characters to generate a plurality of phrases, wherein the plurality of ideogram characters includes the fourth ideogram character, and wherein at least one of the plurality of phrases includes the fourth ideogram character; (JANG, page 9, Shown in Figure 11 (A), when the user writes the phonetic sign of each word in the phrase " Guizhou ", during such as first letter " g " of "gui ", step 20A is identified as and scribbles letter " g "; Step 20B determines a scope in the dictionary, for example can show simultaneously in this scope corresponding Chinese character " doing a worker ... "When the user continues to write the phonetic sign of second word in the phrase, such as first letter " z " of " zhong ", step 20B determines a scope in the dictionary, for example can show simultaneously in this scope the corresponding candidate phrase " various work transformations ... "It is perpendicular that the user continues to write the first stroke, then step 23 retrieve and step 24 show candidate phrase " the valuable stubbornness in Guizhou ... "At this moment, the user can import this phrase by clicking shown " Guizhou "; page 7, utilizes the recognition system of scribbling of PalmPilot computing machine hand-written stroke is identified as scribbles letter or special symbol. Step 52 will identify scribbles letter and special symbol is converted at least one stroke code; )
select a phrase from the set of candidate phrases; (JANG, page 9, the selection to shown described at least one candidate Chinese character/phrase according to the user, with a selected Chinese character/phrase as handwritten Chinese character/phrase recognition result.)
However, JANG does not clearly disclose:
wherein the data repository further stores a collection of document images and wherein the instructions executable by the processor further comprise: a content extraction user interface executable to: receive a selection of a document image from the collection of document images from a user; an optical character recognition engine executable to: perform optical character recognition on the selected document image to generate a set of extracted ideogram characters; purge an incorrect phrase from the plurality of phrases to generate a set of candidate phrases; and add the selected phrase as character recognized content extracted from the selected document image to a character recognized document corresponding to the selected document image.
However, Ide discloses:
wherein the data repository further stores a collection of document images (Ide, page 10, 7th paragraph- The present invention can be similarly applied to text data obtained by character recognition of some document. Alternatively, text data that has been character-recognized in advance; page 5, 6th-8th paragraphs, e.g. 6th paragraph, The storage device 15 includes an image database (image DB) 21…7th paragraph, The image database 21 stores various image data. The search target database 22 stores data used as a search target. In the present embodiment, text data obtained as a result of reading a newspaper page by the scanner unit 14 and character recognition of an article on the page is stored in the search target database 22 as a search target. …8th paragraph, Further, the image related to the article is captured by the scanner unit 14 and stored in the image database 21. At that time, as shown in FIG. 2, the search target database 22 stores link information indicating the storage destination of the image data together with the text data that has been character-recognized.)
and wherein the instructions executable by the processor further comprise: a content extraction user interface executable to: receive a selection of a document image from the collection of document images from a user; (Ide, page 5, Further, the image related to the article is captured by the scanner unit 14 and stored in the image database 21. At that time, as shown in FIG. 2, the search target database 22 stores link information indicating the storage destination of the image data together with the text data that has been character-recognized; page 9, 8th paragraph- First, as an initial setting, a user inputs a term related to a desired article as a search word through the input unit 13 (step S21). Thus, the CPU 11 stores the search word input from the input unit 13 in the input buffer 25 of the memory 16 and then executes the following search process; page 7, 5th paragraph- all relevant article data is extracted from the text data)
an optical character recognition engine executable to: perform optical character recognition on the selected document image to generate a set of extracted ideogram characters; (Ide, page 10, 7th paragraph- The present invention can be similarly applied to text data obtained by character recognition of some document. Alternatively, text data that has been character-recognized in advance; page 5, 6th-8th paragraphs, e.g. 6th paragraph, The storage device 15 includes an image database (image DB) 21…7th paragraph, The image database 21 stores various image data. The search target database 22 stores data used as a search target. In the present embodiment, text data obtained as a result of reading a newspaper page by the scanner unit 14 and character recognition of an article on the page is stored in the search target database 22 as a search target. …8th paragraph, Further, the image related to the article is captured by the scanner unit 14 and stored in the image database 21. At that time, as shown in FIG. 2, the search target database 22 stores link information indicating the storage destination of the image data together with the text data that has been character-recognized.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG with the teaching of Ide to obtain a desired search result with less search omission even when an error recognition character is included in text data, (Ide, abstract).
However, JANG in view of Ide does not clearly disclose: receive a selection of a document image from the collection of document images from a user; purge an incorrect phrase from the plurality of phrases to generate a set of candidate phrases; and add the selected phrase as character recognized content extracted from the selected document image to a character recognized document corresponding to the selected document image.
However, Okamoto discloses:
purge an incorrect phrase from the plurality of phrases to generate a set of candidate phrases; (Okamoto, Fig. 12; [0086] This lattice structure includes an overall path 1201 corresponding to the four-character term … (a correct processing result in this case); [0074] As step S804, the lattice search unit 107 follows the lattice structure, and outputs, as a character recognition result, a sequence of a high score that indicates the probability of appearance. This is the termination of the character recognition processing by the lattice generation unit 106 and the lattice search unit 107; [0028] the extraction processing … can be performed utilizing known stroke processing or OCR processing)
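For illustration only, the effect relied upon from Okamoto (discarding low-probability candidates so that only plausible phrases remain) can be sketched in Python as a simple score filter; this is a simplification, not Okamoto's lattice search, and all names and scores below are hypothetical:

```python
def purge_candidates(phrases, scores, min_score):
    """Drop candidate phrases whose appearance score falls below
    min_score and rank the survivors best-first. A hypothetical,
    much-simplified stand-in for Okamoto's lattice search, which
    scores whole paths through a lattice of segmentation hypotheses."""
    kept = [(p, s) for p, s in zip(phrases, scores) if s >= min_score]
    kept.sort(key=lambda ps: ps[1], reverse=True)
    return [p for p, _ in kept]

candidates = purge_candidates(
    ["hand write", "hand wrote", "hand writhe"],  # hypothetical phrases
    [0.90, 0.40, 0.05],                           # hypothetical scores
    min_score=0.30,
)
# candidates == ["hand write", "hand wrote"]
```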
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG in view of Ide with the teaching of Okamoto so that terms can be recognized correctly, thereby reducing character recognition errors and enhancing character recognition accuracy, (Okamoto, [0089]).
However, JANG in view of Ide in view of Okamoto does not clearly disclose: receive a selection of a document image from the collection of document images from a user; and add the selected phrase as character recognized content extracted from the selected document image to a character recognized document corresponding to the selected document image.
However, Lehoux discloses:
receive a selection of a document image from the collection of document images from a user; (Lehoux, [0029]-[0030], this pointer movement (dragging the mouse) selects the data 112 from the input document 110 that is to be processed through optical character recognition processes)
and add the selected phrase as character recognized content extracted from the selected document image to a character recognized document corresponding to the selected document image. (Lehoux, [0034] checks if there is a paste result from the OCR to paste, and pastes the result into the field in item 164. At the same time the client app 130 shows a correction interface on an overlay (item 165) to quickly correct the result if there are any errors 166. Specifically, as shown in FIG. 6, if there is an error in the pasted text, in item 167, the user can provide (draw) the corrected text and then paste the corrected text into the text field to be filled in item 168; [0040]-[0041] compare the different recognition results output by the different OCR engines in item 204 to rank such recognition results, and then choose the highest ranking result that is to be presented in default paste field 174. Further, the ranking of the recognition results from the different OCR engines can be based on individual characters, words, phrases, sentences, etc; [0045] Once selected, such characters, words, phrases, etc., are automatically pasted into the destination entry fields 120 )
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG in view of Ide in view of Okamoto with the teaching of Lehoux to recognize characters in the image, and aggregate the recognition results from the different optical character recognition processes to produce a final character recognition result that includes relatively higher confidence recognized characters and relatively lower confidence recognized characters, (Lehoux, abstract).
Regarding claim 17, JANG discloses: A system comprising: a data repository storing: and a stroke mapping that maps strokes to stroke identifier (ids); (JANG, page 4-The stroke code sequence is that the stroke code that utilizes current stroke and current stroke hand-written stroke before to identify constitutes. Stroke code (corresponding to “stroke identifier (ids)”) sequence …in being stored in the dictionary of computer memory… In addition, for each Chinese character, in dictionary, has at least one dictionary entry. In dictionary entry, the stroke code sequence of the stroke when storing the ISN of corresponding Chinese character and writing this Chinese character .In order to adapt to the writing style of different user, for some Chinese character a plurality of stroke code sequences can be set, therefore, a plurality of dictionary entries are set. Also can comprise other guide in the dictionary entry, dictionary can adopt multiple mode to realize (for example, database), and all these are not construed as limiting the invention.)
and a character analyzer is executable to: generate a second set of ideogram characters, wherein generating the second set of ideogram characters comprises: for each of the set of extracted ideogram characters; partitioning the extracted ideogram character into a respective plurality of strokes; (JANG, page 7, utilizes the recognition system of scribbling of PalmPilot computing machine hand-written stroke is identified as scribbles letter or special symbol. Step 52 will identify scribbles letter and special symbol is converted at least one stroke code; page 2- Chinese handwriting identifying method, it is characterized in that may further comprise the steps: the stroke identification step, the motion of starting to write at every turn and lifting pen between the pen when detecting the user by stroke handwritten Chinese character/phrase, with a Motion Recognition of pen is at least one stroke code, utilize each in this at least one stroke code, the stroke code sequence with previous formation constitutes at least one new stroke code sequence respectively;)
mapping each stroke of the respective plurality of strokes to a corresponding stroke identifier (id) according to the stroke mapping to create a respective stroke id sequence comprising a plurality of stroke identifiers; (JANG, page 2- Chinese handwriting identifying method, it is characterized in that may further comprise the steps: the stroke identification step, the motion of starting to write at every turn and lifting pen between the pen when detecting the user by stroke handwritten Chinese character/phrase, with a Motion Recognition of pen is at least one stroke code, utilize each in this at least one stroke code, the stroke code sequence with previous formation constitutes at least one new stroke code sequence respectively; Searching step, in being stored in the dictionary/dictionary of computer memory, corresponding at least one the dictionary/dictionary entry of at least one the new stroke code sequence that is constituted in retrieval and the stroke identification step, thus obtain at least one candidate Chinese character/phrase;)
modifying the respective stroke id sequence to create a variation of the respective stroke id sequence within a specified distance to the respective stroke id sequence; (JANG, page 5- the difference of the number of times that the contained various stroke codes of the stroke code sequence in the dictionary entry and the stroke code sequence of being discerned occur in two stroke code sequences respectively and less than predetermined threshold (corresponding to “within a specified distance”). This is to occur individual mistakes when allowing the user writing Chinese character. This search condition can be expressed as: Σi=1..I |xi − sji| < θ, where θ is a threshold value …At this moment, if threshold value θ=1, though then the user has wrongly write a stroke, step 13 also can retrieve and " hand " corresponding dictionary entry, with " hand " as candidate Chinese character. Adopt above-mentioned three kinds of search methods, can dynamically revise (corresponding to “modifying”) stroke code sequence in the corresponding dictionary entry according to user's writing style. For example in last example, though " 1,1; 1,15 " is wrong stroke code sequence, this is user's a writing style, therefore, can in dictionary, increase by one and contain the dictionary entry of stroke code sequence " 1,1; 1; 15 " accordingly with " hand ", perhaps will be original with " hand " corresponding dictionary entry in the stroke code sequence change " 1,1; 1,15 " into.)
determining that the variation of the respective stroke id sequence is a valid stroke id sequence that maps to a corresponding ideogram character of the ideogram language; (JANG, Step 14 utilizes the Hanzi internal code that obtains to show corresponding Chinese character. Step 15 judges whether the user continues to write stroke, i.e. the next stroke of institute's writing of Chinese characters. With this step be accordingly, the user checks the candidate Chinese character that demonstrates, to find whether to have shown the Chinese character of writing. If the user finds not show the Chinese character of being write, promptly continue to write next stroke. If the judged result of step 15 is a "Yes", then forward step 17 to. Step 17 is similar with step 11, and difference only is to detect the next stroke that the user writes. If the judged result of step 15 is a "No", then step 16 is selected the user from shown Chinese character a Chinese character as a result of be input in the computing machine or other devices in.)
and adding the variation of the respective stroke id sequence to the second set of ideogram characters; combine the set of extracted ideogram characters and the second set of ideogram characters to generate a plurality of phrases; (JANG, page 9, Shown in Figure 11 (A), when the user writes the phonetic sign of each word in the phrase " Guizhou ", during such as first letter " g " of "gui ", step 20A is identified as and scribbles letter " g "; Step 20B determines a scope in the dictionary, for example can show simultaneously in this scope corresponding Chinese character " doing a worker ... "When the user continues to write the phonetic sign of second word in the phrase, such as first letter " z " of " zhong ", step 20B determines a scope in the dictionary, for example can show simultaneously in this scope the corresponding candidate phrase " various work transformations ... "It is perpendicular that the user continues to write the first stroke, then step 23 retrieve and step 24 show candidate phrase " the valuable stubbornness in Guizhou ... "At this moment, the user can import this phrase by clicking shown " Guizhou "; page 7, utilizes the recognition system of scribbling of PalmPilot computing machine hand-written stroke is identified as scribbles letter or special symbol. Step 52 will identify scribbles letter and special symbol is converted at least one stroke code; )
select a phrase from the set of candidate phrases; (JANG, page 9, the selection to shown described at least one candidate Chinese character/phrase according to the user, with a selected Chinese character/phrase as handwritten Chinese character/phrase recognition result.)
However, JANG does not clearly disclose:
a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a content extraction user interface executable to: receive a selection of a document image from the collection of document images from a user; an optical character recognition engine executable to: perform optical character recognition on the selected document image to generate a set of extracted ideogram characters according to an ideogram language; purge incorrect phrases from the plurality of phrases to generate a set of candidate phrases; and add the selected phrase as character recognized content to a character recognized document corresponding to the selected document image.
However, Ide discloses:
a collection of document images; (Ide, page 10, 7th paragraph- The present invention can be similarly applied to text data obtained by character recognition of some document. Alternatively, text data that has been character-recognized in advance)
a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a content extraction user interface executable to: receive a selection of a document image from the collection of document images from a user; (Ide, page 5, Further, the image related to the article is captured by the scanner unit 14 and stored in the image database 21. At that time, as shown in FIG. 2, the search target database 22 stores link information indicating the storage destination of the image data together with the text data that has been character-recognized; page 9, 8th paragraph- First, as an initial setting, a user inputs a term related to a desired article as a search word through the input unit 13 (step S21). Thus, the CPU 11 stores the search word input from the input unit 13 in the input buffer 25 of the memory 16 and then executes the following search process; page 7, 5th paragraph- all relevant article data is extracted from the text data)
an optical character recognition engine executable to: perform optical character recognition on the selected document image to generate a set of extracted ideogram characters according to an ideogram language; (Ide, page 10, 7th paragraph- The present invention can be similarly applied to text data obtained by character recognition of some document. Alternatively, text data that has been character-recognized in advance; page 5, 6th-8th paragraphs, e.g. 6th paragraph, The storage device 15 includes an image database (image DB) 21…7th paragraph, The image database 21 stores various image data. The search target database 22 stores data used as a search target. In the present embodiment, text data obtained as a result of reading a newspaper page by the scanner unit 14 and character recognition of an article on the page is stored in the search target database 22 as a search target. …8th paragraph, Further, the image related to the article is captured by the scanner unit 14 and stored in the image database 21. At that time, as shown in FIG. 2, the search target database 22 stores link information indicating the storage destination of the image data together with the text data that has been character-recognized.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG with the teaching of Ide to obtain a desired search result with less search omission even when an error recognition character is included in text data, (Ide, abstract).
However, JANG in view of Ide does not clearly disclose:
receive a selection of a document image from the collection of document images from a user; purge incorrect phrases from the plurality of phrases to generate a set of candidate phrases; and add the selected phrase as character recognized content to a character recognized document corresponding to the selected document image.
However, Okamoto discloses:
purge incorrect phrases from the plurality of phrases to generate a set of candidate phrases; (Okamoto, Fig. 12; [0086] This lattice structure includes an overall path 1201 corresponding to the four-character term … (a correct processing result in this case); [0074] As step S804, the lattice search unit 107 follows the lattice structure, and outputs, as a character recognition result, a sequence of a high score that indicates the probability of appearance. This is the termination of the character recognition processing by the lattice generation unit 106 and the lattice search unit 107.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG in view of Ide with the teaching of Okamoto so that terms can be recognized correctly, thereby reducing character recognition errors and enhancing character recognition accuracy, (Okamoto, [0089]).
However, JANG in view of Ide in view of Okamoto does not clearly disclose:
receive a selection of a document image from the collection of document images from a user; and add the selected phrase as character recognized content to a character recognized document corresponding to the selected document image.
However, Lehoux discloses:
receive a selection of a document image from the collection of document images from a user; (Lehoux, [0029]-[0030], this pointer movement (dragging the mouse) selects the data 112 from the input document 110 that is to be processed through optical character recognition processes)
and add the selected phrase as character recognized content to a character recognized document corresponding to the selected document image. (Lehoux, [0034] checks if there is a paste result from the OCR to paste, and pastes the result into the field in item 164. At the same time the client app 130 shows a correction interface on an overlay (item 165) to quickly correct the result if there are any errors 166. Specifically, as shown in FIG. 6, if there is an error in the pasted text, in item 167, the user can provide (draw) the corrected text and then paste the corrected text into the text field to be filled in item 168; [0040]-[0041] compare the different recognition results output by the different OCR engines in item 204 to rank such recognition results, and then choose the highest ranking result that is to be presented in default paste field 174. Further, the ranking of the recognition results from the different OCR engines can be based on individual characters, words, phrases, sentences, etc; [0045] Once selected, such characters, words, phrases, etc., are automatically pasted into the destination entry fields 120 )
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG in view of Ide in view of Okamoto with the teaching of Lehoux to recognize characters in the image, and aggregate the recognition results from the different optical character recognition processes to produce a final character recognition result that includes relatively higher confidence recognized characters and relatively lower confidence recognized characters, (Lehoux, abstract).
Regarding claim 18, JANG in view of Ide in view of Okamoto in view of Lehoux discloses all of the features with respect to claim 17 as outlined above. JANG in view of Ide does not clearly disclose: wherein the optical character recognition engine is executable to determine that the set of extracted ideogram characters are grammatically incorrect and wherein the character analyzer analyzes the set of extracted ideogram characters based on a determination that the set of ideogram characters are grammatically incorrect.
However, Okamoto discloses:
wherein the optical character recognition engine is executable to determine that the set of extracted ideogram characters are grammatically incorrect and wherein the character analyzer analyzes the set of extracted ideogram characters based on a determination that the set of ideogram characters are grammatically incorrect. (Okamoto, Fig. 12; [0086] This lattice structure includes an overall path 1201 corresponding to the four-character term … (a correct processing result in this case); [0074] As step S804, the lattice search unit 107 follows the lattice structure, and outputs, as a character recognition result, a sequence of a high score that indicates the probability of appearance. This is the termination of the character recognition processing by the lattice generation unit 106 and the lattice search unit 107.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG in view of Ide with the teaching of Okamoto so that terms can be recognized correctly, thereby reducing character recognition errors and enhancing character recognition accuracy, (Okamoto, [0089]).
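For illustration only (not part of the record; the lattice contents are hypothetical), the lattice search Okamoto describes at [0074], which follows the lattice structure and outputs the sequence of high score indicating probability of appearance, can be sketched as:

```python
# Minimal sketch of outputting a high-scoring sequence from a character
# lattice, as in the search Okamoto describes. The hypotheses are hypothetical.
def best_path(lattice):
    """lattice: list of positions, each a list of (char, score) hypotheses.
    Pick the top-scoring hypothesis at each position and join them."""
    return "".join(max(hyps, key=lambda h: h[1])[0] for hyps in lattice)
```

A full system would score whole paths rather than positions independently; the greedy per-position choice here is only a simplification for illustration.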
Regarding claim 19, JANG in view of Ide in view of Okamoto in view of Lehoux discloses all of the features with respect to claim 17 as outlined above. JANG in view of Ide in view of Okamoto does not clearly disclose: wherein the content extraction user interface is further executable to receive parameters for performing optical character recognition and wherein the optical character recognition engine performs the optical character recognition on the selected document image according to the optical character recognition parameters.
However Lehoux discloses:
wherein the content extraction user interface is further executable to receive parameters for performing optical character recognition and wherein the optical character recognition engine performs the optical character recognition on the selected document image according to the optical character recognition parameters. (Lehoux, [0031]-[0032] OCR constraints (as illustrated in item 154) that could be shown for selection or directly called by other shortcuts or contextual menus. For example, some OCR constraints in item 154 can specify: that the text of the zone 112 is composed of only digits as a phone number; alphabets (letters) as a last name; that the selected zone 112 is handwritten; that text of the zone follows a specific pattern (like a mail pattern for instance); the language of the text (English/French/ ... etc.) and/or the alphabet (Latin, Arabic, Cyrillic, . . . etc.) of the zone selected.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG in view of Ide in view of Okamoto with the teaching of Lehoux to recognize characters in the image, and aggregate the recognition results from the different optical character recognition processes to produce a final character recognition result that includes relatively higher confidence recognized characters and relatively lower confidence recognized characters, (Lehoux, abstract).
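For illustration only (not part of the record; constraint names and the post-processing step are hypothetical), the kind of OCR constraint Lehoux lists at [0031]-[0032], such as restricting a zone's text to only digits for a phone number or to letters for a last name, can be sketched as:

```python
# Minimal sketch of applying a zone-level OCR constraint to a raw recognition
# result, in the spirit of the constraints Lehoux lists. Names are hypothetical.
def apply_constraint(raw, constraint):
    """Filter a raw OCR result according to a declared zone constraint."""
    if constraint == "digits":   # e.g., a phone-number field
        return "".join(ch for ch in raw if ch.isdigit())
    if constraint == "letters":  # e.g., a last-name field
        return "".join(ch for ch in raw if ch.isalpha())
    return raw  # no constraint declared for the zone
```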
Regarding claim 20, JANG in view of Ide in view of Okamoto in view of Lehoux discloses all of the features with respect to claim 19 as outlined above. JANG in view of Ide in view of Okamoto does not clearly disclose: wherein the parameters specify a portion of the selected document image from which to extract content.
However Lehoux discloses:
wherein the parameters specify a portion of the selected document image from which to extract content. (Lehoux, Fig. 5; [0031]-[0032] OCR constraints (as illustrated in item 154) that could be shown for selection or directly called by other shortcuts or contextual menus. For example, some OCR constraints in item 154 can specify: that the text of the zone 112 is composed of only digits as a phone number; alphabets (letters) as a last name; that the selected zone 112 is handwritten; that text of the zone follows a specific pattern (like a mail pattern for instance); the language of the text (English/French/ ... etc.) and/or the alphabet (Latin, Arabic, Cyrillic, . . . etc.) of the zone selected.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG in view of Ide in view of Okamoto with the teaching of Lehoux to recognize characters in the image, and aggregate the recognition results from the different optical character recognition processes to produce a final character recognition result that includes relatively higher confidence recognized characters and relatively lower confidence recognized characters, (Lehoux, abstract).
Claims 9-11 and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over JANG (CN1156741C) in view of Ide (JP2011065597) in view of Henry (US20090228777).
Regarding claim 9, JANG discloses: A system comprising: a data repository storing: and a stroke mapping that maps strokes to stroke identifier (ids); (JANG, page 4- The stroke code sequence is that the stroke code that utilizes current stroke and current stroke hand-written stroke before to identify constitutes. Stroke code (corresponding to “stroke identifier (ids)”) sequence…in being stored in the dictionary of computer memory… In addition, for each Chinese character, in dictionary, has at least one dictionary entry. In dictionary entry, the stroke code sequence of the stroke when storing the ISN of corresponding Chinese character and writing this Chinese character. In order to adapt to the writing style of different user, for some Chinese character a plurality of stroke code sequences can be set, therefore, a plurality of dictionary entries are set. Also can comprise other guide in the dictionary entry, dictionary can adopt multiple mode to realize (for example, database), and all these are not construed as limiting the invention.)
a document analyzer computer coupled to the data repository, the document analyzer computer comprising: a processor; a memory storing instructions executable by the processor, the instructions comprising: a search user interface executable to: receive a first phrase extracted from the selected document, the first phrase from the selected document comprising first ideogram character from an ideogram language; (JANG, page 2- Chinese handwriting identifying method, it is characterized in that may further comprise the steps: the stroke identification step, the motion of starting to write at every turn and lifting pen between the pen when detecting the user by stroke handwritten Chinese character/phrase, with a Motion Recognition of pen is at least one stroke code, utilize each in this at least one stroke code, the stroke code sequence with previous formation constitutes at least one new stroke code sequence respectively; Searching step, in being stored in the dictionary/dictionary of computer memory, corresponding at least one the dictionary/dictionary entry of at least one the new stroke code sequence that is constituted in retrieval and the stroke identification step, thus obtain at least one candidate Chinese character/phrase; (Note: Examiner interprets that Chinese character corresponds to “ideogram character from an ideogram language”))
a character analyzer executable to identify an ideogram character variation that is a variation of the first ideogram character, wherein identifying the ideogram character variation comprises: (JANG, page 4- In addition, for each Chinese character, in dictionary, has at least one dictionary entry. In dictionary entry, the stroke code sequence of the stroke when storing the ISN of corresponding Chinese character and writing this Chinese character .In order to adapt to the writing style of different user, for some Chinese character a plurality of stroke code sequences can be set, therefore, a plurality of dictionary entries are set. Also can comprise other guide in the dictionary entry, dictionary can adopt multiple mode to realize (for example, database), and all these are not construed as limiting the invention.)
partitioning a first ideogram character into a first plurality of strokes; (JANG, page 7, utilizes the recognition system of scribbling of PalmPilot computing machine hand-written stroke is identified as scribbles letter or special symbol. Step 52 will identify scribbles letter and special symbol is converted at least one stroke code; page 2- Chinese handwriting identifying method, it is characterized in that may further comprise the steps: the stroke identification step, the motion of starting to write at every turn and lifting pen between the pen when detecting the user by stroke handwritten Chinese character/phrase, with a Motion Recognition of pen is at least one stroke code, utilize each in this at least one stroke code, the stroke code sequence with previous formation constitutes at least one new stroke code sequence respectively;)
mapping each stroke of the first plurality of strokes to a corresponding stroke identifier (id) according to the stroke mapping to create a first stroke id sequence comprising a plurality of stroke identifiers; (JANG, page 2- Chinese handwriting identifying method, it is characterized in that may further comprise the steps: the stroke identification step, the motion of starting to write at every turn and lifting pen between the pen when detecting the user by stroke handwritten Chinese character/phrase, with a Motion Recognition of pen is at least one stroke code, utilize each in this at least one stroke code, the stroke code sequence with previous formation constitutes at least one new stroke code sequence respectively; Searching step, in being stored in the dictionary/dictionary of computer memory, corresponding at least one the dictionary/dictionary entry of at least one the new stroke code sequence that is constituted in retrieval and the stroke identification step, thus obtain at least one candidate Chinese character/phrase;)
modifying the first stroke id sequence to create a second stroke id sequence within a specified distance to the first stroke id sequence; (JANG, page 5- the difference of the number of times that the contained various stroke codes of the stroke code sequence in the dictionary entry and the stroke code sequence of being discerned occur in two stroke code sequences respectively and less than predetermined threshold (corresponding to “within a specified distance”). This is to occur individual mistakes when allowing the user writing Chinese character. This search condition can be expressed as: Σi=1..I |xi − sji| < θ, where θ is a threshold value… At this moment, if threshold value θ=1, though then the user has wrongly write a stroke, step 13 also can retrieve and "hand" corresponding dictionary entry, with "hand" as candidate Chinese character. Adopt above-mentioned three kinds of search methods, can dynamically revise (corresponding to “modifying”) stroke code sequence in the corresponding dictionary entry according to user's writing style. For example in last example, though "1,1; 1,15" is wrong stroke code sequence, this is user's a writing style, therefore, can in dictionary, increase by one and contain the dictionary entry of stroke code sequence "1,1; 1; 15" accordingly with "hand", perhaps will be original with "hand" corresponding dictionary entry in the stroke code sequence change "1,1; 1,15" into.)
determining that the second stroke id sequence is a valid stroke id sequence that maps to a second ideogram character of the ideogram language; (JANG, Step 14 utilizes the Hanzi internal code that obtains to show corresponding Chinese character. Step 15 judges whether the user continues to write stroke, i.e. the next stroke of institute's writing of Chinese characters. With this step be accordingly, the user checks the candidate Chinese character that demonstrates, to find whether to have shown the Chinese character of writing. If the user finds not show the Chinese character of being write, promptly continue to write next stroke. If the judged result of step 15 is a "Yes", then forward step 17 to. Step 17 is similar with step 11, and difference only is to detect the next stroke that the user writes. If the judged result of step 15 is a "No", then step 16 is selected the user from shown Chinese character a Chinese character as a result of be input in the computing machine or other devices in.)
and provide the search result to the search user interface. (JANG, page 2, Searching step, in being stored in the dictionary/dictionary of computer memory, corresponding at least one the dictionary/dictionary entry of at least one the new stroke code sequence that is constituted in retrieval and the stroke identification step, thus obtain at least one candidate Chinese character/phrase; Step display in the user writing stroke, dynamically shows described at least one candidate Chinese character/phrase; Transfer step continues to write next stroke if judge the user, then transfers to above-mentioned stroke identification step; Recognition result generates step, has selected at least one shown candidate Chinese character/phrase one if judge the user, then with a selected Chinese character/phrase as handwritten Chinese character/phrase recognition result.)
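For illustration only (not part of the record; the stroke names, codes, and dictionary entries are hypothetical), the stroke-code retrieval JANG describes, in which each stroke is mapped to a stroke code, codes accumulate into a sequence, and the sequence is matched against dictionary entries to obtain candidate characters, can be sketched as:

```python
# Minimal sketch of JANG's stroke-code retrieval. The stroke-name-to-code
# mapping and the dictionary contents below are hypothetical.
STROKE_CODES = {"pie": 5, "heng": 1, "shu": 3}

# Hypothetical dictionary: stroke-code sequence -> candidate characters.
DICTIONARY = {
    (5, 1, 1, 3, 1): ["\u751f"],  # the "life" order of strokes cited at page 5
}

def to_sequence(strokes):
    """Map each stroke to its code, forming a stroke-code sequence."""
    return tuple(STROKE_CODES[s] for s in strokes)

def candidates(seq):
    """Return candidate characters whose dictionary sequence begins with seq,
    so candidates can be shown dynamically while the user is still writing."""
    return [ch for entry, chars in DICTIONARY.items()
            if entry[:len(seq)] == seq for ch in chars]
```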
However, JANG does not clearly disclose:
a collection of character recognized documents; receive a document identifier for a selected document; and a fuzzy search engine executable to: generate a variation of the first phrase using the ideogram character variation; perform a search of the collection of character recognized documents using the variation of the first phrase to obtain a search result identifying documents from the collection of character recognized documents that are similar to the selected document;
However Ide discloses:
a collection of character recognized documents; (Ide, page 10, 7th paragraph- The present invention can be similarly applied to text data obtained by character recognition of some document. Alternatively, text data that has been character-recognized in advance)
and a fuzzy search engine executable to: generate a variation of the first phrase using the ideogram character variation; (See Ide, page 7, 4th paragraph- 6th paragraph, e.g. see page 7, 5th paragraph- the CPU searches the search target database 22 based on the search word stored in the input buffer 25, and searches for article data that matches the search word from the text data stored in the search target database 22… In this case, all relevant article data is extracted from the text data; See also e.g. page 7, 5th paragraph, line 2- the CPU 11 determines each character constituting the input search word. Is replaced with the corresponding misrecognized character in the misrecognition probability database 23 to generate a search word with a high probability of misrecognition, and text data is generated using the generated new search word (corresponding to “a variation of the first phrase”). Search again.)
perform a search of the collection of character recognized documents using the variation of the first phrase to obtain a search result identifying documents from the collection of character recognized documents that are similar to the selected document; (See Ide, page 7, 4th paragraph- 6th paragraph, e.g. see page 7, 5th paragraph- the CPU searches the search target database 22 based on the search word stored in the input buffer 25, and searches for article data that matches the search word from the text data stored in the search target database 22… In this case, all relevant article data is extracted from the text data; See also e.g. page 7, 5th paragraph, line 2- the CPU 11 determines each character constituting the input search word. Is replaced with the corresponding misrecognized character in the misrecognition probability database 23 to generate a search word with a high probability of misrecognition, and text data is generated using the generated new search word (corresponding to “the variation of the first phrase”). Search again (step S13); see also page 8, 4th paragraph, line 6- and displays them in a predetermined format on the display unit 12 so that the user can confirm them. Output (step S16).)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG with the teaching of Ide to obtain a desired search result with less search omission even when an error recognition character is included in text data, (Ide, abstract).
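For illustration only (not part of the record; the confusion table and documents are hypothetical), the variant generation Ide describes, in which each character of the search word is replaced with its likely misrecognized counterparts and the collection is searched again with the generated words, can be sketched as:

```python
# Minimal sketch of Ide's misrecognition-based fuzzy search. The confusion
# table and the document collection below are hypothetical.
MISRECOGNIZED = {"1": ["l", "I"], "0": ["O"]}  # assumed confusion table

def variants(word):
    """Return the word plus its single-character misrecognition variants."""
    out = [word]
    for i, ch in enumerate(word):
        for sub in MISRECOGNIZED.get(ch, []):
            out.append(word[:i] + sub + word[i + 1:])
    return out

def fuzzy_search(word, documents):
    """Return documents matching the word or any misrecognition variant,
    reducing search omissions caused by OCR errors in the text data."""
    vs = variants(word)
    return [doc for doc in documents if any(v in doc for v in vs)]
```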
However, JANG in view of Ide does not clearly disclose:
receive a document identifier for a selected document;
However Henry discloses:
receive a document identifier for a selected document; (Henry, Fig. 47 item 4710, “Receive Document Identifier”; [0158] At input information block 110, a user may select the starting documents to be analyzed. In an example, the user may input a patent application and drawings; [0159] For inputs that are in graphical format, such as a TIFF file or PDF file that does not contain metadata, the text and symbol information are converted first using optical character recognition (OCR) and then metadata is captured; [0175] when the user requests analysis of a published application or patent. In such cases, server processor 210 may receive an identifier, such as a patent number or published application number)
receive a first phrase extracted from the selected document, (Henry, [0258] For example, the text/graphics is provided by an OCR system that is optimized to detect numbers, words and/or letters in a cluttered image space)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG in view of Ide with the teaching of Henry to allow other systems and methods to identify which document the embodiment was produced from, (Henry, [0517]) and also to know what document the image is provided from, (Henry, [0534]).
Regarding claim 10, JANG in view of Ide in view of Henry discloses all of the features with respect to claim 9 as outlined above. Claim 10 further recites: wherein the strokes in the stroke mapping are ordered, wherein the ordering of the strokes in the stroke mapping creates a determinable partitioning of ideogram characters into the strokes of the stroke mapping. (JANG, page 8- in order to adapt to the writing style of different user, also can define the another kind of order of strokes of " state", in dictionary, comprise another dictionary entry: Stroke code sequence: 3,8,1,3,1,1,6,1…As shown in Figure 9, it is perpendicular that the user at first writes the first stroke, and then step 51 (first step of step 17) is identified as this stroke and scribbles letter " i "; Step 52 (second step of step 17) will be scribbled letter " i " and will be converted to stroke code 3;)
Regarding claim 11, JANG in view of Ide in view of Henry discloses all of the features with respect to claim 10 as outlined above. Claim 11 further recites: wherein partitioning the first ideogram character into a plurality of strokes comprises determining whether each stroke from the stroke mapping is found in the first ideogram character until either the first ideogram character is covered by strokes from the stroke mapping or all the strokes in the stroke mapping have been processed. (JANG, page 8- in order to adapt to the writing style of different user, also can define the another kind of order of strokes of " state", in dictionary, comprise another dictionary entry: Stroke code sequence: 3,8,1,3,1,1,6,1…As shown in Figure 9, it is perpendicular that the user at first writes the first stroke, and then step 51 (first step of step 17) is identified as this stroke and scribbles letter " i "; Step 52 (second step of step 17) will be scribbled letter " i " and will be converted to stroke code 3; page 8, that the user at first writes the first stroke, and then step 51 (first step of step 17) is identified as this stroke and scribbles letter " i "; Step 52 (second step of step 17) will be scribbled letter " i " and will be converted to stroke code 3; Step 12 forms stroke code sequence " 3 "; Step 13 retrieve some candidate Chinese characters " Bu Shen mouth ... " Step 14 shows these candidate Chinese characters. Because the Chinese character " state " that does not have the user writing in shown Chinese character, so the judged result of step 15 is a "No", process proceeds to step 17. 
Then, the user writes second cross break, and then step 51 (first step of step 17) is identified as this stroke and scribbles letter " t "; Step 52 (second step of step 17) will be scribbled letter " t " and will be converted to three stroke codes 8,9,10; So step 12 forms three stroke code sequences " 3,8 ", " 3,9 ", " 3,10 "; Step 13 corresponding to three stroke code sequences retrieve three groups of candidate Chinese characters " mouthful seeing ... ", " superfluous writing ... ", " Shen Gang ... " Step 14 shows some Chinese character in above-mentioned three groups of candidate Chinese characters. Because the Chinese character " state " that does not have the user writing in shown Chinese character, so the judged result of step 15 is a "No", process proceeds to step 17. Similarly, the user continues to write the 3rd, the 4th ...After having write the 5th, have in the shown candidate Chinese character " state ". Therefore, the judged result in the step 15 is a "Yes", and process proceeds to step 16; The user can be input to " state " in the computing machine by clicking shown " state " word on the screen.)
Regarding claim 13, JANG in view of Ide in view of Henry discloses all of the features with respect to claim 9 as outlined above. JANG does not clearly disclose:
wherein the search result references the similar documents and wherein the search user interface is executable to store the search result.
However Ide discloses:
wherein the search result references the similar documents and wherein the search user interface is executable to store the search result. (See Ide, page 10, 3rd paragraph- Data that matches either the search word or each search candidate is extracted from the text data stored in 22 as a search result, and stored in a work area (not shown) of the memory 16; page 7, 4th paragraph- 6th paragraph, e.g. see page 7, 5th paragraph- the CPU searches the search target database 22 based on the search word stored in the input buffer 25, and searches for article data that matches the search word from the text data stored in the search target database 22… In this case, all relevant article data is extracted from the text data; page 8, If an image is included in the extracted article, the corresponding image data is read from the image database 21 based on the link information added to the text data of the article and displayed together with the text data. It will be.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG with the teaching of Ide to obtain a desired search result with less search omission even when an error recognition character is included in text data, (Ide, abstract).
Regarding claim 14, JANG in view of Ide in view of Henry discloses all of the features with respect to claim 9 as outlined above. JANG does not clearly disclose:
wherein the search result references the similar documents and wherein the search user interface is executable to provide the search result to a user.
However Ide discloses:
wherein the search result references the similar documents and wherein the search user interface is executable to provide the search result to a user. (Ide, page 7, 4th paragraph- 6th paragraph, e.g. see page 7, 5th paragraph- the CPU searches the search target database 22 based on the search word stored in the input buffer 25, and searches for article data that matches the search word from the text data stored in the search target database 22… In this case, all relevant article data is extracted from the text data; page 8, 4th paragraph, line 6- and displays them in a predetermined format on the display unit 12 so that the user can confirm them. Output (step S16); page 8, If an image is included in the extracted article, the corresponding image data is read from the image database 21 based on the link information added to the text data of the article and displayed together with the text data. It will be.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG with the teaching of Ide to obtain a desired search result with less search omission even when an error recognition character is included in text data, (Ide, abstract).
Regarding claim 15, JANG in view of Ide in view of Henry discloses all of the features with respect to claim 9 as outlined above. JANG does not clearly disclose: wherein the data repository further stores document images corresponding to the character recognized documents in the collection of character recognized documents, and wherein providing the search result to the user comprises presenting the document images corresponding to the similar documents to the user.
However Ide discloses:
wherein the data repository further stores document images corresponding to the character recognized documents in the collection of character recognized documents, (Ide, page 10- The present invention can be similarly applied to text data obtained by character recognition of some document. Alternatively, text data that has been character-recognized in advance may be acquired from the outside via a recording medium or a communication medium; page 8, Here, a predetermined number (100) of articles related to “election violation” is extracted from the text data of the search target database 22 and displayed on the screen of the display unit 12. If an image is included in the extracted article, the corresponding image data is read from the image database 21 based on the link information added to the text data of the article and displayed together with the text data; page 5- The image database 21 stores various image data. The search target database 22 stores data used as a search target.)
and wherein providing the search result to the user comprises presenting the document images corresponding to the similar documents to the user. ( Ide, page 8, Here, a predetermined number (100) of articles related to “election violation” is extracted from the text data of the search target database 22 and displayed on the screen of the display unit 12. If an image is included in the extracted article, the corresponding image data is read from the image database 21 based on the link information added to the text data of the article and displayed together with the text data; page 7, 4th paragraph- 6th paragraph, e.g. see page 7, 5th paragraph- the CPU searches the search target database 22 based on the search word stored in the input buffer 25, and searches for article data that matches the search word from the text data stored in the search target database 22… In this case, all relevant article data is extracted from the text data; page 8, 4th paragraph, line 6- and displays them in a predetermined format on the display unit 12 so that the user can confirm them. Output (step S16).)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG with the teaching of Ide to obtain a desired search result with less search omission even when an error recognition character is included in text data, (Ide, abstract).
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over JANG (CN1156741C) in view of Ide (JP2011065597) in view of Henry (US20090228777) in further view of Yang (“Comparison of shape-based and stroke-based methods for segmenting handwritten Chinese characters”, hereinafter Yang).
Regarding claim 12, JANG in view of Ide in view of Henry discloses all of the features with respect to claim 10 as outlined above. JANG in view of Ide in view of Henry does not clearly disclose: wherein the strokes in the stroke mapping are ordered based on size and encapsulation.
However Yang discloses:
wherein the strokes in the stroke mapping are ordered based on size and encapsulation. (Yang, page 4, section 2.2.2 Merge Stroke-bounding Boxes Step1. With the information of the stroke types assigned (horizontal, vertical, up-left-slanting, up-right-slanting) and the stroke lengths (long stroke, short stroke), we can classify the strokes into eight types. According to the coordinates of their top-left points, all stroke-bounding boxes are sorted by the X-coordinates in non-decreasing order. If the X-coordinates become the same, then the sorting process changes to sort the Y coordinates, also in ascending order. Step2. Merge two boxes using relative positions of their up-left point and bottom-right point. First, only merge those boxes corresponding to short strokes. Second, merge those boxes corresponding to the slanting-strokes. Third, if two boxes are overlapped with each other more than half of the width of either box and the width of the merged box is less than K (determined by experience), then they will be merged. Fourth, similar to the third step except that the overlapped area is one third of the width of either box. A series of bounding-boxes of Chinese characters are created.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG in view of Ide in view of Henry with the teaching of Yang to extract the Chinese characters correctly and efficiently and also to solve the over-splitting and overlapping problems and to find the best way to segment the characters, (Yang, page 1, section 1).
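For illustration only (not part of the record; box coordinates are hypothetical), the ordering and one merge criterion Yang describes in section 2.2.2, sorting stroke-bounding boxes by the X-coordinate of the top-left point with ties broken by Y, then merging two boxes when they overlap by more than half the width of either, can be sketched as:

```python
# Minimal sketch of Yang's box sorting and a half-width overlap merge test.
# Boxes are (x1, y1, x2, y2) with (x1, y1) the top-left point; all values
# below are hypothetical.
def sort_boxes(boxes):
    """Sort by top-left X in non-decreasing order, then by Y ascending."""
    return sorted(boxes, key=lambda b: (b[0], b[1]))

def should_merge(a, b):
    """True when horizontal overlap exceeds half the width of either box."""
    overlap = min(a[2], b[2]) - max(a[0], b[0])
    return overlap > min(a[2] - a[0], b[2] - b[0]) / 2

def merge(a, b):
    """Bounding box of the union of two boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
```

Yang's full procedure applies several merge passes (short strokes first, then slanting strokes, then the overlap tests) and a width cap K; only the ordering and one overlap test are shown here.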
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over JANG (CN1156741C) in view of Ide (JP2011065597) in view of Henry (US20090228777) in view of Okamoto (US 2015/0199582 A1).
Regarding claim 16, JANG in view of Ide in view of Henry discloses all of the features with respect to claim 9 as outlined above. Claim 16 further recites: and wherein the character analyzer is further executable to: generate a set of extracted ideogram character stroke id sequences from the set of extracted ideogram characters using the stroke mapping; (JANG, page 2- Chinese handwriting identifying method, it is characterized in that may further comprise the steps: the stroke identification step, the motion of starting to write at every turn and lifting pen between the pen when detecting the user by stroke handwritten Chinese character/phrase, with a Motion Recognition of pen is at least one stroke code, utilize each in this at least one stroke code, the stroke code sequence with previous formation constitutes at least one new stroke code sequence respectively; Searching step, in being stored in the dictionary/dictionary of computer memory, corresponding at least one the dictionary/dictionary entry of at least one the new stroke code sequence that is constituted in retrieval and the stroke identification step, thus obtain at least one candidate Chinese character/phrase;)
select a third stroke id sequence, the third stroke id sequence selected from the set of extracted ideogram character stroke id sequences; (JANG, page 5, For example, for Chinese character " life ", suppose that corresponding dictionary entry is k clauses and subclauses in the dictionary, the stroke code sequence in these clauses and subclauses is " 5,1,1,3,1 " (the stroke code is referring to table shown in Figure 7), Sk1=3 then, Sk3=1, Sk5=1.If the user is by the order of strokes writing of Chinese characters " life " of " 5,1,3,1,1 ", x1=3 then, x3=1, x5=1.So therefore X=S retrieves k dictionary entry, corresponding " life " is as candidate Chinese character.)
modify the third stroke id sequence to create a fourth stroke id sequence within the specified distance to the third stroke id sequence; (JANG, page 5- the difference of the number of times that the contained various stroke codes of the stroke code sequence in the dictionary entry and the stroke code sequence of being discerned occur in two stroke code sequences respectively and less than predetermined threshold (corresponding to “within the specified distance”). This is to occur individual mistakes when allowing the user writing Chinese character. This search condition can be expressed as: Σi=1..I |xi − sji| < θ, where θ is a threshold value… At this moment, if threshold value θ=1, though then the user has wrongly write a stroke, step 13 also can retrieve and "hand" corresponding dictionary entry, with "hand" as candidate Chinese character. Adopt above-mentioned three kinds of search methods, can dynamically revise (corresponding to “modify”) stroke code sequence in the corresponding dictionary entry according to user's writing style. For example in last example, though "1,1; 1,15" is wrong stroke code sequence, this is user's a writing style, therefore, can in dictionary, increase by one and contain the dictionary entry of stroke code sequence "1,1; 1; 15" accordingly with "hand", perhaps will be original with "hand" corresponding dictionary entry in the stroke code sequence change "1,1; 1,15" into.)
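For illustration only (not part of the record; the stroke-code sequences and the threshold value are hypothetical), the tolerance test JANG expresses as Σi=1..I |xi − sji| < θ, comparing per-code occurrence counts between the written sequence and a dictionary entry, can be sketched as:

```python
# Minimal sketch of JANG's threshold test: count how often each stroke code
# occurs in the written sequence and in a dictionary entry, and accept the
# entry when the summed count differences fall below a threshold theta.
# The sequences and theta used below are hypothetical.
from collections import Counter

def within_threshold(written, entry, theta):
    """True when sum over stroke codes i of |x_i - s_ji| is less than theta."""
    x, s = Counter(written), Counter(entry)
    return sum(abs(x[c] - s[c]) for c in set(x) | set(s)) < theta
```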
determine that the fourth stroke id sequence corresponds to a fourth ideogram character; (JANG, page 5, Step 14 utilizes the obtained Hanzi internal code to display the corresponding Chinese character. Step 15 judges whether the user continues to write strokes, i.e., the next stroke of the character being written. Corresponding to this step, the user checks the displayed candidate Chinese characters to find whether the character being written has been shown. If the user finds that the character being written is not displayed, the user continues to write the next stroke. If the judged result of step 15 is "Yes", the process goes to step 17. Step 17 is similar to step 11; the only difference is that it detects the next stroke written by the user. If the judged result of step 15 is "No", then in step 16 the user selects one of the displayed Chinese characters as the result to be input into the computer or other device.)
combine a plurality of ideogram characters to generate a plurality of phrases, wherein the plurality of ideogram characters includes the fourth ideogram character, and wherein at least one of the plurality of phrases includes the fourth ideogram character; (JANG, page 9, As shown in Figure 11(A), when the user writes the phonetic sign of each word in the phrase "Guizhou", such as the first letter "g" of "gui", step 20A recognizes the scribbled letter "g", and step 20B determines a range in the dictionary within which the corresponding Chinese characters "…" can be displayed simultaneously. When the user continues to write the phonetic sign of the second word in the phrase, such as the first letter "z" of "zhong", step 20B determines a range in the dictionary within which the corresponding candidate phrases "…" can be displayed simultaneously. When the user continues by writing the first stroke (a vertical stroke), step 23 retrieves and step 24 displays the candidate phrases "Guizhou …". At this point, the user can input this phrase by clicking the displayed "Guizhou"; page 7, the scribble recognition system of the PalmPilot computer is utilized to recognize handwritten strokes as scribbled letters or special symbols. Step 52 converts the recognized scribbled letters and special symbols into at least one stroke code;)
select a phrase from the set of candidate phrases as the first phrase; (JANG, page 9, according to the user's selection among the displayed at least one candidate Chinese character/phrase, a selected Chinese character/phrase is taken as the handwritten Chinese character/phrase recognition result.)
and provide the first phrase to the search user interface. (JANG, page 2, Searching step: in the dictionary stored in computer memory, retrieving at least one dictionary entry corresponding to the at least one new stroke code sequence constituted in the stroke identification step, thereby obtaining at least one candidate Chinese character/phrase; Display step: dynamically displaying the at least one candidate Chinese character/phrase while the user writes strokes; Transfer step: if it is judged that the user continues to write the next stroke, transferring to the above stroke identification step; Recognition result generation step: if it is judged that the user has selected one of the displayed at least one candidate Chinese characters/phrases, taking the selected Chinese character/phrase as the handwritten Chinese character/phrase recognition result.)
However, JANG does not clearly disclose:
wherein the data repository further stores a document image of the selected document and wherein the instructions executable by the processor further comprise: an optical character recognition engine executable to: perform optical character recognition on the document image of the selected document to generate a set of extracted ideogram characters; purge an incorrect phrase from the plurality of phrases to generate a set of candidate phrases;
However, Ide discloses:
wherein the data repository further stores a document image of the selected document (Ide, page 10, 7th paragraph- The present invention can be similarly applied to text data obtained by character recognition of some document. Alternatively, text data that has been character-recognized in advance; page 5, 6th-8th paragraphs, e.g. 6th paragraph, The storage device 15 includes an image database (image DB) 21…7th paragraph, The image database 21 stores various image data. The search target database 22 stores data used as a search target. In the present embodiment, text data obtained as a result of reading a newspaper page by the scanner unit 14 and character recognition of an article on the page is stored in the search target database 22 as a search target. …8th paragraph, Further, the image related to the article is captured by the scanner unit 14 and stored in the image database 21. At that time, as shown in FIG. 2, the search target database 22 stores link information indicating the storage destination of the image data together with the text data that has been character-recognized.)
and wherein the instructions executable by the processor further comprise: an optical character recognition engine executable to: perform optical character recognition on the document image of the selected document to generate a set of extracted ideogram characters; (Ide, page 10, 7th paragraph- The present invention can be similarly applied to text data obtained by character recognition of some document. Alternatively, text data that has been character-recognized in advance; page 5, 6th-8th paragraphs, e.g. 6th paragraph, The storage device 15 includes an image database (image DB) 21…7th paragraph, The image database 21 stores various image data. The search target database 22 stores data used as a search target. In the present embodiment, text data obtained as a result of reading a newspaper page by the scanner unit 14 and character recognition of an article on the page is stored in the search target database 22 as a search target. …8th paragraph, Further, the image related to the article is captured by the scanner unit 14 and stored in the image database 21. At that time, as shown in FIG. 2, the search target database 22 stores link information indicating the storage destination of the image data together with the text data that has been character-recognized.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG with the teaching of Ide to obtain a desired search result with fewer search omissions even when an erroneously recognized character is included in the text data (Ide, abstract).
However, JANG in view of Ide and further in view of Henry does not clearly disclose:
purge an incorrect phrase from the plurality of phrases to generate a set of candidate phrases;
However, Okamoto discloses:
purge an incorrect phrase from the plurality of phrases to generate a set of candidate phrases; (Okamoto, Fig. 12; [0086] This lattice structure includes an overall path 1201 corresponding to the four-character term … ( a correct processing result in this case); [0074] As step S804, the lattice search unit 107 follows the lattice structure, and outputs, as a character recognition result, a sequence of a high score that indicates the probability of appearance. This is the termination of the character recognition processing by the lattice generation unit 106 and the lattice search unit 107.)
Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of JANG in view of Ide and further in view of Henry with the teaching of Okamoto so that terms can be recognized correctly, thereby reducing character recognition errors and enhancing character recognition accuracy (Okamoto, [0089]).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Faezeh Forouharnejad, whose telephone number is (571) 270-7416. The examiner can normally be reached Monday through Friday.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Sanjiv Shah, can be reached at (571) 272-4098. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from Patent Center and the Private Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from Patent Center or Private PAIR. Status information for unpublished applications is available through Patent Center and Private PAIR to authorized users only. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/F.F. /
Examiner, Art Unit 2166
/SANJIV SHAH/ Supervisory Patent Examiner, Art Unit 2166