Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This Office Action is sent in response to Applicant’s Communication received on 12/11/2025 for application number 17/350060 and the conference decision of 01/30/2026.
Response to Amendments
By the Amendment of 07/17/2025, Claims 1, 6, 8, and 15 have been amended.
Claims 1, 3, 6-8, 10, 13-15, 17, and 20-29 remain pending in the application.
Applicant’s response filed 7/17/2025 is sufficient to overcome the 35 U.S.C. 102
rejections of claims 1, 8, 15, 23, 24, 26, and 28 over Liu, the 35 U.S.C. 103 rejections of claims 3, 10, 17, 22, 25, and 27 over Liu in view of Wieman, the 35 U.S.C. 103 rejections of claims 6, 13, 20, and 29 over Liu in view of Chen and Jin, and the 35 U.S.C. 103 rejections of claims 7, 14, and 21 over Liu in view of Jiang. The previous rejections have been withdrawn.
Response to Arguments
Applicant argues that the cited portions of Liu do not teach “receiving plain text labels… and the plain text labels comprising machine-encoded text corresponding to the multiple words…wherein the training of the convolutional recurrent neural network comprises performing a first loss reduction for the plain text labels and a second loss reduction for vector labels that include the semantic feature vectors”.
Examiner notes that Bui et al (Pub. No.: US 20220114476 A1), hereafter Bui, teaches receiving plain text labels (text sequence labeling loss model 614 receives predicted text sequence labels 612, P0097)…and the plain text labels comprising machine-encoded text corresponding to the multiple words (predicted text sequence labels 612 may be encoded in binary, P0103)… wherein the training of the convolutional recurrent neural network comprises performing a first loss reduction for the plain text labels and a second loss reduction for vector labels that include the semantic feature vectors (neural network layers are trained and optimized via backpropagation and/or end-to-end learning to minimize the label prediction loss. In general, a loss model identifies classification loss amounts between a predicted text sequence labeling model and a ground truth text sequence labeling model. In some implementations, the loss model compares corresponding feature vectors in multidimensional vector space, P0097-P0098).
The full prior art rejections are outlined below.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.
Claims 1, 8, and 15 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. The specification (see P0033-P0038) does not disclose a first and second loss reduction. Thus, a person of ordinary skill in the art cannot determine how to perform the claimed functions, and the specification fails to demonstrate that the inventor was in possession of the claimed invention at the time of filing. Claims 3, 6, 7, 10, 13, 14, 17, and 20-29 incorporate by reference all limitations of claims 1, 8, and 15 and are rejected under 35 U.S.C. 112(a) for the same reasons.
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1, 3, 6-8, 10, 13-15, 17, and 20-29 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claims 1, 8, and 15 recite the limitation "wherein the training of the convolutional recurrent neural network comprises performing a first loss reduction for the plain text labels and a second loss reduction for vector labels that include the semantic feature vectors". The specification describes backpropagation and constraints being harnessed to reduce loss, but this is not reflected in the claims. The claims merely recite a first loss reduction for plain text labels and a second loss reduction for vector labels, but do not recite how the loss reduction is performed, rendering the metes and bounds of the claims unclear.
Dependent claims 3, 6, 7, 10, 13, 14, 17, and 20-29 inherit the deficiency and are rejected for the same rationale. Appropriate action is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 8, 15, 23, 24, 26, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Liu et al (WO 2020199730 A1), hereafter Liu, in view of Bui et al (Pub. No.: US 20220114476 A1), hereafter Bui.
Regarding claims 1, 8, and 15, Liu teaches receiving a text image…, the text image comprising multiple words (feature extraction is performed on an image, page 2, paragraph 1 under Summary of Invention)…training an encoder network using the text image so that the encoder network generates respective semantic feature vectors for the multiple words (a convolutional neural network encodes an input text image and generates semantic vectors for the words in the image, page 14, full paragraph 1; performing feature extraction processing on an image to be detected to obtain a plurality of semantic vectors, wherein the plurality of semantic vectors correspond to a plurality of characters of a text sequence, page 2, paragraph 1 under Summary of Invention); and inputting, into multiple channels of a convolutional recurrent neural network, the text image, the plain text labels, and the semantic feature vectors to train the convolutional recurrent neural network for optical character recognition (“the multiple semantic vectors are sequentially recognized through a convolutional neural network to obtain a recognition result of the text sequence”, page 5, paragraph 7 under Detailed Description; two blocks of the neural network may be processed in parallel, page 19, paragraph 4), wherein the plain text labels and the semantic feature vectors are constraints for the training of the convolutional recurrent neural network (the semantic vector is processed during training, and training the feature extraction network is done by the difference between label information of the characters and the output result, page 8, first full paragraph), … wherein the trained encoder network and the trained convolutional recurrent neural network together constitute a machine learning model for performing optical character recognition (the convolutional neural network and encoder are used to recognize characters in text, page 14, full paragraph 3).
Liu does not appear to explicitly teach “receiving plain text labels…and the plain text labels comprising machine-encoded text corresponding to the multiple words…wherein the training of the convolutional recurrent neural network comprises performing a first loss reduction for the plain text labels and a second loss reduction for vector labels that include the semantic feature vectors”.
Bui teaches receiving plain text labels (text sequence labeling loss model 614 receives predicted text sequence labels 612, P0097)…and the plain text labels comprising machine-encoded text corresponding to the multiple words (predicted text sequence labels 612 may be encoded in binary, P0103)… wherein the training of the convolutional recurrent neural network comprises performing a first loss reduction for the plain text labels and a second loss reduction for vector labels that include the semantic feature vectors (neural network layers are trained and optimized via backpropagation and/or end-to-end learning to minimize the label prediction loss. In general, a loss model identifies classification loss amounts between a predicted text sequence labeling model and a ground truth text sequence labeling model. In some implementations, the loss model compares corresponding feature vectors in multidimensional vector space, P0097-P0098).
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Liu and Bui before them, to include Bui’s specific teaching of performing loss reduction on predicted text labels and vector labels in Liu’s method of text recognition. One would have been motivated to make such a combination in order to perform loss reduction on predicted text labels and vector labels (see Bui P0097-P0098) while backpropagating the network parameters of the recurrent neural network to minimize network loss (see Liu page 7, paragraph 6; page 8, paragraphs 1-2).
Regarding claims 23, 24, and 26, Liu in view of Bui teaches the limitations of claims 15, 1, and 8 as outlined above. Liu further teaches wherein the operations further comprise inputting a new text image into the trained machine learning model (multiple images may be used as input for the machine learning model, page 7, paragraph 2) such that, in response, the trained machine learning model performs optical character recognition on the new text image and generates machine-encoded text representing one or more words that are in the new text image (convolutional neural network and encoder are used to recognize characters in text images and generate semantic vectors for the text, page 14, full paragraph 3).
Regarding claim 28, Liu in view of Bui teaches the limitations of claim 1 as outlined above. Liu further teaches storing the trained machine learning model in computer data storage (“a non-transitory computer-readable storage medium (for example, the memory 1932) may also be provided, on which computer program instructions are stored. The computer program instructions, when executed by the processor (for example, the processing component 1922), enable the processor to implement any of the foregoing text recognition methods”, page 19, paragraph 2).
Claims 3, 10, 17, 22, 25, and 27 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Bui and further in view of Wieman et al (US 11392833 B2), hereafter Wieman.
Regarding claims 3, 10, and 17, Liu in view of Bui teaches the elements of claims 1, 8, and 15 as outlined above. Liu does not appear to explicitly teach: “wherein the machine learning model comprises at least one member selected from the group consisting of a convolutional recurrent neural network and a connectionist temporal classification function”.
Wieman teaches: wherein the machine learning model comprises at least one member selected from the group consisting of a convolutional recurrent neural network and a connectionist temporal classification function (“processing at least an output of the recurrent neural network architecture using a feed-forward neural network architecture to determine a set of classification scores for a plurality of sound units associated with speech”, C:24, L:26-30. “the classification scores may be used by further neural network architectures so as to determine a text transcription of an utterance that is present within recorded audio data. For example, the audio processing systems described herein may be used as part of a larger Connectionist Temporal Classification (CTC) neural model to provide transcription and/or command translation”, C:24, L:56-63).
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Liu, Bui, and Wieman before them, to include Wieman’s specific teaching of using a CTC neural model to provide transcription and translation in Liu’s system of text recognition and storage. One would have been motivated to make such a combination in order to use a CTC neural model to provide transcription and translation (see Wieman C:24, L:56-63), thereby providing accurate text recognition for translation (see Liu page 2).
Regarding claims 22 and 25, Liu in view of Bui and further in view of Wieman teaches the elements of claims 3 and 17 as outlined above. Wieman further teaches wherein the connectionist temporal classification function receives an output matrix from the convolutional recurrent neural network (machine learning model may include Connectionist Temporal Classification (CTC), C:24, L:56-63. Feed forward neural network layer may be included which outputs matrix data. In view of the model including a Connectionist Temporal Classification (CTC), the feed forward neural network layer may output matrix data to the CTC, C8:L28-34).
Regarding claim 27, Liu in view of Bui and further in view of Wieman teaches the elements of claim 3 as outlined above. Wieman further teaches wherein the connectionist temporal classification function outputs a matrix that is a character score for each time step, and wherein the matrix is used for the first loss reduction and the second loss reduction (The audio processing systems may be used as part of a larger Connectionist Temporal Classification (CTC) neural model to provide transcription and/or command translation, C:24, L:56-63. The audio processing systems may be used to multiply key and/or query matrices with an input and taking a softmax of the dot product to obtain weights or scores, C21:L20-23. Multidimensional matrices may be used to minimize the loss function, C8:L6-13).
Claims 6, 13, 20, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Bui and further in view of Chen et al (US 20210174781 A1), hereafter Chen, and Jin et al (CN 112417097 A), hereafter Jin.
Regarding claims 13 and 20, Liu in view of Bui teaches the elements of claims 8 and 15 as outlined above. Liu does not appear to explicitly teach: “wherein the encoder network comprises an attention mechanism that generates correlation scores for word element pairs of the plain text label and wherein the generating the semantic feature vectors further comprises using the correlation scores as a regression label”.
Chen teaches: wherein the encoder network comprises an attention mechanism that generates correlation scores for word element pairs of the plain text label (“the corresponding semantic vector may be obtained by adopting the formula (1) of attention mechanism:… where C.sub.i represents the i-th semantic vector, N represents the number of hidden nodes, and h.sub.j represents the hidden node of the j-th character in coding. The attention mechanism refers to that a.sub.i,j represents the correlation between the j-th phase in coding and the i-th phase in decoding, so the most appropriate context information for the current output is selected for each semantic vector”, P0046-P0047).
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Liu, Bui, and Chen before them, to include Chen’s specific teaching of obtaining a semantic vector with the use of the formula for the attention mechanism in Liu’s system of text recognition and storage. One would have been motivated to make such a combination in order to use an attention mechanism for generating a semantic vector (see Chen P0046-P0047) and to decode an attention vector to determine a text recognition result corresponding to the target semantic vector (see Liu page 10).
Jin teaches: wherein the generating the semantic feature vectors further comprises using the correlation scores as a regression label (“defining the relevance model of the image and the text as follows: S (v, t) = Sv-t + St-v… S (v, t) represents the final relevance of the visual graph and the text graph”, formula listed on bottom of page 16, under the second full paragraph).
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Liu, Bui, Chen, and Jin before them, to include Jin’s specific teaching of the correlation of text and image being the label of the correlation model in Liu’s system of text recognition and storage and in Chen’s system of text-based speech synthesis. One would have been motivated to make such a combination of the correlation of text and image being the label of the correlation model (see Jin, formula listed at the bottom of page 16, under the second full paragraph) and obtaining the weight parameter of the target semantic vector to determine the correlation with the target semantic vector (see Liu page 13, paragraph 5), as well as the correlation between the j-th phase in coding and the i-th phase in decoding of the attention mechanism to determine the most appropriate context information as the output of the semantic vector (see Chen P0047).
Regarding claim 29, Liu in view of Bui teaches the elements of claim 1 as outlined above. Liu does not appear to explicitly teach: “wherein the encoder network comprises an attention mechanism that generates correlation scores for word element pairs of the plain text label”.
Chen teaches: wherein the encoder network comprises an attention mechanism that generates correlation scores for word element pairs of the plain text label (“the corresponding semantic vector may be obtained by adopting the formula (1) of attention mechanism:… where C.sub.i represents the i-th semantic vector, N represents the number of hidden nodes, and h.sub.j represents the hidden node of the j-th character in coding. The attention mechanism refers to that a.sub.i,j represents the correlation between the j-th phase in coding and the i-th phase in decoding, so the most appropriate context information for the current output is selected for each semantic vector”, P0046-P0047).
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Liu, Bui, and Chen before them, to include Chen’s specific teaching of obtaining a semantic vector with the use of the formula for the attention mechanism in Liu’s system of text recognition and storage. One would have been motivated to make such a combination in order to use an attention mechanism for generating a semantic vector (see Chen P0046-P0047) and to decode an attention vector to determine a text recognition result corresponding to the target semantic vector (see Liu page 10).
Regarding claim 6, Liu in view of Bui and further in view of Chen teaches the elements of claim 29 as outlined above. Liu does not appear to explicitly teach: “using the correlation scores as a regression label”.
Jin teaches: using the correlation scores as a regression label (“defining the relevance model of the image and the text as follows: S (v, t) = Sv-t + St-v… S (v, t) represents the final relevance of the visual graph and the text graph”, formula listed on bottom of page 16, under the second full paragraph).
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Liu, Bui, Chen, and Jin before them, to include Jin’s specific teaching of the correlation of text and image being the label of the correlation model in Liu’s system of text recognition and storage and in Chen’s system of text-based speech synthesis. One would have been motivated to make such a combination of the correlation of text and image being the label of the correlation model (see Jin, formula listed at the bottom of page 16, under the second full paragraph) and obtaining the weight parameter of the target semantic vector to determine the correlation with the target semantic vector (see Liu page 13, paragraph 5), as well as the correlation between the j-th phase in coding and the i-th phase in decoding of the attention mechanism to determine the most appropriate context information as the output of the semantic vector (see Chen P0047).
Claims 7, 14, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Liu in view of Bui and further in view of Jiang et al (CN 112698833 A), hereafter Jiang.
Regarding claims 7, 14, and 21, Liu in view of Bui teaches the elements of claims 1, 8, and 15 as outlined above. Liu does not appear to explicitly teach: “wherein the generating the semantic feature vectors comprises inputting the plain text labels into at least one member selected from the group consisting of an encoder and a cosine similarity discriminator”.
Jiang teaches: wherein the generating the semantic feature vectors comprises inputting the plain text labels into at least one member selected from the group consisting of an encoder (“the attention weight at each hidden layer state is defined according to formula: e (ht) = W1tanh (W2ht) … ht represents a hidden layer state corresponding to t-th word in the sentence sequence… hk represents the kth input at the output of the encoder; after the step 1 to step 6, finishing the code function representation, namely the code function for characteristic representation and semantic feature extraction” page 4) and a cosine similarity discriminator (“namely obtaining the characteristic vector of 3 word sequence, then inputting it into the feed-forward neural network; calculating the cosine similarity” page 5).
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of Liu, Bui, and Jiang before them, to include Jiang’s specific teaching of using an encoder and calculating a cosine similarity with the text inputs in Liu’s system of text recognition and storage. One would have been motivated to make such a combination of using an encoder and calculating a cosine similarity with the text inputs (see Jiang pages 4-5) and encoding the prior information of the target semantic vector by the neural network to obtain the information (see Liu page 20).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 20210125034 A1 (Nguyen) teaches a system of extracting text from documents using a convolutional neural network and recurrent neural network.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ISHAN MOUNDI whose telephone number is (703)756-1547. The examiner can normally be reached 8:30 A.M. - 5 P.M.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Ell can be reached at (571) 270-3264. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/I.M./Examiner, Art Unit 2141
/ANDREW L TANK/Primary Examiner, Art Unit 2141