Last updated: May 29, 2026
Application No. 18/623,332
METHOD AND APPARATUS FOR TRAINING NAMED ENTITY RECOGNITION MODEL AND NON-TRANSITORY COMPUTER-READABLE MEDIUM

Non-Final OA §102
Filed: Apr 01, 2024
Priority: Apr 06, 2023 (CN 202310362781.8)
Examiner: KIM, ETHAN DANIEL
Art Unit: 2658
Tech Center: 2600 (Communications)
Assignee: Ricoh Company Ltd.
OA Round: 1 (Non-Final)
Interview Optional

Examiner has a relatively high allowance rate (78%) and a +29.5% interview lift. A written response may suffice.
Based on 107 resolved cases, 2023–2026
Examiner Intelligence

KIM, ETHAN DANIEL
Grants 78% — above average
Career Allowance Rate
83 granted / 107 resolved
+15.6% vs TC avg
Strong interview lift
+29.5%
Interview Lift
resolved cases with interview vs. without
Typical timeline
2y 10m
Avg Prosecution
11 currently pending
Career history
122
Total Applications
across all art units
Statute-Specific Performance

§101: 3.4% (-36.6% vs TC avg)
§103: 69.8% (+29.8% vs TC avg)
§102: 23.4% (-16.6% vs TC avg)
§112: 0.8% (-39.2% vs TC avg)
Based on career data from 107 resolved cases
Office Action

§102
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
2.	The information disclosure statement (IDS) submitted on April 1, 2024 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 102
3.	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

4.	Claims 1, 3-9, and 11-17 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Elisco (U.S. Publication No. 20220300711).
	Regarding claim 1, Elisco discloses a method of training a named entity recognition model, wherein, the named entity recognition model includes an encoder and a decoder, and the encoder contains a pre-trained language model and an attention mechanism model ([0041] - computing device 104 may train neural network 108 by performing named entity recognition as a function of corpus of documents 112. [0073] - Input 716 is entered to a multi-head-attention 708 such that encoder 704 produces output encodings that are provided to the next encoder element and/or a decoder element 720), the method comprising: 
acquiring a plurality of training texts, wherein, each training text is pre-marked with tags, and the tags are used to mark named entity types to which tokens in the training text belong, and constructing a tag annotation for each of the tags, wherein, in response to each of the tags being a tag corresponding to a named entity, the tag annotation includes a position indication token indicating a position of the token in the named entity, corresponding to the tag ([0028] - neural network 108 and/or a training data classifier as described in further detail below may be trained to perform sentence taxonomy, or classification of sentences into a pre-defined list of categories relevant to one or more social services domains. Taxonomy, and/or training data train neural network 108 and/or a training data classifier to perform taxonomy, may be built based on information from academic publications, social service program guidelines, and/or feedback from case workers/supervisors; that is, such information and/or feedback may be correlated and/or paired with sentences in one or more corpuses to form training examples that may be used to train neural network 108 to perform sentence taxonomy [0073] - As used in this disclosure “feed-forward network” is a network that has identical parameters for each position, such that each normalized weighted vector can be a separate, identical linear transformation of each vector in each sequence); 
generating a weight matrix on the basis of all the tag annotations, wherein, each row of the weight matrix corresponds to one tag annotation, respective elements in the row sequentially correspond to the tokens in the tag annotation, values of the elements corresponding to the position indication tokens in the tag annotation are k, values of the elements corresponding to the tokens other than the position indication tokens in the tag annotation are 0, and k is a learnable parameter during a process of training the named entity recognition model ([0024] - Transformer model may include scaled dot-product attention units that learn three weight matrices including query weights, key weights, and value weights. [0072] -  Multi-head attention may be calculated as a function of a matrix, Q, that contains vector representation of a term in a sequence, a vector representation of all the words in the sequence, K, and the value associated with the values of the vectors that represent all of the words in the sequence, V, according to the function); 
inputting the training text and the tag annotations into the pre-trained language model to obtain a first vector representation of the training text and a first vector representation of the tag annotations ([0033] -  labeled data sets as described above may be used as training data to train neural network 108 to label sentences, classify sentences and/or other semantic units to taxonomies, and/or to identify and/or recognize entities, as described below. Any of these processes may be performed with agency-specific training sets and/or corpuses, which may be selected as described above [0037] - dimensions of vector space may not represent distinct terms, in which case elements of a vector representing a first term may have numerical values that together represent a geometrical relationship to a vector representing a second term, wherein the geometrical relationship represents and/or approximates a semantic relationship between the first term and the second term); 
inputting the first vector representation of the training text and the first vector representation of the tag annotations into the attention mechanism model to calculate a first relationship between the training text and the tag annotations, weighting the first relationship by using the weight matrix to obtain a second relationship, and generating a final vector representation of the training text on the basis of the second relationship ([0072] - Encoder element 704 may be comprised of a multi-head attention 708 and a feed-forward neural network 712. As used in this disclosure “multi-head attention” is a scaled dot-product attention that establishes a weight for every input in the sequence, such that a relative position in an n-dimensional space is established for an n-dimensional vector of each significant term, word, and/or semantic unit. Multi-head attention may be calculated as a function of a matrix, Q, that contains vector representation of a term in a sequence, a vector representation of all the words in the sequence, K, and the value associated with the values of the vectors that represent all of the words in the sequence, V, according to the function:…);
inputting the final vector representation of the training text into the decoder to obtain a tag corresponding to each token in the training text, output by the decoder ([0073] - Decoder element 720 may be comprised of multi-head attention 708, feed-forward network 712, and/or an attention mechanism over the encodings. Decoder element 720 functions in a similar fashion to encoder element 704, but an additional attention mechanism is inserted, wherein the attention mechanism draws relevant information from the encodings generated by encoder element 704. Decoder element 720 takes position and/or direction information of the normalized weighted vectors in n-dimensional space and predicts an output sequence. Decoder element 720 includes a linear transformation element 728. As used in this disclosure “linear transformation” is a linear transformation that at least maps two or more vectors in an n-dimensional vector space. Decoder element 720 includes a softmax layer 732. As used in this disclosure “softmax layer” is a generalization of the logistic function to multiple dimensions. Softmax layer 732 may normalize the output of a network to a probability distribution of the decoder element 720 output); 
and optimizing the named entity recognition model on the basis of the tag corresponding to each token in the training text, output by the decoder and the pre-marked tags in the training text to obtain a trained named entity recognition model ([0043] - Algorithm to generate language processing model may include a stochastic gradient descent algorithm, which may include a method that iteratively optimizes an objective function, such as an objective function representing a statistical estimation of relationships between terms, including relationships between input terms and output terms, in the form of a sum of relationships to be estimated).
Regarding claim 3, Elisco discloses the method, wherein, the obtainment of the first vector representation of the training text and the first vector representation of the tag annotations includes inputting the training text and the tag annotations into the pre-trained language model to obtain IDs of the training text and IDs of the tag annotations both represented by numerical values ([0026] - actions performed by agency-specific ETLs may include such steps as removing HTML, tags, note formatting, new line prediction, and/or maintaining structured data, such as case identifiers, note identifiers, dates, authors, or the like, to accompany each note [0033] -  labeled data sets as described above may be used as training data to train neural network 108 to label sentences, classify sentences and/or other semantic units to taxonomies, and/or to identify and/or recognize entities, as described below. Any of these processes may be performed with agency-specific training sets and/or corpuses, which may be selected as described above);
and generating the first vector representation of the training text on the basis of the IDs of the training text, and generating the first vector representation of the tag annotations on the basis of the IDs of the tag annotations ([0026] - actions performed by agency-specific ETLs may include such steps as removing HTML, tags, note formatting, new line prediction, and/or maintaining structured data, such as case identifiers, note identifiers, dates, authors, or the like, to accompany each note [0033] -  labeled data sets as described above may be used as training data to train neural network 108 to label sentences, classify sentences and/or other semantic units to taxonomies, and/or to identify and/or recognize entities, as described below. Any of these processes may be performed with agency-specific training sets and/or corpuses, which may be selected as described above).
Regarding claim 4, Elisco discloses the method, wherein, the calculation of the first relationship between the training text and the tag annotations includes weighting the first vector representation of the training text by using a first weight parameter to obtain a second vector representation of the training text, and weighting the first vector representation of the tag annotations by using a second weight parameter to obtain a second vector representation of the tag annotations, wherein, the first weight parameter and the second weight parameter are learnable parameters ([0024] - Transformer model may include scaled dot-product attention units that learn three weight matrices including query weights, key weights, and value weights. [0072] - Multi-head attention may be calculated as a function of a matrix, Q, that contains vector representation of a term in a sequence, a vector representation of all the words in the sequence, K, and the value associated with the values of the vectors that represent all of the words in the sequence, V, according to the function);
and calculating the first relationship between the training text and the tag annotations on the basis of the second vector representation of the training text and the second vector representation of the tag annotations ([0037] - dimensions of vector space may not represent distinct terms, in which case elements of a vector representing a first term may have numerical values that together represent a geometrical relationship to a vector representing a second term, wherein the geometrical relationship represents and/or approximates a semantic relationship between the first term and the second term).
Regarding claim 5, Elisco discloses the method, wherein, the obtainment of the second relationship by using the weight matrix to weight the first relationship includes dimensionally expanding the weight matrix so that dimensions of the expanded weight matrix are the same as dimensions of the first relationship, and adding the expanded weight matrix and the first relationship to obtain the second relationship ([0037] - dimensions of vector space may not represent distinct terms, in which case elements of a vector representing a first term may have numerical values that together represent a geometrical relationship to a vector representing a second term, wherein the geometrical relationship represents and/or approximates a semantic relationship between the first term and the second term [0056] - a vector may be represented, without limitation, in n-dimensional space using an axis per category of value represented in n-tuple of values, such that a vector has a geometric direction characterizing the relative quantities of attributes in the n-tuple as compared to each other).
Regarding claim 6, Elisco discloses the method, wherein, the generation of the final vector representation of the training text on the basis of the second relationship includes calculating a third vector representation of the training text on the basis of the second relationship and the second vector representation of all the tag annotations, wherein, the third vector representation of the training text is represented as a token level vector representation ([0033] -  labeled data sets as described above may be used as training data to train neural network 108 to label sentences, classify sentences and/or other semantic units to taxonomies, and/or to identify and/or recognize entities, as described below. Any of these processes may be performed with agency-specific training sets and/or corpuses, which may be selected as described above [0037] - dimensions of vector space may not represent distinct terms, in which case elements of a vector representing a first term may have numerical values that together represent a geometrical relationship to a vector representing a second term, wherein the geometrical relationship represents and/or approximates a semantic relationship between the first term and the second term); 
converting the third vector representation of the training text into a sentence level vector representation to obtain a fourth vector representation of the training text ([0033] -  labeled data sets as described above may be used as training data to train neural network 108 to label sentences, classify sentences and/or other semantic units to taxonomies, and/or to identify and/or recognize entities, as described below. Any of these processes may be performed with agency-specific training sets and/or corpuses, which may be selected as described above);
and combining the fourth vector representation of the training text and the second vector representation of the training text to obtain the final vector representation of the training text ([0033] -  labeled data sets as described above may be used as training data to train neural network 108 to label sentences, classify sentences and/or other semantic units to taxonomies, and/or to identify and/or recognize entities, as described below. Any of these processes may be performed with agency-specific training sets and/or corpuses, which may be selected as described above [0037] - dimensions of vector space may not represent distinct terms, in which case elements of a vector representing a first term may have numerical values that together represent a geometrical relationship to a vector representing a second term, wherein the geometrical relationship represents and/or approximates a semantic relationship between the first term and the second term).
Regarding claim 7, Elisco discloses the method, wherein, the tags are BIO tags, BMES tags, or BIOSE tags ([0049] - Each token in a note may be annotated using beginning-inside-outside (BIO) notation and/or tagging. In BIO notation, a “B” may denote a start of an entity, an “I” may represent a continuation of an entity, and an “O” may represent non-entity tokens).
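To make the BIO notation cited from Elisco [0049] concrete, the following is a minimal Python sketch that assigns one tag per token from a list of entity spans. The function name bio_tags, the span format, and the sample sentence are hypothetical illustrations and do not come from the application or the reference.

```python
from typing import List, Tuple


def bio_tags(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Return one BIO tag per token.

    spans holds (start, end_exclusive, entity_type) triples over token
    indices; this input format is an assumption made for the illustration.
    """
    tags = ["O"] * len(tokens)           # non-entity tokens default to "O"
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"       # "B" marks the start of an entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"       # "I" marks a continuation token
    return tags


# Hypothetical sample sentence and spans.
tokens = ["Ricoh", "Company", "is", "based", "in", "Tokyo"]
spans = [(0, 2, "ORG"), (5, 6, "LOC")]
tags = bio_tags(tokens, spans)
print(list(zip(tokens, tags)))
```

The "B" and "I" prefixes here play the role that the claims assign to a position indication token: they state where a token sits inside its named entity.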
Regarding claim 8, Elisco discloses the method, further comprising: performing named entity recognition by utilizing the trained named entity recognition model ([0042] - neural network 108 may perform named entity recognition by producing a language processing model. Language processing model may include a program automatically generated by computing device 104 and/or named entity recognition to produce associations between one or more significant terms extracted from corpus of documents 112 and detect associations, including without limitation mathematical associations, between such significant terms).
Regarding claim 9, Elisco discloses an apparatus for training a named entity recognition model, wherein, the named entity recognition model includes an encoder and a decoder, and the encoder contains a pre-trained language model and an attention mechanism model ([0041] - computing device 104 may train neural network 108 by performing named entity recognition as a function of corpus of documents 112. [0073] - Input 716 is entered to a multi-head-attention 708 such that encoder 704 produces output encodings that are provided to the next encoder element and/or a decoder element 720), the apparatus comprising: 
a first acquisition part configured to acquire a plurality of training texts, wherein, each training text is pre-marked with tags, and the tags are used to mark named entity types to which tokens in the training text belong, and construct a tag annotation for each of the tags, wherein, in response to each of the tags being a tag corresponding to a named entity, the tag annotation includes a position indication token indicating a position of the token in the named entity, corresponding to the tag ([0028] - neural network 108 and/or a training data classifier as described in further detail below may be trained to perform sentence taxonomy, or classification of sentences into a pre-defined list of categories relevant to one or more social services domains. Taxonomy, and/or training data train neural network 108 and/or a training data classifier to perform taxonomy, may be built based on information from academic publications, social service program guidelines, and/or feedback from case workers/supervisors; that is, such information and/or feedback may be correlated and/or paired with sentences in one or more corpuses to form training examples that may be used to train neural network 108 to perform sentence taxonomy [0073] - As used in this disclosure “feed-forward network” is a network that has identical parameters for each position, such that each normalized weighted vector can be a separate, identical linear transformation of each vector in each sequence);
a first generation part configured to generate a weight matrix on the basis of all the tag annotations, wherein, each row of the weight matrix corresponds to one tag annotation, respective elements in the row sequentially correspond to the tokens in the tag annotation, values of the elements corresponding to the position indication tokens in the tag annotation are k, values of the elements corresponding to the tokens other than the position indication tokens in the tag annotation are 0, and k is a learnable parameter during a process of training the named entity recognition model ([0024] - Transformer model may include scaled dot-product attention units that learn three weight matrices including query weights, key weights, and value weights. [0072] -  Multi-head attention may be calculated as a function of a matrix, Q, that contains vector representation of a term in a sequence, a vector representation of all the words in the sequence, K, and the value associated with the values of the vectors that represent all of the words in the sequence, V, according to the function);
a first obtainment part configured to input the training text and the tag annotations into the pre-trained language model to obtain a first vector representation of the training text and a first vector representation of the tag annotations ([0033] -  labeled data sets as described above may be used as training data to train neural network 108 to label sentences, classify sentences and/or other semantic units to taxonomies, and/or to identify and/or recognize entities, as described below. Any of these processes may be performed with agency-specific training sets and/or corpuses, which may be selected as described above [0037] - dimensions of vector space may not represent distinct terms, in which case elements of a vector representing a first term may have numerical values that together represent a geometrical relationship to a vector representing a second term, wherein the geometrical relationship represents and/or approximates a semantic relationship between the first term and the second term);
a second obtainment part configured to input the first vector representation of the training text and the first vector representation of the tag annotations into the attention mechanism model to calculate a first relationship between the training text and the tag annotations, weight the first relationship by using the weight matrix to obtain a second relationship, and generate a final vector representation of the training text on the basis of the second relationship ([0072] - Encoder element 704 may be comprised of a multi-head attention 708 and a feed-forward neural network 712. As used in this disclosure “multi-head attention” is a scaled dot-product attention that establishes a weight for every input in the sequence, such that a relative position in an n-dimensional space is established for an n-dimensional vector of each significant term, word, and/or semantic unit. Multi-head attention may be calculated as a function of a matrix, Q, that contains vector representation of a term in a sequence, a vector representation of all the words in the sequence, K, and the value associated with the values of the vectors that represent all of the words in the sequence, V, according to the function:…);
a third obtainment part configured to input the final vector representation of the training text into the decoder to obtain a tag corresponding to each token in the training text, output by the decoder ([0073] - Decoder element 720 may be comprised of multi-head attention 708, feed-forward network 712, and/or an attention mechanism over the encodings. Decoder element 720 functions in a similar fashion to encoder element 704, but an additional attention mechanism is inserted, wherein the attention mechanism draws relevant information from the encodings generated by encoder element 704. Decoder element 720 takes position and/or direction information of the normalized weighted vectors in n-dimensional space and predicts an output sequence. Decoder element 720 includes a linear transformation element 728. As used in this disclosure “linear transformation” is a linear transformation that at least maps two or more vectors in an n-dimensional vector space. Decoder element 720 includes a softmax layer 732. As used in this disclosure “softmax layer” is a generalization of the logistic function to multiple dimensions. Softmax layer 732 may normalize the output of a network to a probability distribution of the decoder element 720 output);
and an optimization part configured to optimize the named entity recognition model on the basis of the tag corresponding to each token in the training text, output by the decoder and the pre-marked tags in the training text to obtain a trained named entity recognition model ([0043] - Algorithm to generate language processing model may include a stochastic gradient descent algorithm, which may include a method that iteratively optimizes an objective function, such as an objective function representing a statistical estimation of relationships between terms, including relationships between input terms and output terms, in the form of a sum of relationships to be estimated).
Dependent claims 11-15 are analogous in scope to claims 3-6 and 8, and are rejected according to the same reasoning.
Regarding claim 16, Elisco discloses a non-transitory computer-readable medium having a computer program for execution by a processor, wherein, the computer program causes, when executed by the processor, the processor to implement the method according to claim 1 ([0093] - a machine-readable storage medium does not include transitory forms of signal transmission).
Regarding claim 17, Elisco discloses an apparatus comprising: 
a processor ([0096] - Computer system 1300 includes a processor 1304 and a memory 1308 that communicate with each other); 
and a storage storing a computer program, coupled to the processor, wherein, the computer program causes, when executed by the processor, the processor to implement the method according to claim 1 ([0096] - Computer system 1300 includes a processor 1304 and a memory 1308 that communicate with each other).
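The weighting step recited in claims 1 and 5 (compute a first relationship between the training text and the tag annotations, expand the weight matrix to the same dimensions, and add the two) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the applicant's implementation: the dot product as the similarity measure, the tensor layout [text token][tag annotation][annotation token], and all function names and sample values are hypothetical. The input vectors stand in for the "second vector representations" of claim 4, after any learnable projection.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product, used here as an assumed similarity measure."""
    return sum(x * y for x, y in zip(a, b))


def first_relationship(text_vecs: List[Vector],
                       label_vecs: List[List[Vector]]) -> List[List[List[float]]]:
    """Score every text token against every token of every tag annotation.

    Result layout: [text token][tag annotation][annotation token].
    """
    return [[[dot(t, tok) for tok in ann] for ann in label_vecs]
            for t in text_vecs]


def second_relationship(first: List[List[List[float]]],
                        weights: List[List[float]]) -> List[List[List[float]]]:
    """Expand the (annotation x token) weight matrix over all text tokens
    and add it element-wise to the first relationship."""
    return [[[s + w for s, w in zip(score_row, weight_row)]
             for score_row, weight_row in zip(per_token, weights)]
            for per_token in first]


# Hypothetical toy inputs: 2 text tokens, 2 tag annotations of 2 tokens each.
text_vecs = [[1.0, 0.0], [0.0, 1.0]]
label_vecs = [[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 1.0], [0.0, 0.0]]]
weights = [[0.5, 0.0], [0.5, 0.0]]   # k = 0.5 on position indication tokens

first = first_relationship(text_vecs, label_vecs)
second = second_relationship(first, weights)
print(second)
```

The addition boosts the scores of position indication tokens by k for every text token, which is the feature a response would contrast with the generic query, key, and value weights cited from Elisco.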
Allowable Subject Matter
5.	Claims 2 and 10 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. 
The following is a statement of reasons for the indication of allowable subject matter: The prior art of record does not anticipate or render obvious the limitation of “the generation of the weight matrix includes unifying numbers of the tokens of all the tag annotations on the basis of a maximum number of tokens in all the tag annotations; initializing a zero matrix, wherein, each row of the zero matrix corresponds to one tag annotation, and respective elements in each row sequentially correspond to the tokens in the tag annotation” as claimed.
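The quoted limitation, read together with the weight matrix of claim 1, describes a small construction procedure: pad all tag annotations to the longest one, start from a zero matrix with one row per annotation, and place k at the position indication tokens. A minimal sketch follows, assuming a padding token, a set of position indication tokens, and sample annotations that are all hypothetical. In actual training k would be a learnable parameter in an autodiff framework; here it is a plain float.

```python
from typing import List, Set

PAD = "[PAD]"  # hypothetical padding token used to unify annotation lengths


def build_weight_matrix(annotations: List[List[str]],
                        position_tokens: Set[str],
                        k: float) -> List[List[float]]:
    """Build one row per tag annotation, with k at position indication tokens."""
    # Unify token counts on the basis of the longest annotation.
    max_len = max(len(a) for a in annotations)
    padded = [a + [PAD] * (max_len - len(a)) for a in annotations]

    # Initialize a zero matrix: rows are annotations, columns are tokens.
    weights = [[0.0] * max_len for _ in padded]

    # Set k where a position indication token occurs; other entries stay 0.
    for row, ann in enumerate(padded):
        for col, tok in enumerate(ann):
            if tok in position_tokens:
                weights[row][col] = k
    return weights


# Hypothetical annotations for the tags B-PER, I-PER, and O.
annotations = [["begin", "person"], ["inside", "person"], ["other"]]
matrix = build_weight_matrix(annotations, {"begin", "inside"}, k=0.5)
print(matrix)
```

The row for the "O" tag stays all zero because its annotation contains no position indication token, which matches the claim language limiting k to tags that correspond to a named entity.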
Conclusion
6.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
	He (U.S. Publication No. 20230153532) teaches language-model pretraining with gradient-disentangled embedding sharing. Zhang (U.S. Publication No. 20210012199) teaches address information feature extraction method based on deep neural network model. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ETHAN DANIEL KIM whose telephone number is (571) 272-1405.  The examiner can normally be reached on Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/ETHAN DANIEL KIM/
Examiner, Art Unit 2658

/RICHEMOND DORVIL/Supervisory Patent Examiner, Art Unit 2658
Prosecution Timeline

Apr 01, 2024
Application Filed
Apr 02, 2026
Non-Final Rejection mailed — §102 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/309,496
Patent 12639532
GENERATING MULTI-ORDER TEXT QUERY RESULTS UTILIZING A CONTEXT ORCHESTRATION ENGINE
3y 0m to grant; granted May 26, 2026
18/405,269
Patent 12640162
System and Method for Podcast Repetitive Content Detection
2y 4m to grant; granted May 26, 2026
17/588,241
Patent 12626049
System and Method for Automatic Summarization in Interlocutor Turn-Based Electronic Conversational Flow
4y 3m to grant; granted May 12, 2026
18/303,524
Patent 12626712
SPEECH ENHANCEMENT SYSTEM
3y 0m to grant; granted May 12, 2026
17/895,715
Patent 12620397
SYSTEMS AND METHODS FOR PERFORMING LIVE TRANSCRIPTION
3y 8m to grant; granted May 05, 2026
Based on 5 most recent grants.
Prosecution Projections

1-2
Expected OA Rounds
78%
Grant Probability
99%
With Interview (+29.5%)
2y 10m (~8m remaining)
Median Time to Grant
Low
PTA Risk
Based on 107 resolved cases by this examiner. Grant probability derived from career allowance rate.
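The 78% grant probability can be reproduced from the career figures on this page (83 granted out of 107 resolved). The short sketch below shows that arithmetic. How the 99% "With Interview" figure is derived is not stated on the page; adding the +29.5% lift to 78% exceeds 100%, so the sketch assumes an additive lift capped at 99%. That capping rule is an assumption, not a documented formula.

```python
granted, resolved = 83, 107

# Career allowance rate as a percentage (83 / 107 is about 77.57%).
allowance_rate = 100 * granted / resolved

# ASSUMPTION: the interview lift is additive and the result is capped at 99%.
# The page does not document how the "With Interview" figure is computed.
interview_lift = 29.5
assumed_cap = 99.0
with_interview = min(round(allowance_rate) + interview_lift, assumed_cap)

print(f"Grant probability: {round(allowance_rate)}%")
print(f"With interview (assumed cap): {with_interview:.0f}%")
```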