Notice of Pre-AIA or AIA Status
1. The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
2. Applicant’s arguments directed to the newly added limitations of claim 1 are moot in view of the new ground of rejection. Claim 1 now recites “calculating a degree of character string similarity between a character string indicating the first named entity and a character string indicating each of a plurality of named entities in the first language included in first dictionary data.”
By contrast, former claim 3 recited similarity between the first named entity and the third named entity. The amended language changes the interpretation of the limitation. Accordingly, the Examiner’s analysis of amended claim 1 is not limited to the prior art treatment of former claim 3.
The newly added limitations are taught by newly cited Sun (US 2019/02510085).
Claim Rejections - 35 USC § 103
3. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
4. Claims 1-2 and 4-8 are rejected under 35 U.S.C. 103 as being unpatentable over Li (US 2021/0124880) in view of Sun (US 2019/02510085).
Regarding Claim 1:
Li discloses a non-transitory computer-readable recording medium storing therein a computer program that causes a computer to execute a process comprising:
acquiring a first parallel corpus in which a first sentence including a first named entity in a first language and a second sentence including a second named entity in a second language corresponding to the first named entity are associated (Li: p[0017]-[0018] discloses obtaining a bilingual training set made up of sentence pairs where each pair has a first sentence in a first language and a corresponding second sentence in a second language with the same meaning; p[0066] discloses named entities in the sentences);
extracting, from the first dictionary data, a third named entity whose degree of the character string similarity with the first named entity exceeds a threshold (Li: p[0016] discloses a bilingual dictionary or vocabulary data structure that contains a multitude of named entities in the first and second languages; p[0066] and Table 4 show examples of first and third named entities that share similarity and also have corresponding second language translations);
specifying a fourth named entity in the second language that corresponds to the third named entity using second dictionary data indicating correspondence between named entities in the first language and named entities in the second language (Li: p[0016] discloses that the bilingual dictionary establishes correspondence between the named entities in the first and second languages; p[0066] and Table 4 disclose names that are so similar that they are often confused in translation or named entity recognition, and these names appear in both the first and second languages, grouped together within each language; p[0082] further explains that, because of this similarity, such names can replace one another. In this example, the English variant of the name “Li” could be the first named entity in the first language and its Chinese variant the second named entity, the English variant of “Zhang” or “Liu” would be the third named entity, and the Chinese (second language) variant of “Zhang” or “Liu” would be the fourth named entity); and
generating a second parallel corpus, which differs from the first parallel corpus, by replacing the first named entity included in the first sentence with the third named entity and replacing the second named entity included in the second sentence with the fourth named entity (Li: p[0018]-[0020] discloses an augmented bilingual training set separate from the original bilingual training set, and p[0085] shows that both sets co-exist when they are later fused; the augmented and original sets are equivalent to the second and first parallel corpora, respectively. p[0078] discloses replacing the matched name in both the source and target sentences with a placeholder to form a generalized pattern, and p[0082]-[0085] discloses re-inserting bilingual vocabulary pairs into the aligned placeholders. Because this reinsertion step simultaneously replaces the first named entity in the first sentence with the third named entity and the second named entity in the second sentence with the fourth named entity, it generates the second (augmented) parallel corpus that differs from the original).
Li does not explicitly disclose calculating a degree of character string similarity between a character string indicating the first named entity and a character string indicating each of a plurality of named entities in the first language included in first dictionary data.
However, Sun discloses calculating a degree of character string similarity between a character string indicating the first named entity and a character string indicating each of a plurality of named entities in the first language included in first dictionary data (Sun: ¶[0005] discloses that person-name fuzzy matching uses a string matching algorithm and a string matching degree threshold; ¶[0009] further discloses determining a standard name set used to match the name to be matched; ¶[0043] discloses performing similarity matching on each word included in the name to be matched and each word included in a name in the first name set; ¶[0095] teaches performing similarity matching on each word included in the name to be matched and each index by using a string matching algorithm; and ¶[0134] discloses that each of the plurality of words is matched with each of the plurality of elements based on similarity degree and that the algorithm may include a string similarity matching algorithm. Therefore, Sun teaches calculating a degree of character string similarity between a first named entity and each of a plurality of candidate names in dictionary/name-set data).
Li and Sun are combinable because they are directed to analogous subject matter. Li teaches a bilingual dictionary and parallel corpus, while Sun discloses string similarity matching when comparing entities against a dictionary; i.e., both disclose methods for matching text to its corresponding translation or to a similar piece of text. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Li to use string similarity matching, as taught by Sun, when comparing a first named entity against a plurality of candidate named entities in a dictionary. The suggestion/motivation for doing so is disclosed in Sun ¶[0005]: “a string matching algorithm is usually used to perform person name fuzzy matching, and a string matching degree threshold determines a fuzzy degree. However, the string matching degree threshold is all set according to experience. To reduce omission, the string matching degree threshold is usually set to a relatively low value. Consequently, the matching accuracy is relatively low.” Therefore, combining Sun’s similarity-based candidate matching with Li’s bilingual corpus augmentation would have predictably improved the accuracy of selecting named entities while preserving Li’s bilingual training-data generation process.
Regarding Claim 2:
The proposed combination of Li and Sun further discloses the non-transitory computer-readable recording medium according to claim 1, wherein the process further includes determining a named entity class of the first named entity using a trained named entity recognition model and selecting the first dictionary data based on the named entity class out of the first dictionary data including different named entities (Li: p[0066] discloses identifying a category (personal names) as a type of named entity and building a dictionary specifically for that class of named entities; p[0021] discloses that the word pairs may be based on categories and fields of words. This inherently requires determining the named entity class before selecting the proper dictionary containing entities of that type).
Regarding Claim 4:
Li further discloses the non-transitory computer-readable recording medium according to claim 1, wherein the second dictionary data is multilingual terminology dictionary data in which named entities in a plurality of languages that express a concept are written in association with an identifier that identifies the concept (Li: p[0066] discloses a multilingual dictionary containing named entities in two languages, and these named entities are linked to one another to represent the same underlying concept, such as the names shown in Table 4; Table 6, explained in p[0070], further supports this by disclosing how these dictionary entries are applied to actual sentences and how the first and second language sentences contain the same topic).
Regarding Claim 5:
The proposed combination of Li and Sun further discloses the non-transitory computer-readable recording medium according to claim 1, wherein the specifying includes using, upon detecting that the second dictionary data includes a plurality of fourth named entities that are associated with the third named entity, a distributed representation vector of a word included in the third named entity and distributed representation vectors of words included in the plurality of fourth named entities to select the fourth named entity out of the plurality of fourth named entities (Li: p[0071] discloses that each bilingual vocabulary entry is structured as lex_i = (lex_xi, lex_yi), where lex_xi is a first language word (third named entity) and lex_yi is a second language word (fourth named entity). These pairings are stored and traversed because there may be multiple possible second language words for a single first language word. The original bilingual set D is searched to generate Dmatch = {(x1, y1), … (xs, ys)}, where each x and y is aligned as a pair in a two-dimensional data structure. This is a distributed representation vector: each word’s relationship to the others is encoded by its position in this array. The process then applies the conditions disclosed in p[0072]-[0073] to select the correct second language match for a given third named entity, essentially comparing candidates (in the vector space D) and selecting the most suitable one).
Regarding Claim 6:
The proposed combination of Li and Sun further discloses the non-transitory computer-readable recording medium according to claim 1, wherein the first language is a language used in an original text inputted into a machine translation model and the second language is a language used in translated text outputted from the machine translation model (Li: p[0059] discloses the target translation model may be for translating corpus data between the first and the second language).
Regarding Claim 7:
Claim 7 has been analyzed with regard to claim 1 (see rejection above) and is rejected for the same reasons of obviousness used above. It is noted that Li discloses a processor at least at [0005].
Regarding Claim 8:
Claim 8 has been analyzed with regard to claim 1 (see rejection above) and is rejected for the same reasons of obviousness used above. It is noted that Li discloses a processor coupled to the memory at least at [0005].
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to IAN SCOTT MCLEAN whose telephone number is (703)756-4599. The examiner can normally be reached "Monday - Friday 8:00-5:00 EST, off Every 2nd Friday".
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hai Phan, can be reached at (571) 272-6338. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/IAN SCOTT MCLEAN/Examiner, Art Unit 2654
/HAI PHAN/Supervisory Patent Examiner, Art Unit 2654