Prosecution Insights
Last updated: April 19, 2026
Application No. 18/580,554

DEVICE AND METHOD FOR PROCESSING VOICES OF SPEAKERS

Final Rejection: §101, §103, §112, §DP
Filed: Jan 18, 2024
Examiner: SHAH, PARAS D
Art Unit: 2653
Tech Center: 2600 (Communications)
Assignee: Amosense Co. Ltd.
OA Round: 2 (Final)
Grant Probability: 74% (Favorable)
OA Rounds: 3-4
To Grant: 3y 9m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 74% (474 granted / 645 resolved; +11.5% vs TC avg). Above average.
Interview Lift: +31.1% (allow rate among resolved cases with an interview vs. without). A strong lift.
Typical Timeline: 3y 9m average prosecution; 24 applications currently pending.
Career History: 669 total applications across all art units.
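For readers checking the arithmetic, the headline figures above reduce to simple ratios over resolved cases. A minimal sketch in Python: the 474/645 split is from this page, while the per-interview counts are hypothetical stand-ins chosen only to be consistent with the displayed totals and the +31.1% lift, since the page does not publish the underlying interview breakdown.

```python
# Minimal sketch, not the vendor's model: reproduces the dashboard's
# headline arithmetic from raw counts.
def allow_rate(granted: int, resolved: int) -> float:
    """Allow rate as a percentage of resolved cases."""
    return 100.0 * granted / resolved

career = allow_rate(474, 645)  # 73.5%, displayed as 74%
print(f"Career allow rate: {career:.1f}%")

# Interview lift = allow rate among resolved cases WITH an interview minus
# the rate among those WITHOUT. The counts below are hypothetical: chosen so
# the two groups sum to 474 granted / 645 resolved and yield the displayed
# +31.1% figure, since the page does not publish the actual breakdown.
with_interview = allow_rate(155, 160)     # 96.9% (hypothetical counts)
without_interview = allow_rate(319, 485)  # 65.8% (hypothetical counts)
print(f"Interview lift: {with_interview - without_interview:+.1f} points")
```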

Statute-Specific Performance

§101: 20.3% (-19.7% vs TC avg)
§103: 44.9% (+4.9% vs TC avg)
§102: 13.8% (-26.2% vs TC avg)
§112: 10.5% (-29.5% vs TC avg)
Tech Center averages are estimates. Based on career data from 645 resolved cases.
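The "vs TC avg" deltas are simple differences from a Tech Center baseline. Notably, all four displayed deltas back-solve to the same 40.0% baseline, which suggests a single pooled estimate. A minimal sketch follows; the metric behind each percentage is not defined on this page, and the 40.0% baseline is inferred from the deltas rather than published.

```python
# Sketch of the statute-specific deltas shown above. Examiner rates are from
# the page; the 40.0% Tech Center baseline is back-derived from the displayed
# deltas (e.g. 20.3% - 40.0% = -19.7%), not from published data.
examiner_rates = {"§101": 20.3, "§103": 44.9, "§102": 13.8, "§112": 10.5}
TC_BASELINE = 40.0  # inferred: every displayed delta equals rate - 40.0

for statute, rate in examiner_rates.items():
    print(f"{statute}: {rate:.1f}% ({rate - TC_BASELINE:+.1f}% vs TC avg)")
```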

Office Action

Grounds: §101, §103, §112, §DP (nonstatutory double patenting)
DETAILED ACTION

This communication is in response to the Amendments and Arguments filed on 11/28/2025. Claims 1, 4-8 and 10-14 are pending and have been examined.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority

Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Response to Applicant's Amendments and Arguments

With respect to the Double Patenting rejections, the Applicant requests that these rejections be held in abeyance. Hence, the Double Patenting rejections have been maintained and updated below.

With respect to the 35 USC 101 abstract-idea rejections, the Applicant asserts that the claims as amended are drawn toward patent-eligible subject matter, noting that the amendment "produced a tangible physical file embodying the translation results, thereby transforming the processing from a conceptual translation activity into a concrete, technical process executed by hardware to generate structure multi-language minutes files," and further asserts that "the claimed invention performs technical operations that cannot practically be performed mentally, including determining sound source positions of concurrently speaking participants using signals captured by a microphone, generating separated voice signals for each detected sound source, automatically identifying current languages based on stored position-language information, and producing multiple sets of translated minutes-of-meeting files corresponding to different languages for distribution. These operations constitute advanced signal processing and multilingual file generation capabilities that require specialized hardware execution and are not analogous to human mental activity. The USPTO's 2019 Revised Patent Subject Matter Eligibility Guidance states that a claim is not abstract if it cannot practically be performed in the human mind, and Example 37 (Speech Processing) specifically recognizes eligibility for inventions that improve the processing of speech signals. Likewise, the Federal Circuit has repeatedly confirmed eligibility where specific improvements to computer systems are claimed, as in McRO v. Bandai (837 F.3d 1299), which held that automated rule-based processing producing a tangible output is not an abstract idea, and in Thales Visionix v. U.S. (850 F.3d 1343), which found patent eligibility based on an unconventional use of sensor data to solve a technical problem."

The Examiner respectfully disagrees. The hardware noted by the Applicant comprises a "processor," a "microphone," and a "file." The Examiner notes that the "microphone" is used only to receive audio data and thus serves only as the pre-solution activity of data gathering; the microphone is not being improved in any way. Further, the "processor" is solely a mechanism on which the stated steps operate. The Specification, in para [0041] of the as-filed specification, describes the processor and memory in exemplary terms (a CPU and storage), which is interpreted to include all known CPU/storage elements. Therefore, the "processor" as claimed is simply a general-purpose computing element that does not tie the abstract idea to a practical application.

Furthermore, the use of a "file" would have been known given the use of a general-purpose processor to create a transcript; the file has not been improved in any way and is easily understood to be analogous to a human transcribing someone speaking during a conference while translating at the same time. Therefore, each of the limitations is treated as in the 35 USC 101 rejections below and in the prior rejection. The limitations reasonably relate to a human being able to localize speech based on predetermined information about the different speakers, namely where they are located and which language they speak, and then, once each speaker is identified, determining the language and creating a separate transcript for each speaker in their language.

With respect to the 35 USC 103 rejections, the Applicant incorporated claim 2, added a new limitation at the end of the claim, and argues that Nakadai does not specifically teach "translating based on source languages stored in the memory corresponding to sound source positions, this is closer to separating voices by speaker positions and automatically detecting the languages," further noting that the present invention "can provide translations of the speaker's voice with less time and fewer resources, without additional processes such as language analysis." The Applicant further notes that amended claims 1 and 8 "can immediately output the translation for the voice of the speaker at the corresponding position without additional steps by mapping the stored source language information to the stored voice-source position (P1-P4) information in the memory."

The Examiner respectfully disagrees with this assertion. The Applicant's arguments center on the notion that each sound source position is associated with the translation language and that this association is stored in advance. However, the current claim language does not specify what the position-language information comprises. The broadest reasonable interpretation (BRI) of the claim allows this position-language information to be information determined prior to the conversation. For example, in Nakadai para [0138], the language information detecting unit outputs information related to language together with the azimuth information. This is inherently stored in order to be used further downstream, such as for ASR processing. Also, as noted by the Applicant, the speaker can preselect in [0158] a language in which the translation result should be displayed. Even in this example the speaker location must be known in order to provide the translation result to the proper area, which likewise entails knowing this relationship in advance of presenting the translation result. Therefore, both options allow and teach the concept of storing the position of the speaker and the language information to be used in the translation process in advance. The Applicant's arguments are not persuasive.

Claim Objections

Claim 8 is objected to because of the following informalities: "the voice processing method of claim 8" appears in the first line of the amended portion that was brought in. This should be removed. Appropriate correction is required.

Claims 10-11 and 13 are objected to because of the following informalities: they depend from cancelled claim 9 and should depend from claim 8. Appropriate correction is required.
Applicant is advised that should claims 1 and 8 be found allowable, claims 7 and 14 will be objected to under 37 CFR 1.75 as being substantial duplicates thereof. When two claims in an application are duplicates, or else are so close in content that they both cover the same thing despite a slight difference in wording, it is proper after allowing one claim to object to the other as being a substantial duplicate of the allowed claim. See MPEP § 608.01(m). The issue arises with respect to the last limitation of claims 1 and 8 and claims 7 and 14.

Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission.
For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.

Claims 1, 5, 6-8, 10, and 12-14 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 5 and 14 of US 12347450 in view of Shin (US 2021/0374362). The issued patent teaches all of the limitations except for "generate, for each of the different languages, a minutes-of-meeting file including a text converted from the translation result expressed in the corresponding language." Shin teaches this limitation in para [0094] (see the claim 7 mapping under the prior art rejections for a detailed explanation). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the issued patent with generation of meeting minutes in the current languages of the speakers, as taught by Shin, in order to enhance convenience and expandability and to minimize erroneous expression of opinion caused by mistranslation (see Shin [0125]). See the mapping below.

Instant Application (I) maps to US Patent (P): Claim 1 (I): Claim 5 (P); Claim 2 (I): Claim 5 (P); Claim 3 (I): Claim 5 (P); Claim 5 (I): Claim 5 (P); Claim 6 (I): Claim 5 (P) in view of Shin (see prior art mapping for Shin's teaching); Claim 7 (I): Claim 5 (P) in view of Shin (see prior art mapping for Shin's teaching); Claim 8 (I): Claim 14 (P); Claim 9 (I): Claim 14 (P); Claim 10 (I): Claim 14 (P); Claim 12 (I): Claim 14 (P); Claim 13 (I): Claim 14 (P) in view of Shin (see prior art mapping for Shin's teaching); Claim 14 (I): Claim 14 (P) in view of Shin (see prior art mapping for Shin's teaching).

Instant Application 18/580554, claim 1:

"A voice processing device configured to generate translation results for voices of speakers, comprising: a microphone configured to generate voice signals associated with the voices of the speakers in response to the voices of the speakers; a memory configured to store position-language information representing languages corresponding to sound source positions of the voices of the speakers; and a processor configured to generate translation results obtained by translating the language of the voice of each of the speakers using the voice signal and the position-language information and generate translated minutes of meeting including a voice content of each of the speakers expressed in different languages using the translation results, wherein the processor is configured to: determine the sound source positions of the voices of the speakers using the voice signals generated from the microphone and generate sound source position information representing the determined sound source positions; generate a separation voice signal associated with the voice pronounced at each sound source position from the voice signal; determine current languages of the voices of the speakers using the position-language information stored in the memory; generate the translation results obtained by translating the current languages of the voices of the speakers into the different languages using the separation voice signal and the determined current languages; and generate, for each of the different languages, a minutes-of-meeting file including a text converted from the translation result expressed in the corresponding language."

US 12347450, claims 1, 3, and 5 (with the Examiner's annotations in brackets):

"1. A voice processing device comprising: [see claim 3] a voice processing circuit configured to: generate separated voice signals related to voices by performing voice source separation of voice signals related to the voices pronounced at a plurality of voice source positions in a vehicle based on the voice source positions of the voices, and output translation results for the voices based on the separated voice signals; a memory configured to store source language information representing source languages for translating the voices related to the separated voice signals and target language information representing target languages; and [see also claim 5] a communication circuit configured to output the translation results, wherein the voice processing circuit is configured to generate the translation results in which the languages of the voices corresponding to the separated voice signals are translated from the source languages into the target languages with reference to the memory; wherein the communication circuit is configured to transmit the translation results to the vehicle in accordance with the control of the voice processing circuit, and wherein the transmitted translation results are output through loudspeakers of the vehicle as voices, wherein the voice processing circuit is configured to generate output position information representing positions of the loudspeakers to output the translation results in the vehicle, and transmit the generated output position information to the vehicle, wherein the voice processing circuit is configured to generate the output position information so that the target languages corresponding to the voice source positions of the voices related to the separated voice signals and the source languages corresponding to the positions of the loudspeakers are the same.

3. The voice processing device of claim 1, comprising a plurality of microphones disposed to form an array, wherein the plurality of microphones are configured to generate the voice signals in response to the voices.

5. The voice processing device of claim 3, wherein the voice processing circuit is configured to generate voice source position information representing the voice source positions of the voices based on a time delay among a plurality of voice signals generated from the plurality of microphones, and match and store, in the memory, the voice source position information for the voices with the separated voice signals for the voices."

(The remaining limitations map per the annotations above; the minutes-of-meeting file limitation is taught by Shin.)

Claims 4 and 11 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 2 and 9 of US 12347450 in view of Shin, further in view of Nakadai, and further in view of Asthana. Please see the mapping in the prior art sections. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the instant application with the features of claim 4 as taught by Nakadai in order to address the difficulties when multiple speech pieces are present (see Nakadai [0006]-[0007]). With respect to the motivation from Asthana, please see the mapping below for the respective claim; the same motivation applies.

Claims 1, 5-8, and 12-14 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 6 and 12 of copending Application No. 18029060 (reference application) in view of Shin (US 2021/0374362).
The copending application teaches all of the limitations except for "generate, for each of the different languages, a minutes-of-meeting file including a text converted from the translation result expressed in the corresponding language." Shin teaches this limitation in para [0094] (see the claim 7 mapping under the prior art rejections for a detailed explanation). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the copending application with generation of meeting minutes in the current languages of the speakers, as taught by Shin, in order to enhance convenience and expandability and to minimize erroneous expression of opinion caused by mistranslation (see Shin [0125]). See the mapping below. Each of the claims listed above from the instant application maps to the claims of the copending application as follows.

Instant Application (I) maps to Copending Application (P): Claim 1 (I): Claim 6 (P); Claim 5 (I): Claim 5 (P); Claim 6 (I): Claim 6 (P) in view of Shin (see prior art mapping for Shin's teaching); Claim 7 (I): Claim 6 (P) in view of Shin (see prior art mapping for Shin's teaching); Claim 8 (I): Claim 12 (P); Claim 12 (I): Claim 12 (P); Claim 13 (I): Claim 14 (P) in view of Shin (see prior art mapping for Shin's teaching); Claim 14 (I): Claim 14 (P) in view of Shin (see prior art mapping for Shin's teaching).

Instant Application 18/580554, claim 1: as reproduced in the first mapping above.

Copending Application 18029060, claims 1 and 6 (with the Examiner's annotations in brackets):

"1. A voice processing device comprising: a voice receiving circuit configured to receive voice signals related to voices pronounced by speakers; a voice processing circuit configured to: generate separated voice signals related to voices by performing voice source separation of the voice signals based on voice source positions of the voices, and generate translation results for the voices by using the separated voice signals; a memory; and [see claim 6] an output circuit configured to output the translation results for the voices, wherein an output order of the translation results is determined based on pronouncing time points of the voices such that the translation results are output sequentially when the speakers overlappingly pronounce the voices.

6. The voice processing device of claim 1, wherein the voice processing circuit is configured to: determine the source languages for translating the voices related to the separated voice signals and the target languages with reference to the source language information corresponding to the voice source positions of the separated voice signals stored in the memory and the target language information, and generate the translation results by translating languages of the voices from the source languages to the target languages."

(The remaining limitations map per the annotations above; the minutes-of-meeting file limitation is taught by Shin.)

This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claims 4 and 11 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 2 and 9 of 18/029060 in view of Shin, further in view of Nakadai, and further in view of Asthana. Please see the mapping in the prior art sections. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the instant application with the features of claims 3 and 4 as taught by Nakadai in order to address the difficulties when multiple speech pieces are present (see Nakadai [0006]-[0007]). Please see the prior art rejection below for the motivation for combining Asthana.

Claim 10 is provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claim 9 of 18/029060 in view of Shin and further in view of Buckler (US 2008/0218582). Please see the mapping below with respect to Buckler, as well as the motivation.

Claims 1, 5-8, 10, and 12-14 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1 and 8 of copending Application No. 18034626 (reference application) in view of Shin (US 2021/0374362). The copending application teaches all of the limitations except for "generate, for each of the different languages, a minutes-of-meeting file including a text converted from the translation result expressed in the corresponding language." Shin teaches this limitation in para [0094] (see the claim 7 mapping under the prior art rejections for a detailed explanation). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the copending application with generation of meeting minutes in the current languages of the speakers, as taught by Shin, in order to enhance convenience and expandability and to minimize erroneous expression of opinion caused by mistranslation (see Shin [0125]). See the mapping below.
Each of the claims listed above from the instant application maps to the claims of the copending application as follows.

Instant Application (I) maps to Copending Application (P): Claim 1 (I): Claim 1 (P); Claim 5 (I): Claim 1 (P); Claim 6 (I): Claim 1 (P) in view of Shin (see prior art mapping for Shin's teaching); Claim 7 (I): Claim 1 (P) in view of Shin (see prior art mapping for Shin's teaching); Claim 8 (I): Claim 8 (P); Claim 10 (I): Claim 9 (P); Claim 12 (I): Claim 8 (P); Claim 13 (I): Claim 8 (P) in view of Shin (see prior art mapping for Shin's teaching); Claim 14 (I): Claim 8 (P) in view of Shin (see prior art mapping for Shin's teaching).

Instant Application 18/580554, claim 1: as reproduced in the first mapping above.

Copending Application 18034626, claim 1:

"1. A mobile terminal comprising: a microphone configured to generate voice signals in response to voices of speakers; a processor configured to generate separated voice signals related to the respective voices by performing voice source separation of the voice signals based on respective voice source positions of the voices, and output translation results for the respective voices based on the separated voice signals; and a memory configured to store source language information representing source languages that are pronounced languages of the voices of the speakers, wherein the processor is configured to output the translation results in which the languages of the voices of the speakers have been translated from the source languages into target languages to be translated based on the source language information and the separated voice signals, wherein the processor is configured to: determine the source languages corresponding to positions of the voices based on the source language information by comparing the respective voice source positions of the voices with position information included in the source language information stored in the memory, reading the determined source language information corresponding to positions of the voices, and output the translation results for the respective voices in accordance with the determined source languages represented by the read source language information."

(The remaining limitations map per the Examiner's annotations: see the 2nd limitation of claim 1 and the 3rd-to-last limitation of claim 1; the minutes-of-meeting file limitation is taught by Shin.)

This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claims 4 and 11 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 2 and 9 of 18/034626 in view of Shin, further in view of Nakadai, and further in view of Asthana. Please see the mapping in the prior art sections. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the instant application with the features of claims 3 and 4 as taught by Nakadai in order to address the difficulties when multiple speech pieces are present (see Nakadai [0006]-[0007]). With respect to the motivation from Asthana, please see the mapping below for the respective claim; the same motivation applies.

Claims 1, 5-8 and 12-14 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claim 5 of copending Application No. 18015472 (reference application) in view of Shin (US 2021/0374362). The copending application teaches all of the limitations except for "generate, for each of the different languages, a minutes-of-meeting file including a text converted from the translation result expressed in the corresponding language." Shin teaches this limitation in para [0094] (see the claim 7 mapping under the prior art rejections for a detailed explanation). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the copending application with generation of meeting minutes in the current languages of the speakers, as taught by Shin, in order to enhance convenience and expandability and to minimize erroneous expression of opinion caused by mistranslation (see Shin [0125]). See the mapping below. Each of the claims listed above from the instant application maps to the claims of the copending application as follows.

Instant Application (I) maps to Copending Application (P): Claim 1 (I): Claim 5 (P); Claim 2 (I): Claim 5 (P); Claim 3 (I): Claim 5 (P); Claim 5 (I): Claim 5 (P); Claim 6 (I): Claim 1 (P) in view of Shin (see prior art mapping for Shin's teaching); Claim 7 (I): Claim 1 (P) in view of Shin (see prior art mapping for Shin's teaching); Claim 8 (I): Claim 5 (P); Claim 9 (I): Claim 5 (P); Claim 12 (I): Claim 5 (P); Claim 13 (I): Claim 8 (P) in view of Shin (see prior art mapping for Shin's teaching); Claim 14 (I): Claim 8 (P) in view of Shin (see prior art mapping for Shin's teaching).
Instant Application 18/580554, claim 1: as reproduced in the first mapping above.

Copending Application 18015472, claims 1 and 5 (with the Examiner's annotations in brackets):

"1. A voice processing device, comprising: [see claim 5 of this reference for the translation limitation] a processor configured to perform sound source isolation on voice signals associated with voices of speakers based on sound source positions of the respective voices; and a memory, wherein the processor is configured to: generate sound source position information indicating the sound source positions of the respective voices using the voice signals associated with the voices; generate isolated voice signals associated with the voices of the respective speakers from the voice signals based on the sound source position information; and match the isolated voice signals and the voice sound source position information and store the matched isolated voice signals and voice sound source position information in the memory, wherein the processor generates the sound source position information indicating the sound source positions of the respective voices using the voice signal and converts the sound source position information in the memory as reference sound source position information, in a position register mode, and stores the isolated voice signal associated with the voice corresponding to the sound source position within a reference range from the reference sound source position in the voice separation mode.

5. The voice processing device of claim 1, wherein the memory stores source language information indicating source languages which are the pronounced languages of the voices of the speakers, and the processor outputs a translation result in which the languages of the voices of the speakers are translated from the source languages to target languages, which are the languages to be translated, based on the source language information and the isolated voice signal."

(The remaining limitations map per the annotations above; the minutes-of-meeting file limitation is taught by Shin.)

This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Claims 4 and 11 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 2 and 9 of 18/015472 in view of Shin, further in view of Nakadai, and further in view of Asthana. Please see the mapping in the prior art sections. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the instant application with the features of claims 3 and 4 as taught by Nakadai in order to address the difficulties when multiple speech pieces are present (see Nakadai [0006]-[0007]). With respect to the motivation from Asthana, please see the mapping below for the respective claim; the same motivation applies.

Claim 10 is provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claim 9 of 18/015472 in view of Shin and further in view of Buckler (US 2008/0218582). Please see the mapping below with respect to Buckler, as well as the motivation.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1, 4-8, and 10-14 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Independent claims 1 and 8 relate to a system and a method, respectively, and thus fall within statutory categories. The claims further recite, per claims 1 and 8, "[[a microphone]] configured to generate voice signals associated with the voices of the speakers in response to the voices of the speakers; [[a memory]] configured to store position-language information representing languages corresponding to sound source positions of the voices of the speakers; and [[a processor]] configured to generate translation results obtained by translating the language of the voice of each of the speakers using the voice signal and the position-language information and generate translated minutes of meeting including a voice content of each of the speakers expressed in different languages using the translation results; determine the sound source positions of the voices of the speakers using the voice signals generated from the microphone and generate sound source position information representing the determined sound source positions; generate a separation voice signal associated with the voice pronounced at each sound source position from the voice signal; determine current languages of the voices of the speakers using the position-language information stored in the memory; and generate the translation results obtained by translating the current languages of the voices of the speakers into the different languages using the separation voice signal and the determined current languages; generate, for each of the different languages, a minutes of meeting file including a text converted from the translation result on the corresponding language." The limitations of claims 1 and 8 of "storing…", "generating…", "generating…", "generating…", "determining…", "generating…", "generating…", "determining…", "generating…", and "generating…", as drafted, cover mental activities.
More specifically: a meeting coordinator, already having information about the various speakers' positions and each speaker's primary language, hears various sound signals from the speakers in the meeting; the coordinator separates the sounds and determines the locations from which they originated, determines which language each speaker was speaking, and translates the different voices. The coordinator then generates a translation for each of the different voices heard and creates a written document capturing the transcript of the meeting, providing each speaker the document in their own language.

This judicial exception is not integrated into a practical application. In particular, claims 1 and 8 recite the additional elements of a "microphone" (claims 1 and 8), a "processor" (claim 1), and a "memory" (claim 1). For example, paragraph [0041] of the as-filed specification describes the processor and memory in exemplary terms (a CPU and storage), which is interpreted to include all known CPU/storage elements. Furthermore, the microphones per the claims are being used for the pre-solution activity of gathering the speech of the various speakers. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea; the CPU is being used as a mere tool on which the abstract idea is applied. The claim is directed to an abstract idea.

The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration into a practical application, the additional element of using a computer amounts to a generic computer. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Further, the additional limitations noted above are directed toward insignificant extra-solution activity. The claims are not patent eligible.

With respect to claims 4 and 11, the claims relate to "wherein the processor is configured to: determine different languages into which the current language of the voice of each of the speakers is translated using the position-language information stored in the memory; and generate a translation result obtained by translating the current language of the voice of the speaker into different languages according to the determined current language and different languages". This relates to a meeting coordinator determining the different languages, deciding which language to translate into based on the position of each speaker, and generating a translation result. No additional limitations are present. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.
With respect to claims 5 and 12, the claims relate to "generate first sound source position information representing a sound source position of a voice of a first speaker among the speakers using the voice signals associated with the voices of the speakers; generate a first separation voice signal associated with the voice of the first speaker using the voice signals and the first sound source position information; determine a language of the voice of the first speaker corresponding to the first sound source position information with reference to the position-language information stored in the memory; determine languages of the voices of the remaining speakers except for the first speaker among the speakers with reference to the position-language information stored in the memory; and generate translation results obtained by translating the language of the voice of the first speaker into the languages of the voices of the remaining speakers using the first separation voice signal". Similar reasoning as provided for claims 2-3 applies here, but with focus on one speaker relative to the other speakers. No additional limitation is present. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.

With respect to claims 6-7 and 13-14, the claims relate to "generates original minutes of meeting including a voice content of each of the speakers expressed in the current languages of the voices of the speakers using the separation voice signal" and "generates the translated minutes of meeting, converts the translation results into texts, and records text data in the translated minutes of meeting". This relates to a meeting coordinator separating each speaker's voice and writing the respective speech down on a piece of paper for the meeting. No additional limitations are present. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.

With respect to claim 10, the claim relates to "wherein the microphone includes a plurality of microphones disposed to form an array, and the determining of the sound source positions of the speakers includes determining the sound source position based on a time delay between a plurality of voice signals generated from the plurality of microphones." This relates to computing a time difference or delay based on the difference in the audio captured by the plural microphones. The additional element of a "microphone array" is used as pre-solution activity for gathering the sound. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.

These claims thus do not integrate the judicial exception into a practical application and further fail to include additional elements that are sufficient to amount to significantly more than the judicial exception.

Claim Rejections - 35 USC § 112

The following is a quotation of 35 U.S.C. 112(b): (b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph: The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1, 4-8, and 10-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.

Claims 1 and 8 recite the limitation "the corresponding language" in the last limitation of the claim. There is insufficient antecedent basis for this limitation in the claim. The dependent claims are also rejected based on their dependency from indefinite independent claims 1 and 8.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-8, and 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Nakadai (US 2015/0154957) in view of Asthana (US 2007/0143103) and further in view of Shin (US 2021/0374362).
As to claims 1 and 8, Nakadai teaches a voice processing device configured to generate translation results for voices of speakers, comprising:

a microphone configured to generate voice signals associated with the voices of the speakers in response to the voices of the speakers (see [0060]-[0062], where N microphones are positioned and each microphone records sounds to acquire N signals from a plurality of speakers);

a memory (see [0227]-[0228], describing various memory and storage) configured to store position-language information representing languages corresponding to sound source positions of the voices of the speakers (see [0136]-[0139], where a variety of information, such as azimuth and language information for each speaker, is stored for further processing); and

a processor (see [0227], computer system) configured to generate translation results obtained by translating the language of the voice of each of the speakers using the voice signal and the position-language information and generate translated minutes of meeting [[including a voice content of each of the speakers expressed in different languages using the translation results]] (see [0139]-[0140], where translation of the speech is performed based on the direction, azimuth, speakers, recognition data, and language), wherein the processor is configured to:

determine the sound source positions of the voices of the speakers using the voice signals generated from the microphone and generate sound source position information representing the determined sound source positions (see [0134]-[0135], where the sound source localizing unit estimates an azimuth of a sound source on the basis of an input signal, estimating for each of the sound signals of N channels);

generate a separation voice signal associated with the voice pronounced at each sound source position from the voice signal (see [0136]-[0137], where the sound source separating unit separates the sound signals of N channels into sound signals using GHDSS based on the sound localizing information);

determine current languages of the voices of the speakers using the position-language information stored in the memory (see [0138], where the language information detecting unit detects the language of each speaker for each sound signal); and

generate the translation results obtained by translating the current languages of the voices of the speakers into the different languages using the separation voice signal and the determined current languages (see [0139]-[0140], where the translation unit translates the speech details, which are then displayed).

However, Nakadai does not specifically teach generation of minutes of meeting including the voice content of each of the speakers.

Asthana does teach generating translated minutes of meeting including a voice content of each of the speakers expressed in different languages using the translation results (see [0067], where spoken information or text is translated and provided to the different users based on their language preference, and see [0083], where a transcript of the conference is maintained), and generating the translation results obtained by translating the current languages of the voices of the speakers into the different languages using the separation voice signal and the determined current languages (see [0067], where spoken information or text is translated and provided to the different users based on their language preference in audio or text, and see [0083], where a transcript of the conference is maintained).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the conversation system as taught by Nakadai with the translation into voice as taught by Asthana in order to address the ineffectiveness of other systems regarding communication issues between speakers having different accents and speaking different languages (see Asthana [0005]).

However, Nakadai in view of Asthana does not specifically teach generating, for each of the different languages, a minutes-of-meeting file including a text converted from the translation result expressed in the corresponding language. Shin does teach this limitation (see [0075], where the original text refers to an original text input in a language used by each corresponding conference participant; see [0094], where all three types of text of the conversation (original text, translation texts, and reference text) are provided; see Figure 6, which shows which one is the original based on the shading; and see [0093], where the processor 212 may display the reference text 630 so that the conference participants may refer to a translation in a third language in addition to the main conference languages, i.e., Korean and Japanese).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the conversation system as taught by Nakadai in view of Asthana with generation of meeting minutes in the corresponding language as taught by Shin in order to enhance convenience and expandability and to minimize erroneous expression of opinion caused by mistranslation (see Shin [0125]).

As to claim 8, apparatus claim 1 and method claim 8 are related as an apparatus and the method of using the same, with each claimed element's function corresponding to a claimed method step. Accordingly, claim 8 is rejected under the same rationale as applied above with respect to the apparatus claim.

As to claims 4 and 11, Nakadai in view of Asthana and Shin teaches all of the limitations as in claims 2 and 9. Furthermore, Nakadai teaches wherein the processor is configured to: determine different languages into which the current language of the voice of each of the speakers is translated using the position-language information stored in the memory (see [0138]-[0139], where the speech recognition unit performs speech processing based on the language for each speaker and the azimuth information; the language of each speaker is detected, and information indicating the detected language for each speaker is based on information from the sound source separating unit, which receives the azimuth information from the sound source localizing unit); and generate a translation result obtained by translating the current language of the voice of the speaker into different languages according to the determined current language and different languages (see [0140], where translation of the speech details from the speech recognition unit is performed based on information about the speakers and the language of each speaker).
Furthermore, Asthana teaches generating a translation result obtained by translating the current language of the voice of the speaker into different languages according to the determined current language and different languages (see [0067], where spoken information or text is translated and provided to the different users based on their language preference).

As to claims 5 and 12, Nakadai in view of Asthana and Shin teaches all of the limitations as in claims 2 and 11. Furthermore, Nakadai teaches:

generate first sound source position information representing a sound source position of a voice of a first speaker among the speakers using the voice signals associated with the voices of the speakers (see [0134]-[0135], where the sound source localizing unit estimates an azimuth of a sound source on the basis of an input signal, estimating for each of the sound signals of N channels; the first position can be the azimuth estimate for one of the N channels);

generate a first separation voice signal associated with the voice of the first speaker using the voice signals and the first sound source position information (see [0136]-[0137], where the sound source separating unit separates the sound signals of N channels into sound signals using GHDSS based on the sound localizing information; one of the N separated sound signals can be the separate voice signal);

determine a language of the voice of the first speaker corresponding to the first sound source position information with reference to the position-language information stored in the memory (see [0138], where the language information detecting unit detects the language of each speaker for each sound signal; the language of the first speaker corresponding to the first sound source position can be a language detected for one of the speakers);

determine languages of the voices of the remaining speakers except for the first speaker among the speakers with reference to the position-language information stored in the memory (see [0138], where the language information detecting unit detects the language of each speaker for each sound signal; the other detected languages are interpreted to be the languages of the remaining speakers).

Furthermore, Asthana teaches generating translation results obtained by translating the language of the voice of the first speaker into the languages of the voices of the remaining speakers using the first separation voice signal (see [0067], where one speaker speaks or enters text for the conference, and this is then translated selectively based on the language understandable by the user of the endpoint).

With respect to claims 6 and 13, Nakadai in view of Asthana and Shin teaches all of the limitations as in claims 2 and 9. The Examiner notes that both Nakadai and Asthana teach the translation of speech into text to generate meeting minutes, where Nakadai describes how this is done based on separating speaker signals.
Furthermore, Shin does teach wherein the processor generates original minutes of meeting including a voice content of each of the speakers expressed in the current languages of the voices of the speakers using the separation voice signal (see [0075], where the original text refers to an original text input in a language used by each corresponding conference participant; see [0094], where all three types of text of the conversation (original text, translation texts, and reference text) are provided; and see Figure 6, which shows which one is the original based on the shading).

With respect to claims 7 and 14, Nakadai in view of Asthana and Shin teaches all of the limitations as in claims 1 and 8. The Examiner notes that both Nakadai and Asthana teach the translation of speech into text to generate meeting minutes, where Nakadai describes how this is done based on separating speaker signals. Furthermore, Shin does teach wherein the processor generates the translated minutes of meeting, converts the translation results into texts, and records text data in the translated minutes of meeting (see [0075], [0094], and Figure 6, as cited above).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Nakadai in view of Asthana and Shin as applied to claim 9 above, and further in view of Buckler (US 2008/0218582).

With respect to claim 10, Nakadai in view of Asthana and Shin teaches all of the limitations as in claim 9. However, the combination does not specifically teach wherein the microphone includes a plurality of microphones disposed to form an array, and the determining of the sound source positions of the speakers includes determining the sound source position based on a time delay between a plurality of voice signals generated from the plurality of microphones. Buckler does teach wherein the microphone includes a plurality of microphones disposed to form an array (see [0043], where an array of microphones 113 is used in a conference setting), and determining the sound source position based on a time delay between a plurality of voice signals generated from the plurality of microphones (see [0088], where delay estimation can be used to determine the location of a user based on the plural microphones 113). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the conversation system as taught by Nakadai in view of Asthana and Shin with time-delay-based position estimation as taught by Buckler in order to be able to choose a camera and its viewing direction based on the person or persons speaking (see Buckler [0018]).

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to PARAS D SHAH, whose telephone number is (571) 270-1650. The examiner can normally be reached Monday-Thursday 7:30AM-2:30PM and 5PM-7PM (EST), and Friday 8AM-noon (EST).

Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Paras D Shah/
Supervisory Patent Examiner, Art Unit 2653
01/28/2026
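Editor's note on the technology at issue: to make the disputed claim language concrete, below is a minimal sketch of the claimed flow in Python: delay-based localization (the claim 10 feature for which Buckler is cited), a pre-stored position-to-language lookup (the "position-language information" argued under §103), and one minutes-of-meeting text per target language (the limitation for which Shin is cited). The two-microphone geometry, the `POSITION_LANGUAGE` table, and the stub `translate` function are illustrative assumptions, not the applicant's or any cited reference's implementation.

```python
import numpy as np

# Illustrative sketch of the claimed pipeline, not the applicant's code:
# (1) estimate a per-utterance time delay across two microphones (cf. the
#     delay-based localization of claim 10 / Buckler),
# (2) map the resulting position to a language stored in advance (the
#     "position-language information" debated under §103),
# (3) emit one minutes-of-meeting text per target language (the limitation
#     for which Shin is cited). Separation, ASR, and MT are stubbed out.

def estimate_delay(sig_a: np.ndarray, sig_b: np.ndarray) -> int:
    """Delay (in samples) of sig_b relative to sig_a via cross-correlation."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    return int(np.argmax(corr)) - (len(sig_a) - 1)

# Position-language information stored "in the memory" before the meeting;
# a delay-sign bucket stands in for a registered seat position (hypothetical).
POSITION_LANGUAGE = {"P1": "ko", "P2": "en"}

def position_from_delay(delay: int) -> str:
    return "P1" if delay >= 0 else "P2"

def translate(text: str, src: str, dst: str) -> str:
    return f"[{src}->{dst}] {text}"  # stub for a real MT system

def make_minutes(utterances, targets=("ko", "en")):
    """utterances: list of (mic_a_signal, mic_b_signal, recognized_text)."""
    minutes = {lang: [] for lang in targets}  # one file per target language
    for sig_a, sig_b, text in utterances:
        pos = position_from_delay(estimate_delay(sig_a, sig_b))
        src = POSITION_LANGUAGE[pos]  # lookup, not run-time language detection
        for lang in targets:
            line = text if lang == src else translate(text, src, lang)
            minutes[lang].append(f"{pos}: {line}")
    return {lang: "\n".join(lines) for lang, lines in minutes.items()}
```

The lookup step is the crux: the source language comes from information stored before the meeting rather than from run-time language detection, which is exactly the distinction the Applicant pressed and the Examiner read onto Nakadai's stored azimuth-plus-language output.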

Prosecution Timeline

Jan 18, 2024
Application Filed
Aug 25, 2025
Non-Final Rejection — §101, §103, §112
Nov 28, 2025
Response Filed
Jan 29, 2026
Final Rejection — §101, §103, §112, §DP (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12586591
SOUND SIGNAL DECODING METHOD, SOUND SIGNAL DECODER, PROGRAM, AND RECORDING MEDIUM
2y 5m to grant; granted Mar 24, 2026
Patent 12579367
TWO-TOWER NEURAL NETWORK FOR CONTENT-AUDIENCE RELATIONSHIP PREDICTION
2y 5m to grant; granted Mar 17, 2026
Patent 12579360
LEARNING SUPPORT APPARATUS FOR CREATING MULTIPLE-CHOICE QUIZ
2y 5m to grant; granted Mar 17, 2026
Patent 12562173
WEARABLE DEVICE CONTROL BASED ON VOICE COMMAND OF VERIFIED USER
2y 5m to grant; granted Feb 24, 2026
Patent 12559026
VEHICLE AND CONTROL METHOD THEREOF
2y 5m to grant; granted Feb 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 74%
With Interview: 99% (+31.1%)
Median Time to Grant: 3y 9m
PTA Risk: Moderate
Based on 645 resolved cases by this examiner. Grant probability derived from career allow rate.
