Last updated: May 29, 2026
Application No. 18/022,255
VOICE PROCESSING DEVICE FOR PROCESSING VOICE SIGNAL AND VOICE PROCESSING SYSTEM COMPRISING SAME

Final Rejection §103
Filed: Feb 20, 2023
Priority: Aug 19, 2020 (RE 10-2020-0103909 +2 more)
Examiner: SOLAIMAN, FOUZIA HYE
Art Unit: 2653
Tech Center: 2600 (Communications)
Assignee: Amosense Co. Ltd.
OA Round: 4 (Final)
Interview Optional

Examiner has a relatively high allowance rate (68%) and a +52.5% interview lift. A written response may suffice.
Based on 68 resolved cases, 2023–2026
Examiner Intelligence

SOLAIMAN, FOUZIA HYE
Career Allowance Rate: 68% (46 granted / 68 resolved), above average; +5.6% vs TC avg
Strong interview lift: +52.5% (resolved cases with interview vs. without)
Typical timeline: 2y 11m average prosecution
8 currently pending
Statute-Specific Performance

§101: 7.4% (-32.6% vs TC avg)
§103: 91.2% (+51.2% vs TC avg)
§102: 0.7% (-39.3% vs TC avg)
Based on career data from 68 resolved cases
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Applicant’s Arguments and Amendments
This communication is in response to the continuation filed on 04/03/2026. Applicant filed an amendment on 04/03/2026 amending independent claim 1. Claims 2-4, 6, and 8-15 are cancelled. The pending claims are 1, 5, and 7.
Applicant's arguments have been fully considered, but they are moot because the examiner relies on new prior art for the amended claim limitation.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Zunn Choi, KR 101989127 B1 {IDS provided}, in view of PAEK MIN HO, KR 20200012104 A {IDS provided}, and further in view of Ekkizogloy et al., US 20180332389 A1.
Regarding Claim 1, Zunn Choi teaches:
[Claim 1] A voice processing device comprising: 
a voice data receiving circuit configured to receive input voice data associated with voices of speakers;  Zunn Choi teaches (“The translator 10 includes a microphone module 11 and a speaker 12. The translator 10 determines a speaker's direction according to an embodiment of the present invention, and converts a voice signal received from the speaker's direction among input voice signals. …” page 2, 2nd. Para from bottom.) by Zunn Choi,  KR 101989127 B1 {IDS provided} 
wherein the input voice data is generated from voice signals generated by a plurality of microphones.  Zunn Choi teaches  (“And the microphone module is a microphone module including a plurality of individual directional microphones.”) (“Referring to FIG. 6B, the processor 22 of the translation apparatus 10 obtains an input voice signal from the microphone module 11. In this case, the input voice signal may be a voice signal obtained from individual microphones of the microphone module 11.”) by Zunn Choi,  KR 101989127 B1 {IDS provided}
  a voice data output circuit configured to output voice data associated with the voices of the speakers; Zunn Choi teaches (“The translator 10 includes a microphone module 11 and a speaker 12. The translator 10 determines a speaker's direction according to an embodiment of the present invention, and converts a voice signal received from the speaker's direction among input voice signals. By extracting to generate the translation target data, and to obtain the translation data for the translation target data to output the output voice signal to the speaker. The detailed operation of the translation apparatus 10 will be described later.”) page 2, 2nd. Para from bottom.) (“The translation apparatus 10 may include a microphone module 11, a speaker 12, a memory 21, a processor 22, a communication module 23, and an input / output interface 24. …” page 3, para 6) by Zunn Choi,  KR 101989127 B1 {IDS provided}
and a processor configured to generate a control command for outputting the output voice data, wherein the processor is further configured to: Zunn Choi teaches (“The translator 10 includes a microphone module 11 and a speaker 12. The translator 10 determines a speaker's direction according to an embodiment of the present invention, and converts a voice signal received from the speaker's direction among input voice signals. By extracting to generate the translation target data, and to obtain the translation data for the translation target data to output the output voice signal to the speaker. The detailed operation of the translation apparatus 10 will be described later.” Page 2, 2nd paragraph from bottom.) (“The processor 22 may be configured to process instructions of a computer program by performing basic arithmetic, logic, and input / output operations. Instructions may be provided to the processor 22 by the memory 21 or the communication module 23. For example, the processor 22 may be configured to execute a command received according to a program code stored in a recording device such as the memory 21.” Page 3, 3rd. paragraph from bottom.) 
   generate plural speakers position data Zunn Choi teaches  (“Abstract  The present invention relates to a translation apparatus, a translation method, and a computer program for a translation method. More particularly, after determining the direction of a plurality of speakers from an input speech signal, the speech is extracted by considering each speaker's direction and translated. The ability to output a signal relates to an apparatus, a method and a program.” Page 2 first para.)  (“Referring to FIG. 4, the direction determiner 311 first analyzes an input voice signal obtained from the microphone module 11 to determine a direction in which a speaker exists (S41). In this case, the input voice signal for determining the direction in which the speaker exists may be a signal of an initial setting period by the user's setting or the internal setting of the translation apparatus 10 among the input voice signals collected by the microphone module 11. More specifically, the direction determiner 311 determines a first direction in which the first speaker exists by analyzing the voice signal of the initial setting section acquired from the microphone module 11, and a second direction in which the second speaker exists. Can be determined. In this case, the first direction and the second direction may be determined using a known sound source location algorithm. The direction determiner 311 may determine the direction in which the speaker exists using the sound source location algorithm from the directivity information of the individual microphones included in the microphone module 11 and the voice signals acquired by the individual microphones. In this case, the direction in which the speaker exists may be a direction in which dB (decibels) of the input voice signal becomes a maximum value. 
In addition, the first direction and the second direction may be determined based on the translation apparatus 10.”) (“Next, the speaker determiner 312 determines whether the input direction of the input voice signal is within the first direction and the error range or within the second direction and the error range to determine the speaker of the input voice signal. At this time, the input direction is within the error range with the first direction or the second direction, the angle (for example, the direction derived from the sound source location algorithm) of the current input voice signal is a predetermined angle from the first direction or the second direction. It can mean mine. That is, the speaker determiner 312 of the present invention determines whether the input voice signal currently collected and recorded by the first speaker is based on the first direction which is the direction of the first speaker and the second direction which is the direction of the second speaker. Or by the second speaker.”) by Zunn Choi,  KR 101989127 B1 {IDS provided}
Fig. 4-5, Zunn Choi,  teaches (“At this time, the input direction is within the error range with the first direction or the second direction, the angle (for example, the direction derived from the sound source location algorithm) of the current input voice signal is a predetermined angle from the first direction or the second direction. It can mean mine. That is, the speaker determiner 312 of the present invention determines whether the input voice signal currently collected and recorded by the first speaker is based on the first direction which is the direction of the first speaker and the second direction which is the direction of the second speaker. Or by the second speaker.” Page 6, last para and page 7, first para) (“… Similarly, when the speaker determiner 312 determines that the speaker of the input voice signal is the second speaker, the translation target data generator 313 extracts the voice signal received in the second direction based on the input voice signal, The second translation target data is generated from the voice signal received in the second direction.” Page 7, 2nd para.) by Zunn Choi,  KR 101989127 B1 {IDS provided} 
first output voice data associated with a voice of the first speaker by using the input voice data, Zunn Choi teaches  (“… . It can mean mine. That is, the speaker determiner 312 of the present invention determines whether the input voice signal currently collected and recorded by the first speaker is based on the first direction which is the direction of the first speaker and the second direction which is the direction of the second speaker. Or by the second speaker.”) (“Next, the translation data acquisition unit 314 may obtain the translation data for the translation target data and output the output voice signal to the speaker. In more detail, the translation data acquisition unit 314 obtains the first translation data for the first translation target data and outputs an output speech signal to the speaker, or obtains the first translation data for the second translation target data and outputs the output speech. The signal can be output to the speaker. In this case, the first translation data may be a translation of the first translation target data into a language of the second translation target data, and the second translation data may be a translation of the second translation target data into a language of the first translation target data.”)  by Zunn Choi,  KR 101989127 B1 {IDS provided}

memory  configured to store position data and Zunn Choi teaches the memory 21 may store program codes and settings for controlling the translation apparatus 10 and input voice signals temporarily or permanently.” Page 3, para 4 from bottom page)  (“… the speaker determiner 312 of the present invention determines whether the input voice signal currently collected and recorded by the first speaker is based on the first direction which is the direction of the first speaker and the second direction which is the direction of the second speaker. Or by the second speaker. (“Next, the translation target data generation unit 313 extracts the voice signal received in the direction in which the speaker exists (first direction or second direction) based on the input voice signal, and from the voice signal received in the direction. The target data for translation is generated (S42). At this time, when the speaker determiner 312 determines that the speaker of the input voice signal is the first speaker, the translation target data generator 313 extracts the voice signal received in the first direction based on the input voice signal, The first translation target data is generated from the voice signal received in the first direction. Similarly, when the speaker determiner 312 determines that the speaker of the input voice signal is the second speaker, the translation target data generator 313 extracts the voice signal received in the second direction based on the input voice signal, The second translation target data is generated from the voice signal received in the second direction.” Page 7, 2nd. paragraph) (“In addition, the translation target data generated by the translation target data generation unit 313 may include not only a voice signal but also speaker information corresponding to the voice signal, direction information of the speaker, or speaker information. 
Accordingly, the user terminal 110 or the server 150 may determine the language and the translation language of the translation target data based on the speaker information or the direction information without having to recognize the language of the translation target data each time the acquisition target data is acquired. Translation time can be reduced.” Page 7, 3rd. paragraph) (“The translation apparatus 10 may include a microphone module 11, a speaker 12, a memory 21, a processor 22, a communication module 23, and an input / output interface 24. The memory 21 is a computer-readable recording medium, and may include a permanent mass storage device such as random access memory (RAM), read only memory (ROM), and a disk drive. In addition, the memory 21 may store program codes and settings for controlling the translation apparatus 10 and input voice signals temporarily or permanently.” Page 3, 3rd para from bottom.) (“In this case, the first direction and the second direction may be determined using a known sound source location algorithm. The direction determiner 311 may determine the direction in which the speaker exists using the sound source location algorithm from the directivity information of the individual microphones included in the microphone module 11 and the voice signals acquired by the individual microphones. In this case, the direction in which the speaker exists may be a direction in which dB (decibels) of the input voice signal becomes a maximum value. In addition, the first direction and the second direction may be determined based on the translation apparatus 10. …” page 6, 2nd para, line 13-6, from bottom page ) (“Next, the speaker determiner 312 determines whether the input direction of the input voice signal is within the first direction and the error range or within the second direction and the error range to determine the speaker of the input voice signal. 
At this time, the input direction is within the error range with the first direction or the second direction, the angle (for example, the direction derived from the sound source location algorithm) of the current input voice signal is a predetermined angle from the first direction or the second direction. It can mean mine. That is, the speaker determiner 312 of the present invention determines whether the input voice signal currently collected and recorded by the first speaker is based on the first direction which is the direction of the first speaker and the second direction which is the direction of the second speaker. Or by the second speaker.” Page 6, last paragraph)  by Zunn Choi,  KR 101989127 B1 {IDS provided}

second output voice data associated with a voice of the second speaker by using the input voice data,  Zunn Choi teaches  (“… . It can mean mine. That is, the speaker determiner 312 of the present invention determines whether the input voice signal currently collected and recorded by the first speaker is based on the first direction which is the direction of the first speaker and the second direction which is the direction of the second speaker. Or by the second speaker.”) (“Next, the translation data acquisition unit 314 may obtain the translation data for the translation target data and output the output voice signal to the speaker. In more detail, the translation data acquisition unit 314 obtains the first translation data for the first translation target data and outputs an output speech signal to the speaker, or obtains the first translation data for the second translation target data and outputs the output speech. The signal can be output to the speaker. In this case, the first translation data may be a translation of the first translation target data into a language of the second translation target data, and the second translation data may be a translation of the second translation target data into a language of the first translation target data.”) by Zunn Choi,  KR 101989127 B1 {IDS provided}
 determine first position data corresponding to the first speaker position data among stored position data,  Zunn Choi,  teaches , the direction derived from the sound source location algorithm. (“At this time, the input direction is within the error range with the first direction or the second direction, the angle (for example, the direction derived from the sound source location algorithm) of the current input voice signal is a predetermined angle from the first direction or the second direction. It can mean mine. That is, the speaker determiner 312 of the present invention determines whether the input voice signal currently collected and recorded by the first speaker is based on the first direction which is the direction of the first speaker and the second direction which is the direction of the second speaker. Or by the second speaker.” Page 6, last para and page 7, first para) Zunn Choi teaches (“In this case, the first direction and the second direction may be determined using a known sound source location algorithm. The direction determiner 311 may determine the direction in which the speaker exists using the sound source location algorithm from the directivity information of the individual microphones included in the microphone module 11 and the voice signals acquired by the individual microphones. In this case, the direction in which the speaker exists may be a direction in which dB (decibels) of the input voice signal becomes a maximum value. In addition, the first direction and the second direction may be determined based on the translation apparatus 10. …” page 6, 2nd para, line 13-6, from bottom page ) (“Next, the speaker determiner 312 determines whether the input direction of the input voice signal is within the first direction and the error range or within the second direction and the error range to determine the speaker of the input voice signal. 
At this time, the input direction is within the error range with the first direction or the second direction, the angle (for example, the direction derived from the sound source location algorithm) of the current input voice signal is a predetermined angle from the first direction or the second direction. It can mean mine. That is, the speaker determiner 312 of the present invention determines whether the input voice signal currently collected and recorded by the first speaker is based on the first direction which is the direction of the first speaker and the second direction which is the direction of the second speaker. Or by the second speaker.” Page 6, last paragraph)  by Zunn Choi,  KR 101989127 B1 {IDS provided}  

determine second position data corresponding to the second speaker position data among stored position data,  Zunn Choi teaches the direction derived from the sound source location algorithm.  (“At this time, the input direction is within the error range with the first direction or the second direction, the angle (for example, the direction derived from the sound source location algorithm) of the current input voice signal is a predetermined angle from the first direction or the second direction. It can mean mine. That is, the speaker determiner 312 of the present invention determines whether the input voice signal currently collected and recorded by the first speaker is based on the first direction which is the direction of the first speaker and the second direction which is the direction of the second speaker. Or by the second speaker.” Page 6, last para and page 7, first para)
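
The direction-matching step quoted from Zunn Choi above (deciding whether the input direction falls within an error range of the stored first or second direction) can be illustrated with a minimal sketch. This is not the reference's implementation: the function names, the dictionary of stored directions, and the 15-degree tolerance are all assumptions made for illustration.

```python
def angular_difference_deg(a, b):
    """Smallest absolute difference between two bearings, in degrees (0 to 180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def match_speaker(input_direction_deg, stored_directions_deg, tolerance_deg=15.0):
    """Return the key of the stored direction closest to the input direction,
    provided it lies within tolerance_deg; otherwise return None.

    stored_directions_deg is a hypothetical mapping such as
    {"first": 30.0, "second": 210.0}; the tolerance value is an assumption.
    """
    best_key = None
    best_diff = tolerance_deg
    for key, direction in stored_directions_deg.items():
        diff = angular_difference_deg(input_direction_deg, direction)
        if diff <= best_diff:
            best_key, best_diff = key, diff
    return best_key


if __name__ == "__main__":
    stored = {"first": 30.0, "second": 210.0}
    print(match_speaker(35.0, stored))   # input near the stored first direction
    print(match_speaker(120.0, stored))  # input outside both error ranges
```

Wrapping the difference at 360 degrees keeps bearings such as 350 and 10 degrees close to each other, which a plain subtraction would miss.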

Zunn Choi does not explicitly teach "determine a target language data of the first speaker based on the source language data of speakers including the second speaker and excluding the first speaker."
PAEK MIN HO  teaches: 
determine a first source language data matched and stored with the first position data among the source language data, PAEK MIN HO  teaches (“The second header analyzer 132 analyzes the ID data to extract and match the corresponding language information. In other words, by receiving data for various languages propagated from the translation server 200, only the head portion of the transmission data sequence is quickly analyzed to easily extract the corresponding language information of the ID data contained in the head portion. By doing so, the translator can quickly interpret the desired language without recognizing the voice itself and comparing it with a pre-stored language.”)  (“ID definition step (S340) is a step in which the translation server 200 defines the ID, after determining the communication method by identifying the earset 100 as in the previous step, and receives the data transmitted from the earset 100 According to the ID data stored in each earset 100, the data is generated in each language to prepare for transmission. Therefore, when a signal carrying voice data is transmitted from any one of the earsets 100, the language-specific data of all the ID data determined in the ID definition step S340 is generated and transmitted to the earset 100 in real time.”) (“The reception ID analysis step (S520) is a step of analyzing ID data to extract corresponding language information and selecting ID data and voice data matching the ID set in the earset 100. In other words, after receiving data signals for each language transmitted simultaneously, only ID data is analyzed quickly from each data signal to select a data signal including language ID data required by the corresponding earset 100.”) by PAEK MIN HO, KR 20200012104 A  

 determine a second source language data matched and stored with the second position data among the source language data, and PAEK MIN HO  teaches (“The second header analyzer 132 analyzes the ID data to extract and match the corresponding language information. In other words, by receiving data for various languages propagated from the translation server 200, only the head portion of the transmission data sequence is quickly analyzed to easily extract the corresponding language information of the ID data contained in the head portion. By doing so, the translator can quickly interpret the desired language without recognizing the voice itself and comparing it with a pre-stored language.”)  (“ID definition step (S340) is a step in which the translation server 200 defines the ID, after determining the communication method by identifying the earset 100 as in the previous step, and receives the data transmitted from the earset 100 According to the ID data stored in each earset 100, the data is generated in each language to prepare for transmission. Therefore, when a signal carrying voice data is transmitted from any one of the earsets 100, the language-specific data of all the ID data determined in the ID definition step S340 is generated and transmitted to the earset 100 in real time.”) (“The reception ID analysis step (S520) is a step of analyzing ID data to extract corresponding language information and selecting ID data and voice data matching the ID set in the earset 100. In other words, after receiving data signals for each language transmitted simultaneously, only ID data is analyzed quickly from each data signal to select a data signal including language ID data required by the corresponding earset 100.”)  by PAEK MIN HO, KR 20200012104 A  
    determine a target language data of the first speaker based on the source language data of speakers including the second speaker and excluding the first speaker, and PAEK MIN HO teaches the translated Korean is delivered to a user who speaks Korean, and the user hears the contents and answers the language in Korean. The Korean-speaking user's pendant recognizes the Korean language and delivers Korean to the English-speaking user's pendant. The pendant is translated into English and output to the user.  (“As shown in Figure 1, the real-time multi-party interpretation wireless earset according to an embodiment of the present invention, the interpretation in real time using the earset. That is, through the earset, it is possible to speak and listen at the same time. In addition, the earset can be connectedto one-to-one, one-to-many or many-to-many at the same time, so that several people can talk indifferent languages at the same time. …” page 3, lines 16-21) (“The encoding step S430 is a step of converting and combining the voice transmitted in the voice input step S410 and the signal analyzed in the transmission ID analysis step S420 into data. As described above, the data signal sequence transmitted and received by the earset 100 to the translation server200 is composed of ID data of the head unit and voice data of the voice unit so that the data signal corresponds to any language without directly recognizing the voice later. …“ PAGE 7, last paragraph) (“That is, the earset including the microphone and the earphone for recognizing the user's voice is composed of one-to-one, one-to-many or many-to-many, and the voice recognized by the microphone is composed of ID data and language data including language information. And a receiving end for encoding and a decoding end for decoding ID data and voice data voice data received from the transmitting end. 
The apparatus may further include a translation server 200 which receives the ID data and the voice data transmitted from the transmitter and transmits the ID data and the voice data to the receiver of the corresponding tire set for each of a plurality of predefined languages.” Page 4, lines 3-8)  (“The translated Korean is delivered to a user who speaks Korean, and the user hears the contents and answers the language in Korean. The Korean-speaking user's pendant recognizes the Korean language and delivers Korean to the English-speaking user's pendant. The pendant is translated into English and output to the user.” page 4,  Lines 25-28) by PAEK MIN HO et al. KR 20200012104 A
transmit, to the voice data output circuit, the control command for outputting the first output voice data to a translation environment for translating   first source language into the  target language corresponding the target language data  PAEK MIN HO teaches (“The present invention relates to a real-time multi-interpretation wireless earset and method, and more particularly, the real-time multi-interpretation wireless earset transmits and receives the data combined with the ID signal for the corresponding language information through the translation server translation in real time”) (“That is, the second earset 112 transmits the identification ID (language) and voice data received from the first earset 111 to the translation server 200, and the TTS which is a value processed by the translation server 200. The data is received and output as voice data.”) (“ID definition step (S340) is a step in which the translation server 200 defines the ID, after determining the communication method by identifying the earset 100 as in the previous step, and receives the data transmitted from the earset 100 According to the ID data stored in each earset 100, the data is generated in each language to prepare for transmission. Therefore, when a signal carrying voice data is transmitted from any one of the earsets 100, the language-specific data of all the ID data determined in the ID definition step S340 is generated and transmitted to the earset 100 in real time.”)  by PAEK MIN HO, KR 20200012104
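
The language-selection limitations mapped to PAEK MIN HO above (source language data stored per position, and a target language for the first speaker derived from the other speakers' source languages) can be sketched as a simple lookup. This is an illustrative assumption, not the method of either reference or of the application: the table layout, position keys, and language codes are hypothetical, and skipping languages equal to the speaker's own is an added design choice.

```python
def target_languages(source_language_by_position, speaking_position):
    """Return the source languages of every registered position except the
    speaking one, in insertion order and without duplicates.

    source_language_by_position is a hypothetical table such as
    {"P1": "ko", "P2": "en"}. Languages identical to the speaker's own
    are skipped on the assumption that no translation is needed for them.
    """
    if speaking_position not in source_language_by_position:
        raise KeyError(f"no source language stored for {speaking_position!r}")
    own_language = source_language_by_position[speaking_position]
    targets = []
    for position, language in source_language_by_position.items():
        if position == speaking_position:
            continue  # exclude the speaker whose voice is being translated
        if language != own_language and language not in targets:
            targets.append(language)
    return targets


if __name__ == "__main__":
    table = {"P1": "ko", "P2": "en", "P3": "ja"}
    print(target_languages(table, "P1"))
```

Keying the table by position rather than by recognized speech mirrors the cited idea of selecting a language without running language recognition on each utterance.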
PAEK MIN HO is considered to be analogous to the claimed invention because it relates to a real-time multi-interpretation wireless earset and method, and more particularly, the real-time multi-interpretation wireless earset transmits and receives the data combined with the ID signal for the corresponding language information through the translation server translation in real time.
Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Zunn Choi to incorporate the teachings of PAEK MIN HO in order to add the feature of determining first and second source language data matched and stored with the first and second position data, respectively, among the source language data.
One could have been motivated to do so because the system can then support useful conversation. (“… here is a problem that the cost of text to use. In addition, although using the smart phone to use the interpretation service, this also is useful for the conversation through meetings, ….”) by PAEK MIN HO, KR 20200012104 A

The combination of Zunn Choi and PAEK MIN HO does not teach "generate first/second speaker position data representing a position of a first/second speaker among the speakers based on a distance between the plurality of microphones and times when the voice signals are received by the plurality of microphones."
generate first speaker position data representing a position of a first speaker among the speakers based on a distance between the plurality of microphones and times when the voice signals are received by the plurality of microphones. 
Ekkizogloy teaches (“[0029] In a multi-passenger vehicle there may be multiple sources of audio at any one time.”) (“[0032] FIG. 2 shows a simplified diagram 200 of an array of microphones M1-M3 placed around an audio source 210, according to certain embodiments. Using multiple microphones disposed at different positions from an audio source can both improve the fidelity of the recording and provide audio location capabilities using audio phase and/or timing analysis, as further discussed below. Referring to FIG. 2, audio source 210 emits an audio signal 220, which is picked up by microphones M1-M3. Microphone M1 is at a distance L1 from audio source 210, microphone M2 is at a distance L2 from audio source 210, and microphone M3 is at a distance L3 from audio source 210. Each microphone M1-M3 can be disposed at a different position relative to audio source 210. As shown in the example portrayed in FIG. 2, M1 is the closest and M2 is the farthest way from audio source 210. Each microphone M1-M3 may receive audio signal 220 (i.e., audio data) at a different time depending on their relative position with respect to audio source 210. These time differences can be calculated (i.e., as phase differences), and used to determine the location of audio source 210, such as by trilateration, as would be understood by one of ordinary skill in the art.”)  (“[0034] FIG. 3 shows a graph 300 of audio recordings for a multi-microphone array, according to certain embodiments. Returning to a simple three-microphone example, graph 300 depicts amplitude vs. time for the audio data received by each of microphones M1-M3, as shown in FIG. 2. Microphone M1 receives audio signal 220 at time t1, microphone M2 receives audio signal 220 at time t2, and microphone M3 receives audio signal 220 at time t3. 
As mentioned above, the time deltas between the received signals (e.g., ΔM1-M2, ΔM1-M3, ΔM2-M3) can be used to determine a location of audio source 210 using conventional trilateration techniques including time difference of arrival (TDOA), cross-correlation functions between audio signals, and geometric principles, as would be understood by one of ordinary skill in the art.”) by Ekkizogloy et al. US 20180332389 A1

Ekkizogloy further teaches:
generate second speaker position data representing a position of a second speaker among the speakers based on the distance between the plurality of microphones and the times when the voice signals are received by the plurality of microphones, Ekkizogloy teaches (“[0029] In a multi-passenger vehicle there may be multiple sources of audio at any one time.”) (“[0032] FIG. 2 shows a simplified diagram 200 of an array of microphones M1-M3 placed around an audio source 210, according to certain embodiments. Using multiple microphones disposed at different positions from an audio source can both improve the fidelity of the recording and provide audio location capabilities using audio phase and/or timing analysis, as further discussed below. Referring to FIG. 2, audio source 210 emits an audio signal 220, which is picked up by microphones M1-M3. Microphone M1 is at a distance L1 from audio source 210, microphone M2 is at a distance L2 from audio source 210, and microphone M3 is at a distance L3 from audio source 210. Each microphone M1-M3 can be disposed at a different position relative to audio source 210. As shown in the example portrayed in FIG. 2, M1 is the closest and M2 is the farthest way from audio source 210. Each microphone M1-M3 may receive audio signal 220 (i.e., audio data) at a different time depending on their relative position with respect to audio source 210. These time differences can be calculated (i.e., as phase differences), and used to determine the location of audio source 210, such as by trilateration, as would be understood by one of ordinary skill in the art.”)  (“[0034] FIG. 3 shows a graph 300 of audio recordings for a multi-microphone array, according to certain embodiments. Returning to a simple three-microphone example, graph 300 depicts amplitude vs. time for the audio data received by each of microphones M1-M3, as shown in FIG. 2. 
Microphone M1 receives audio signal 220 at time t1, microphone M2 receives audio signal 220 at time t2, and microphone M3 receives audio signal 220 at time t3. As mentioned above, the time deltas between the received signals (e.g., ΔM1-M2, ΔM1-M3, ΔM2-M3) can be used to determine a location of audio source 210 using conventional trilateration techniques including time difference of arrival (TDOA), cross-correlation functions between audio signals, and geometric principles, as would be understood by one of ordinary skill in the art.”) by Ekkizogloy et al. US 20180332389 A1
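For illustration only, the time-difference-of-arrival (TDOA) approach described in the quoted paragraphs [0032] and [0034] can be sketched as follows. This is a minimal sketch and is not drawn from Ekkizogloy or from the application: the function names, the speed-of-sound constant, the microphone layout, and the brute-force grid search are all assumptions chosen for readability.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed value)


def estimate_delay(sig_a, sig_b, max_lag):
    """Lag in samples (positive if sig_b trails sig_a) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            a * sig_b[i + lag]
            for i, a in enumerate(sig_a)
            if 0 <= i + lag < len(sig_b)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def locate_source(mics, tdoas, width, height, step=0.05):
    """Grid-search a width x height area for the point whose predicted TDOAs
    best match the measured ones (least squared error).

    mics  : list of (x, y) microphone positions in metres
    tdoas : {(i, j): t_j - t_i} arrival-time differences in seconds
    """
    best_xy, best_err = None, float("inf")
    for ix in range(int(round(width / step)) + 1):
        for iy in range(int(round(height / step)) + 1):
            x, y = ix * step, iy * step
            dist = [math.hypot(x - mx, y - my) for mx, my in mics]
            err = sum(
                ((dist[j] - dist[i]) / SPEED_OF_SOUND - dt) ** 2
                for (i, j), dt in tdoas.items()
            )
            if err < best_err:
                best_xy, best_err = (x, y), err
    return best_xy


# Hypothetical layout: three microphones (M1-M3) and one talker, in metres.
mics = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
source = (1.0, 2.0)
arrival = [math.hypot(source[0] - mx, source[1] - my) / SPEED_OF_SOUND
           for mx, my in mics]
tdoas = {(0, 1): arrival[1] - arrival[0],
         (0, 2): arrival[2] - arrival[0],
         (1, 2): arrival[2] - arrival[1]}
estimate = locate_source(mics, tdoas, width=4.0, height=3.0)

# Delay estimation on a toy pulse: the second signal is the first shifted by 3 samples.
pulse = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0] + pulse[:-3]
lag = estimate_delay(pulse, delayed, max_lag=5)
```

In this sketch the time deltas are computed directly from the assumed geometry. In a real system they would come from a delay estimator such as `estimate_delay`, divided by the sampling rate, and a closed-form or iterative solver would likely replace the grid search.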

Ekkizogloy is considered to be analogous to the claimed invention because it relates to vehicular systems, and in particular to systems and methods to detect and isolate audio in a vehicle using multiple microphones.
Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Zunn Choi and PAEK MIN HO to incorporate the teachings of Ekkizogloy in order to add a localization module that is configured to determine the location of the user relative to the microphones.
One could have been motivated to do so because the system can improve audio reception. (“[0032] … Using multiple microphones disposed at different positions from an audio source can both improve the fidelity of the recording and provide audio location capabilities using audio phase and/or timing analysis,...”)  by Ekkizogloy et al. US 20180332389 A1
Regarding Claim 7, the combination teaches the device of claim 1 as identified above.
Zunn Choi further teaches:
7. (Previously Presented) The voice processing device of claim 1, wherein the processor is configured to: generate second output voice data associated with a voice of the second speaker by using the input voice data, and Zunn Choi teaches (“That is, the translation apparatus 10 of the present invention may translate and output the voice of the first speaker into the language of the second speaker, and may translate and output the voice of the second speaker into the language of the first speaker. At this time, the voice of each speaker is automatically extracted in consideration of the direction, so that the translation device 10 does not need to exist near the speaker who is currently speaking like the existing microphone, so that the translation device 10 is located near the first and second speakers. The voice of the first speaker can be translated into the voice of the language of the second speaker and vice versa just by placing it therein.” page 7, last para)
transmit, to the voice data output circuit, the control command for outputting the first output voice data to a translation environment for translating the second source language into the first source language.  Zunn Choi,  teaches  (“ The language recognition unit 321 recognizes a language of the translation target data based on the translation target data obtained from the translation apparatus 10. The character conversion unit 322 converts the translation target data, which is voice data, into characters based on the recognized language or based on a language corresponding to the speaker's direction information or speaker information. The translation unit 323 translates the translation target data converted into text into another language. The voice converter converts the text data translated into another language into voice data, generates the translation target data, and transmits the translation target data to the translation apparatus 10. For example, an application of the user terminal 110 may obtain translation target data from the translation apparatus 10 and transmit the translation target data to the server 150. The processor 222 of the server 150 recognizes the language of the data to be translated, converts the data to be translated into text data based on the recognized language, translates the translated text data into another language, and then generates translated data. The translation data may be transmitted to the user terminal 110. The user terminal 110 may transmit the obtained translation data to the translation apparatus 10.” page 6, paragraph 3.) (“The translated Korean is delivered to a user who speaks Korean, and the user hears the contents and answers the language in Korean. The Korean-speaking user's pendant recognizes the Korean language and delivers Korean to the English-speaking user's pendant. 
The pendant is translated into English and output to the user.” page 4,  Lines 25-28)  (“That is, the translation apparatus 10 of the present invention may translate and output the voice of the first speaker into the language of the second speaker, and may translate and output the voice of the second speaker into the language of the first speaker. At this time, the voice of each speaker is automatically extracted in consideration of the direction, so that the translation device 10 does not need to exist near the speaker who is currently speaking like the existing microphone, so that the translation device 10 is located near the first and second speakers. The voice of the first speaker can be translated into the voice of the language of the second speaker and vice versa just by placing it therein.” page 7, last para)  (“Referring to FIG. 5A, the translation apparatus 10 and the user terminal 110 are placed on the table 53, and the first speaker 51 is positioned in the first direction 51 ′ based on the translation apparatus 10. The second speaker 52 may be located in the second direction 52 ′. According to the present invention, the conversation between the speakers is translated with the translator 10 within 5m distance, preferably 1 to 2m distance from the speaker, without having to take the microphone near the speaker when collecting the speaker's voice. can do. The translator 10 extracts a voice signal in the first direction 51 ', converts it into a language of the second speaker 52, outputs it to the speaker 12, and extracts a voice signal in the second direction 52'. Can be converted into the language of the first speaker 51 and output to the speaker 12. In addition, a signal for controlling the translation apparatus 10 may be input with the user terminal 110 within a predetermined distance from the translation apparatus 10, or information indicating the current state of the translation apparatus 10 may be displayed.” Page 2nd. 
para) by Zunn Choi,  KR 101989127 B1 {IDS provided}


Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Zunn Choi, PAEK MIN HO, and Ekkizogloy in view of NAKADAI et al. US 20150154957 A1.
Regarding Claim 5, the combination teaches the device of claim 1 as identified above.
Zunn Choi further teaches:

5. (Original) The voice processing device of  claim 1, wherein the processor is configured to convert the first output voice data associated with the voice of the first speaker into text data that is expressed in the first source language, and Zunn Choi teaches (“The existing translation apparatus processes the voice of a user input through a microphone and outputs the translated text or voice. …” page 2, 2nd. Para.)(“The language recognition unit 321 recognizes a language of the translation target data based on the translation target data obtained from the translation apparatus 10. The character conversion unit 322 converts the translation target data, which is voice data, into characters based on the recognized language or based on a language corresponding to the speaker's direction information or speaker information. The translation unit 323 translates the translation target data converted into text into another language.” Page 6, 3rd paragraph.) (“…For example, when the user terminal 1 110 transmits a text data conversion and translation request of the translation target data to the server 150 through a network under the control of the application, the server 150 converts the translation target data into the text data. After the translation, the translation data may be transmitted to the user terminal 1 110, and the user terminal 1 110 may provide the translation device 10 under the control of the application. …” page 3, lines 11-16) by Zunn Choi,  KR 101989127 B1 
PAEK MIN HO teaches:

wherein the voice data output circuit is configured to transmit the text data converted under the control of the processor to the translation environment. PAEK MIN HO  teaches (“The translation server 200 analyzes the received identification ID and voice data to perform language translation. The translated language is to be converted to TTS. The data converted into TTS by the translation server 200 is transferred to the second earset 112, and the second earset 112 outputs the received TTS data as voice data. The translation server 200 may be a method of communicating with the ear set by having a separate server.” Page 3, 4th paragraph from bottom page) by PAEK MIN HO, KR 20200012104 A  
PAEK MIN HO is considered to be analogous to the claimed invention because it relates to a real-time multi-interpretation wireless earset and method, and more particularly, the real-time multi-interpretation wireless earset transmits and receives the data combined with the ID signal for the corresponding language information through the translation server translation in real time.
Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Zunn Choi to incorporate the teachings of PAEK MIN HO in order to add the feature of determining first and second source language data matched and stored with the first position data among the source language data.
One could have been motivated to do so because the system can conduct useful conversation. (“… here is a problem that the cost of text to use. In addition, although using the smart phone to use the interpretation service, this also is useful for the conversation through meetings, ….”)  by PAEK MIN HO, KR 20200012104 A  
The combination does not explicitly teach transmission of the converted text data.
NAKADAI teaches:
transmission of  text data conversion FIG. 10,  NAKADAI teaches (“[0005] … The transmitter includes a microphone, a speech recognition circuit, and a transmitter unit and the transmitter unit transmits character information corresponding to the details recognized on the basis of the speech recognition result to the receiver. The receiver includes a receiver unit, a central processing unit (CPU), and a display unit and displays characters on the display unit when character information is received from the transmitter.”)  (“[0175] This embodiment describes the example where the plurality of speakers use the conversation support apparatus 1A, but the invention is not limited to this example. The conversation support apparatus 1A may be used by a single speaker. For example, when the speaker registers Japanese as the language in the initial state and utters speech in English, the conversation support apparatus 1A may translate the English speech uttered by the speaker into Japanese which is the registered language and may display the Japanese text in the presentation area of the image corresponding to the speaker. Accordingly, the conversation support apparatus 1A according to this embodiment can achieve an effect of foreign language learning support.”) (“[0101] (Step S5) The display image generating unit 142 generates character data corresponding to the recognition data for each speaker input from the speech recognizing unit 13 and outputs the generated character data for each speak to the image display unit 15. 
The display image generating unit 142 generates an image of the information indicating the direction of each speaker on the basis of the information indicating the directions of the speakers, which is input from the speech recognizing unit 13, and outputs the image of the generated information indicating the direction of each speaker to the image synthesizing unit 143.”)  (“[0104] An example of the result of an experiment which is performed using the conversation support apparatus 1 according to this embodiment will be described below. FIG. 10 is a diagram showing an experiment environment.”) (“[0140] The translation unit 24 translates the speech details if necessary on the basis of the speech details, the information indicating the speakers, and the information indicating a language for each speaker which are input from the speech recognizing unit 13A, adds or replaces information indicating the translated speech details to or for the information input from the speech recognizing unit 13A, and outputs the resultant to the image processing unit 14. Specifically, an example where two speakers of the first speaker Sp1 and the second speaker Sp2 are present as the speakers, the language of the first speaker Sp1 is Japanese, the language of the second speaker Sp2 is English will be described below with reference to FIG. 14. In this case, the translation unit 24 translates the speech details so that the images 534A to 534D displayed in the second character presentation image 532 are translated from Japanese in which the first speaker Sp1 utters speech to English which is the language of the second speaker Sp2 and are then displayed. The translation unit 24 translates the speech details so that the images 524A to 524C displayed in the first character presentation image 522 are translated from English in which the second speaker Sp2 utters speech to Japanese which is the language of the first speaker Sp1 and are then displayed.”)  by NAKADAI et al.  US 20150154957 A1
NAKADAI is considered to be analogous to the claimed invention because it relates to a speech translation apparatus, a speech translation method, and a recording medium.
Therefore, it would have been obvious for someone of ordinary skill in the art before the effective filing date of the claimed invention to modify Zunn Choi, PAEK MIN HO, and Ekkizogloy to incorporate the teachings of NAKADAI in order to add a translation environment in the system.
One could have been motivated to do so because the conversation details displayed on the screen can be maintained, and thus the convenience for the speakers is improved. (“[0095] … Accordingly, even when the positions of the speakers are interchanged, the conversation details displayed on the screen can be maintained and thus the convenience for the speakers is improved.”) by NAKADAI et al. US 20150154957 A1

Conclusion

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
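As an aid to reading the paragraph above, the reply-period arithmetic can be sketched in a few lines. This is a minimal sketch under stated assumptions, not a docketing tool: the function names and the example dates are hypothetical, month arithmetic clamps to the last day of a shorter month, and weekend or holiday adjustments and extension fees are not modeled.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Same day-of-month `months` later, clamped to the last day of a shorter month."""
    idx = d.month - 1 + months
    year, month = d.year + idx // 12, idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_deadlines(mailed, first_reply=None, advisory_mailed=None):
    """Return (end of shortened statutory period, six-month outer limit).

    If a first reply is filed within two months of mailing and the advisory
    action is mailed after the three-month period has ended, the shortened
    period is treated as ending on the advisory mailing date, but never
    later than six months from the mailing date of the final action.
    """
    two_months = add_months(mailed, 2)
    shortened = add_months(mailed, 3)
    outer_limit = add_months(mailed, 6)
    if (first_reply is not None and advisory_mailed is not None
            and first_reply <= two_months and advisory_mailed > shortened):
        shortened = min(advisory_mailed, outer_limit)
    return shortened, outer_limit


# Hypothetical dates, chosen only to exercise both branches.
no_advisory = reply_deadlines(date(2026, 1, 15))
late_advisory = reply_deadlines(date(2026, 1, 15),
                                first_reply=date(2026, 3, 1),
                                advisory_mailed=date(2026, 4, 25))
```

With the hypothetical January 15 mailing date, the first call yields April 15 and July 15; in the second call the shortened period moves to the April 25 advisory mailing date while the six-month limit is unchanged.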
Any inquiry concerning this communication or earlier communications from the examiner should be directed to FOUZIA HYE SOLAIMAN whose telephone number is (571)270-5656. The examiner can normally be reached M-F (8-5)AM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Paras D. Shah, can be reached at (571) 270-1650. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/F.H.S./Examiner, Art Unit 2653

/Paras D Shah/Supervisory Patent Examiner, Art Unit 2653
04/29/2026
Prosecution Timeline

Mar 20, 2025: Non-Final Rejection mailed (§103)
Jun 26, 2025: Response Filed
Sep 22, 2025: Final Rejection mailed (§103)
Dec 19, 2025: Request for Continued Examination
Jan 13, 2026: Response after Non-Final Action
Feb 05, 2026: Non-Final Rejection mailed (§103)
Apr 03, 2026: Response Filed
May 01, 2026: Final Rejection mailed (§103, current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/458,142 (Patent 12639528): LARGE LANGUAGE MODEL AND DETERMINISTIC CALCULATOR SYSTEMS AND METHODS. 2y 9m to grant; granted May 26, 2026.
18/132,723 (Patent 12626066): EXTRACTING CONVERSATIONAL RELATIONSHIPS BASED ON SPEAKER PREDICTION AND TRIGGER WORD PREDICTION. 3y 1m to grant; granted May 12, 2026.
18/174,120 (Patent 12592217): SYSTEM AND METHOD FOR SPEECH PROCESSING. 3y 1m to grant; granted Mar 31, 2026.
18/116,994 (Patent 12579976): USER TERMINAL, DIALOGUE MANAGEMENT SYSTEM, CONTROL METHOD OF USER TERMINAL, AND DIALOGUE MANAGEMENT METHOD. 3y 0m to grant; granted Mar 17, 2026.
17/888,243 (Patent 12555563): SYSTEMS AND METHODS FOR CHARACTER-TO-PHONE CONVERSION. 3y 6m to grant; granted Feb 17, 2026.
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

Expected OA Rounds: 5-6
Grant Probability: 68%
With Interview (+52.5%): 99%
Median Time to Grant: 2y 11m (~0m remaining)
PTA Risk: High
Based on 68 resolved cases by this examiner. Grant probability derived from career allowance rate.