DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the claim amendment filed on November 26, 2025, in which claims 1-2 and 16-20 were amended, claim 15 was canceled, and claim 21 was newly added.
Accordingly, claims 1-14 and 16-21 are currently pending in this Office Action.
With respect to the objection to the drawings due to formality issues, as set forth in the previous Office Action, applicant's argument (see paragraphs 3-4 of page 10 and paragraphs 1-2 of page 11 of the Remarks filed on November 26, 2025, directed to application drawing figures 5A-5C, etc.) has been fully considered and is persuasive. Therefore, the objection to the drawings due to formality issues, as set forth in the previous Office Action, has been withdrawn.
With respect to the objection to claims 1-20 due to formality issues, as set forth in the previous Office Action, the claim amendment, including the cancelation of claim 15, and applicant's argument (see paragraph 2 of page 11 of the Remarks filed on November 26, 2025) have been fully considered and are persuasive. Therefore, the objection to claims 1-20 due to formality issues, as set forth in the previous Office Action, has been withdrawn.
With respect to the rejection of claims 1-20 under 35 U.S.C. § 112(a), as set forth in the previous Office Action, the claim amendment, including the cancelation of claim 15, and applicant's argument (see paragraphs 4-5 of page 12 of the Remarks filed on November 26, 2025) have been fully considered and are persuasive. Therefore, the rejection of claims 1-20 under 35 U.S.C. § 112(a), as set forth in the previous Office Action, has been withdrawn.
With respect to the rejection of claims 1-20 under 35 U.S.C. § 112(b), as set forth in the previous Office Action, the claim amendment, including the cancelation of claim 15, and applicant's argument (see paragraph 1 of page 13 of the Remarks filed on November 26, 2025) have been fully considered and are persuasive. Therefore, the rejection of claims 1-20 under 35 U.S.C. § 112(b), as set forth in the previous Office Action, has been withdrawn.
The Office appreciates the explanation of the amendment and the analyses of the prior art. However, although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993) and MPEP 2145.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao et al. (CN 108712551 A, translation by ESPAN, hereinafter Zhao, cited in the IDS) in view of Every et al. (US 8606571 B1, hereinafter Every).
Claim 1: Zhao teaches a position detection method (title and abstract, ln 1-7, method steps shown in fig. 3 and implemented by a smart electronic device such as a smartphone, para 1) performed by a terminal device (such as a smartphone, para 1, and other smart devices with a call function, para 40) comprising:
obtaining voice signals during a voice call by at least two voice collecting devices of the terminal device (acquiring a call voice by first and second microphones of the electronic device in a call mode, para 7);
obtaining information on position energies of the voice signals (loudness differences of audio signals between the first and the second microphones, as claimed information, are detected to monitor a posture of the electronic device, para 30); and
identifying one of a plurality of predefined positions as a position of the terminal device relative to the user during the voice call, based on the information (based on the detected loudness difference between the first and the second microphone signals during the call, the electronic device is at a first posture with low sound quality, i.e., the loudness difference is below a threshold value, or at a second posture with better sound quality, i.e., the loudness difference is equal to or higher than the threshold, para 30), wherein the plurality of predefined positions comprise a first predefined position (the first posture with the poor sound quality above) and a second predefined position (the second posture with the better sound quality above), and further teaches a central axis of the terminal device (the electronic device discussed above can be a smartphone, para 1, and a central axis of the smartphone is inherent) and a central axis of a face of a user (a user in a call has multiple gestures, para 1, and thus a central axis of a face of the user is also inherent).
However, Zhao does not explicitly teach wherein the plurality of predefined positions, including the first predefined position and the second predefined position, each respectively corresponds to an angle between the central axis of the terminal device and the central axis of the face of the user, and does not explicitly teach a third predefined position of the plurality of predefined positions corresponding to a face angle of the user.
Every teaches an analogous field of endeavor by disclosing a positional method (title and abstract, ln 1-13, and a method in fig. 10) and wherein a terminal device is disclosed (an audio device 104a in fig. 1A, e.g., telephone handsets, col 13, ln 54-59) having at least two voice collecting devices (microphones M1/M2 on the telephone handsets in fig. 1), and wherein a central axis of the terminal device is disclosed (a nominal dashed line crossing the two microphones M1/M2 in figs. 4A/4B) and a central axis of a face of a user (the borders of each of regions 400, 420 in figs. 4A/4B and the border of the region 500 in fig. 5, crossing the center of the face of the user 102 in figs. 4A/4B and 5), and an angle between the central axis of the face of the user and the central axis of the terminal device is disclosed (angle θ is an angle of the terminal device, or its central axis, away from the central axis of the user's mouth or face, moving in the horizontal direction, and ψ is an angle between the central axis of the terminal device and the central axis of the user's face, moving around the user's ear, in fig. 3, with the angles similarly defined in figs. 4-5), by which the plurality of predefined positions as the position of the terminal device relative to the user during the voice call, including a first predefined position with a first angle and a second predefined position with a second angle (a nominal "close-talk" usage position in fig. 3 and an example of variations in position from this nominal usage position, respectively, wherein the usage position is measured with angles θ and ψ in fig. 3, col 5, ln 66-67, col 6, ln 1-2), and a third predefined position with a third angle are also disclosed (a position 1-2 cm apart and with an angle between the device and a border of the region 500 in fig. 5, predetermined as a similar signal level received by the primary microphone 106 and by the secondary microphone 108, col 7, ln 48-55), in which the angles are measured between the central axis of the terminal device and the central axis of the face of the user (an angle defined by σ(k) and calculated via ILD and IPD, col 6, ln 48-51, and represented in figs. 3-5, so that the angle defined by σ(k) is an orientation of the mouth as speech source relative to the terminal device, in fig. 5), for benefits of improving speech quality in a more robust manner by effectively and adaptively measuring noises to be suppressed (col 1, ln 26-41) and obtaining better performance (by balancing between positional robustness and noise reduction robustness, col 1, ln 45-53).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the first predefined position with the first angle, the second predefined position with the second angle, and the third predefined position with the third angle, measured as the angle between the central axis of the terminal device and the central axis of the face of the user corresponding to the first angle, the second angle, and the third angle, respectively, as taught by Every, to the first predefined position and the second predefined position of the plurality of predefined positions in the position detection method performed by the terminal device, as taught by Zhao, for the benefits discussed above.
Claim 2: the combination of Zhao and Every further teaches, according to claim 1 above, obtaining the information on position energies of the voice signals comprises:
obtaining projection energies of the voice signals (the difference in loudness between the sound collected by the first microphone and the sound collected by the second microphone is detected at S302, para 29, the loudness of the sounds collected by both the first and the second microphones being the claimed projection energies) corresponding to each of the plurality of predefined positions (Zhao, the electronic device in its posture, a posture being equivalent to a position, wherein the electronic device is gripped at the user's head side or the user's shoulder side as a first posture or position, and held in the user's hand as a second posture or position, para 37, and the posture or position of the electronic device is determined by using the sound loudness collected by the first and the second microphones, para 29-30 above).
Claim 18: the combination of Zhao and Every teaches, according to claim 1 above, the first angle, the second angle, and the third angle (the first angle and the second angle by Zhao and Every, and the third angle by Every, as discussed in claim 1 above), but does not explicitly teach wherein the second angle is between the first angle and the third angle.
It has been a recognized problem and need in the art, which may include a design need, to determine the predefined plurality of positions, including the first angle, the second angle, and the third angle, e.g., based on human habits and the environment during the call. Whether one of the angles lies between the other two angles of the plurality of positions presents a finite number of identified, predictable potential solutions for a better position for a clear telephone call or conversation, which include:
the first angle is between the second and the third angles;
the second angle is between the first and the third angles; and
the third angle is between the first and the second angles.
It would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to have pursued the known potential solutions with a reasonable expectation of success, i.e., obvious to try; see MPEP 2141, III.
Therefore, it would have been obvious for one having ordinary skill in the art before the effective filing date of the claimed invention to have applied the second angle being between the first and the third angles, per the obvious-to-try rationale above, to the first, the second, and the third angles, as taught by the combination of Zhao and Every, for the benefits discussed above.
Claim 19 has been analyzed and rejected according to claim 1 above, and the combination of Zhao and Every further teaches a terminal device (Zhao, the electronic device, para 1, and Every, the audio device 104a) comprising:
at least two voice collecting devices (Zhao, first and second microphones, para 5, and Every, a first microphone 106 and a second microphone 108 in fig. 1A);
one or more processors (Zhao, a processor, para 12, and Every, a processor 202);
a memory configured to store one or more application programs (Zhao, programmable ROM, CD-ROM, hard disk, etc., with a software module, para 62, and Every, a memory storing executable instructions and modules, col 5, ln 16-22) that, when executed by the one or more processors, cause the one or more processors to implement the position detection method of claim 1 (Zhao, the software module executed by the processor, para 62, and Every, a processing unit to execute the software, col 5, ln 16-22).
Claim 20 has been analyzed and rejected according to claims 1, 19 above.
Claims 3, 8, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (above) in view of Every (above) and Herre et al. (US 20130259243 A1, hereinafter Herre).
Claim 3: the combination of Zhao and Every teaches all the elements of claim 3, according to claim 2 above, including obtaining a projection energy of each of a plurality of frequency bins corresponding to each of the plurality of predefined positions, wherein the plurality of frequency bins are included in the voice signals (Zhao, a frequency and time domain analysis of the call voice is performed while the electronic device is in the first posture, para 52-54, and Every, via frequency analysis module 602 in fig. 6, such as a short-time Fourier transform (STFT) for frequency bins, col 8, ln 14-42, with further processing in which speech/noise is preserved or canceled depending on the position of the device, e.g., in region 500 in fig. 5, speech is preserved and noise is not suppressed in the frequency domain due to high SNR), but does not explicitly teach obtaining a weight of each of the plurality of frequency bins; and identifying the projection energies of the voice signals corresponding to each of the plurality of predefined positions, based on the projection energy of each of the plurality of frequency bins corresponding to each of the plurality of predefined positions and the weight of each of the plurality of frequency bins.
Herre teaches an analogous field of endeavor by disclosing a position detection method performed by a device (title and abstract, ln 1-16 and an apparatus in fig. 1 and a method, para 2) and wherein
obtaining a projection energy of each of a plurality of frequency bins (a sound pressure level Pv(k,n) at the virtual microphone position, obtained by formula 12 and projected to a real reference microphone signal level Pref(k,n) at a reference position pref, para 122-123, 128) corresponding to each of the plurality of predefined positions (wherein d1(k,n) is the distance between a first real spatial microphone and the position of the sound event near which the virtual microphone is located, para 47), wherein the plurality of frequency bins are included in the voice signals (frequency bin k and time frame n in Pv(k,n) and Pref(k,n) above);
obtaining a weight of each of the plurality of frequency bins (the sound pressure level is modified by beam patterns of the microphones, e.g., a cardioid pattern defined by [1+cos(φv(k,n))] in formula 14, para 128); and
identifying the projection energies of the voice signals corresponding to each of the plurality of predefined positions (the modified Ṕv(k,n) by formula 14 at the cardioid directivity of the virtual microphone, para 145-146), based on the projection energy of each of the plurality of frequency bins corresponding to each of the plurality of predefined positions (Pv(k,n) by formula 12, wherein s(k,n) is the distance as the claimed each of the predefined positions for frequency bin k and time frame n, para 128) and the weight of each of the plurality of frequency bins ([1+cos(φv(k,n))] as the claimed weight of each frequency bin, per the discussion above), for benefits of improving the sound quality of the recorded audio signals by compensating for different propagation paths of the sound waves (para 133, 148) with fewer microphones and a simpler microphone setup (para 23).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied obtaining a weight of each of the plurality of frequency bins and identifying the projection energies of the voice signals corresponding to each of the plurality of predefined positions, based on the projection energy of each of the plurality of frequency bins corresponding to each of the plurality of predefined positions and the weight of each of the plurality of frequency bins, as taught by Herre, to obtaining the projection energies of the voice signals corresponding to each of the predefined positions in the position detection method performed by the terminal device, as taught by the combination of Zhao and Every, for the benefits discussed above.
Claim 8: the combination of Zhao, Every, and Herre further teaches, according to claim 3 above, wherein the obtaining of the weight of each of the plurality of frequency bins comprises: obtaining a predefined weight of each of the plurality of frequency bins (Herre, per the discussion in claim 3 above, the weight determined by the configuration of the virtual microphone to have a dedicated, predetermined directivity, e.g., a cardioid directivity shape, para 142).
Claim 16: the combination of Zhao, Every, and Herre further teaches, according to claim 1 above, wherein the identifying of the position of the terminal device is based on a feature vector corresponding to the user (Herre, the DOA is determined at positions 610, 620 pointed to by vectors p1 and p2, para 104), and wherein a number of dimensions of the feature vector corresponds to a number of the at least two voice collecting devices (Zhao, the two microphones discussed in claim 19 above, Every, the primary and the secondary microphones discussed in claim 19 above, and Herre, two real spatial microphones corresponding to the two vectors p1 and p2 with the positions 610, 620 in fig. 6, para 104).
Claim 17: the combination of Zhao, Every, and Herre further teaches, according to claim 1 above, wherein obtaining a plurality of feature vectors respectively corresponding to the plurality of predefined positions based on the position energy information (Herre, fig. 6, the two vectors p1 and p2 pointing to the two positions 610, 620 from the origin point O in an X-Y coordinate in fig. 6, para 104, and ψ(VM) is determined by energy combinations of the equation reproduced as media_image1.png, para 163, and the energies of the equation reproduced as media_image2.png, para 159-160).
Claims 4-6, 13, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (above) in view of Every (above), Herre (above), and Kong et al. (US 20040220800 A1, hereinafter Kong).
Claim 4: the combination of Zhao, Every, and Herre teaches all the elements of claim 4, according to claim 3 above, but does not explicitly teach
normalizing a plurality of feature vectors to obtain normalized feature vectors corresponding to the voice signals; and identifying the projection energy of each of the plurality of frequency bins corresponding to each of the plurality of predefined positions, based on the normalized feature vectors and feature matrixes corresponding to each of the plurality of predefined positions.
Kong teaches an analogous field of endeavor by disclosing a position detection method (title and abstract, ln 1-18, and method steps in figs. 6-7) and wherein
obtaining a plurality of feature vectors corresponding to the voice signals, wherein the feature vectors include a respective feature value corresponding to each of the plurality of frequency bins (the feature vector values E[] in the formula 16);
normalizing the plurality of feature vectors to obtain normalized feature vectors corresponding to the voice signals (averaging E[] to obtain Rk in the formula 16, para 72); and
identifying the projection energy of each of the plurality of frequency bins corresponding to each of the plurality of predefined positions, based on the normalized feature vectors and feature matrixes corresponding to each of the plurality of predefined positions (beamforming being performed according to Rk, para 73), for benefits of achieving high-quality speech signal recording by avoiding the echo interference of the direct target signals (para 20-21) with less computational complexity (para 22).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the plurality of feature vectors and wherein normalizing the plurality of feature vectors to obtain the normalized feature vectors corresponding to the voice signals; and identifying the projection energy of each of the plurality of frequency bins corresponding to each of the plurality of predefined positions, based on the normalized feature vectors and feature matrixes corresponding to each of the plurality of predefined positions, as taught by Kong, to obtaining the projection energy of each of the plurality of frequency bins corresponding to each of the plurality of predefined positions in the position detection method, as taught by the combination of Zhao, Every, and Herre, for the benefits discussed above.
Claim 5: the combination of Zhao, Every, Herre, and Kong further teaches, according to claim 4 above, wherein the obtaining the plurality of feature vectors corresponding to the voice signals comprises:
obtaining at least two frequency domain signals corresponding to the voice signals (Zhao, the frequency analysis is performed at the first posture of the electronic device, para 52-54; Every, through frequency analysis 602 in fig. 6, such as a short-time Fourier transform (STFT) for frequency bins, col 8, ln 14-35, with further processing in which speech/noise is preserved or canceled depending on the position of the device, e.g., in region 500 in fig. 5, speech is preserved and noise is not suppressed in the frequency domain due to high SNR; and Kong, X1,k, XL,k, XL+1,k, XL+p-1,k distributed in X(1)-k, X(2)-k, ..., X(p)-k in equation 15, k = frequency bin index, para 71, the feature vector values being [X1,k, XL,k], [X2,k, XL+1,k], ..., [Xp,k, XL+p-1,k], etc., in equation 15, per the discussion in claim 4 above); and combining feature values of the at least two frequency domain signals of the plurality of frequency bins to obtain the plurality of feature vectors of the voice signals (Kong, by combining [X1,k, XL,k], [X2,k, XL+1,k], ..., [Xp,k, XL+p-1,k] to form the vector E[X(i)-k, X(i)-k], i = 1, ..., p, in equation 16, where p is the number of microphone sets in fig. 5, para 72-73).
Claim 6: the combination of Zhao, Every, Herre, and Kong further teaches, according to claim 4 above, before normalizing the plurality of feature vectors, performing frequency response compensation on the plurality of feature vectors based on a predefined compensation parameter to obtain amplitude-corrected feature vectors (Zhao, in frequency domain, the voice signals are noise-reduced according to the first control manner when the electronic device is at the first posture for improving call quality, para 32-33 and Kong, the audio signals are combined to form the plurality of feature vectors and the discussion in claim 5 above).
Claim 13 has been analyzed and rejected according to claims 3-4 above (the claimed summating reads on the averaging in formula 16 of Kong's disclosure and the discussion in claim 4 above).
Claim 21: the combination of Zhao, Every, and Kong further teaches, according to claim 1 above, obtaining a plurality of feature vectors respectively corresponding to the plurality of predefined positions based on the information (Zhao, the loudness difference between the signals received by the two microphones is below the threshold or equal to/above the threshold, respectively corresponding to the first posture and the second posture based on the level difference, as discussed in claim 1 above; Every, angles defined by the position of the terminal device in fig. 3, further angles in fig. 4, a distance such as 1-2 cm between the two microphones, and a further angle in fig. 5, as discussed in claim 1 above; and Kong, spatial covariance matrixes over frequency components corresponding to the location of the target signal source relative to the listening devices at steps S4-S5, with a weight determined and applied so that a specific sound source location is determined at S6 in fig. 6).
Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Zhao (above) in view of Every (above), Herre (above), and Mekata et al. (US 5335312 A, hereinafter Mekata).
Claim 9: the combination of Zhao, Every, and Herre teaches all the elements of claim 9, according to claim 3 above, including obtaining a weight of each of the frequency bins (the discussion in claim 3 above), but does not explicitly teach wherein the obtaining of the weight of each of the frequency bins comprises: identifying the weight of each of the plurality of frequency bins through a weight identification neural network, based on the projection energy of each of the plurality of frequency bins corresponding to each of the predefined positions or the position energy information of the voice signals.
Mekata teaches an analogous field of endeavor by disclosing a method associated with a relative position change of audio sound sources (title and abstract, ln 1-12, and figs. 9, 15) and wherein obtaining a weight of each of the frequency bins is disclosed (for each frequency sub-band through the filter bank 120 and the element 510, the weight is outputted from the neural network 520 in figs. 9, 15, col 7, ln 22-44), comprising: identifying the weight of each of the plurality of frequency bins through a weight identification neural network (the neural network 520, by which a weight for each of the frequency sub-bands is outputted from an output layer 550 and applied to each audio signal of the frequency sub-band by multipliers 560a, ..., in figs. 9, 15), based on the projection energy of each of the plurality of frequency bins corresponding to each of the predefined positions or the position energy information of the voice signals (corresponding to each of the microphones associated with a relative position change of noise sources and voice sources via a phase difference between the noise and the voice, col 11, ln 29-36), for benefits of improving sound quality under dynamic changes of the sound source locations (the positional relationship between the noise source and the voice source changes to distort the voice signal during conventional noise reduction, col 2, ln 1-12).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied wherein identifying the weight of each of the plurality of frequency bins through the weight identification neural network, based on the projection energy of each of the plurality of frequency bins corresponding to each of the predefined positions or the position energy information of the voice signals, as taught by Mekata, to obtaining the weight of each of the frequency bins in the position detection method, as taught by the combination of Zhao, Every, and Herre, for the benefits discussed above.
Claim 10: the combination of Zhao, Every, Herre, and Mekata further teaches, according to claim 9 above, identifying, by a control subnetwork, signal-to-noise ratio characteristic values of the voice signals based on the position energy information of the voice signals (Zhao, the loudness of each of the microphone signals is determined, per the discussion in claim 1 above; Every, spectrum energy is enhanced in the specified sound direction by applying a mask related to sub-band spectrum energy, col 10, ln 50-63; Herre, the modified Ṕv(k,n) by formula 14 at the cardioid directivity of the virtual microphone, para 145-146, the predefined position represented by a distance s(k,n) as a function of at least frequency bin index k and time frame number n in the Pv(k,n) formula 12, para 128, and SNR is used to select an audio signal for processing, para 134, so that identifying an SNR is inherent in the selection above; and Mekata, via noise addition means 320 to determine an average S/N ratio of about 6 dB within the voice section of time length T, the noise-added audio signal being inputted to the noise suppression apparatus 300 having the neural network to calculate the weight in the weight coefficient renewal means 330 in fig. 4); identifying whether the weight of each of the plurality of frequency bins is a predefined weight based on the signal-to-noise ratio characteristic value (Herre, the modified audio signals are weighted in the linear combination, para 134, and Mekata, the weight coefficient of the neural network 140 is trained by using the noisy audio signal, and whether the adjusted weight is acceptable is judged, based on the error E, by the adjust-end judge means 350 in fig. 4, col 5, ln 50-67); and based on the weight of each of the plurality of frequency bins not being the predefined weight, identifying, by a calculation subnetwork, the weight of each of the frequency bins based on the projection energy of each of the plurality of frequency bins corresponding to each of the predefined positions (through a loop for adjusting the weight by elements 350, 330, and 130, etc., in fig. 4 and the method in fig. 5).
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Zhao (above) in view of Every (above) and Choisel et al. (US 20160057522 A1, hereinafter Choisel).
Claim 14: the combination of Zhao and Every further teaches, according to claim 1 above, wherein the identifying the position of the terminal device relative to the user during the voice call from the plurality of predefined positions based on the position energy information comprises:
obtaining projection energies of the voice signals corresponding to each of the plurality of predefined positions (Zhao, the loudness of the 1st microphone and the loudness of the 2nd microphone, in order to calculate the loudness difference, para 30).
However, the combination of Zhao and Every does not explicitly teach wherein
identifying a maximum information on position energies, from among the projection energies of the voice signals corresponding to each of the plurality of predefined positions; and
obtaining the position of the terminal device, from among the plurality of predefined positions, based on the maximum information on position energies.
Choisel teaches an analogous field of endeavor by disclosing a position detection method (title and abstract, ln 1-16, a method shown in fig. 4 and implemented by an audio capture device 101 in fig. 2) performed by a terminal device (the audio capture device 101 in fig. 2, which may be a tablet computer, a laptop computer, a video conferencing phone, or a cellular telephone, para 22, as the claimed terminal device) comprising:
obtaining voice signals during a voice call by at least two voice collecting devices of the terminal device (two or more microphones 107 in a microphone array 105 to sense sounds and to generate audio signals to audio codec 221 in fig. 2, para 24; the captured sound being uttered by a user during communicating over telephone connection, e.g., calling “System, please call Megan”, para 40, as the claimed during a call; performed by power level unit 705 in fig. 7, and at step 401-403 in fig. 4);
obtaining information on position energies of the voice signals (the voice signals or beam patterns S1/S2 representing sounds captured by each of two or more beam patterns 301 and 303, para 42; from S1 and S2, the power of the signals P1/P2 is determined, corresponding to the different directivity indices illuminated by patterns S1 and S2, performed at step 405, para 43);
identifying one of a plurality of predefined positions (fig. 5A-5B, represented by distance r in fig. 5A-5B, as determined positions of the plurality of positions predetermined within the listening room having a size and a wall 100 in fig. 3C, e.g., a home office, para 21, 51-58; including two or more beam patterns having different directions or directivity indices predetermined within the room, as the claimed predefined positions) as a position of the terminal device relative to a user during the voice call (including a distance r between the audio capture device and the user in fig. 4, para 51), based on the information (based on the calculated P1 and P2 at steps 411-413 in fig. 4, para 51; e.g., averaged or normalized to smooth the calculated distances r in fig. 5B, para 51-58; including two or more beam patterns focused on different directions predefined in the room as the claimed predefined positions based on the energy information; the beam with the highest pressure/power level is determined to be the direction of the user relative to the audio capture device 101, para 62); and
wherein identifying the position of the terminal device further comprising:
identifying a maximum information on position energies, from among the projection energies of the voice signals corresponding to each of the plurality of predefined positions (the beams analyzed to determine a beam with the highest pressure/power level from the patterns, para 62); and
obtaining the position of the terminal device, from among the plurality of predefined positions, based on the maximum information on position energies (the direction of the user 103 relative to the audio capture device 101 as the beam direction or the directivity index having the highest pressure/power level, para 62), for the benefit of optimizing audio-visual rendering (by accurately and simply determining a position of an electronic device relative to a listener position, or vice versa, para 2; at low cost, para 4, 65; and by allowing the measurement to be repeated to improve measurement accuracy, para 48).
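For illustration only, the beam-selection step attributed to Choisel (para 62) amounts to selecting, from among the predefined beam directions, the one whose measured power level is highest. The following sketch is purely illustrative; the function and variable names are hypothetical and are not drawn from any reference of record:

```python
def identify_position(beam_powers):
    """Return the index of the predefined position (beam direction)
    whose measured pressure/power level is the highest.

    beam_powers: list of power levels, one per predefined beam direction.
    """
    # The beam with the maximum power level indicates the direction of
    # the user relative to the capture device (cf. Choisel, para 62).
    return max(range(len(beam_powers)), key=lambda i: beam_powers[i])

# Example: with four predefined beams, the third (index 2) has the
# highest power, so position 2 is identified as the user's direction.
print(identify_position([0.1, 0.5, 0.9, 0.3]))
```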
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied identifying the maximum information on position energies, from among the projection energies of the voice signals corresponding to each of the plurality of predefined positions, and obtaining the position of the terminal device, from among the plurality of predefined positions, based on the maximum information on position energies, as taught by Choisel, to identifying the position of the terminal device relative to the user during the voice call, including obtaining the projection energies of the voice signals, in the position detection method taught by the combination of Zhao and Every, for the benefits discussed above.
Allowable Subject Matter
Claims 7 and 11-12 are objected to as being dependent upon rejected base claim 1, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Response to Arguments
Applicant's arguments filed on November 26, 2025 have been fully considered but are moot in view of the new ground(s) of rejection necessitated by applicant's amendment. Although a new ground of rejection has been applied to address the limitations added to claims 1 and 19-21, a response to several of applicant's arguments is warranted because references Zhao and Every continue to be relied upon to meet several claimed limitations.
With respect to the prior art rejection of independent claim 1, and similarly claims 19-20, under 35 USC §103, as set forth in the Office Action, applicant challenged prior art reference Zhao and argued that Zhao does not teach the claimed "identifying one of a plurality of predefined positions as a position of the terminal device relative to a user during the voice call, based on the information …" because "Zhao does not teach or suggest predefined postures," "does not teach or suggest first, second, and third predefined postures," and "Zhao does not teach or suggest that the first and second postures respectively correspond to different angles between a central axis of the terminal device and a central axis of a face of a user," as asserted in paragraphs 3-4 of page 14 and paragraph 1 of page 15 in the Remarks filed on November 26, 2025.
In response to the argument above, the Office respectfully disagrees because the claim fails to recite on what basis, or by what means, the "plurality of" "positions" is "predefined"; therefore, the broadest reasonable interpretation (BRI) applies, see MPEP 2111. Under the BRI, Zhao discloses a scenario in which a user makes a voice call using a smartphone, as discussed in the claim rejection above, and it is well known that a user commonly makes a phone call in any of a plurality of positions relative to the terminal device, which would be predetermined according to the user's habits, perceived comfort level, ambient noise level, and spatial status (such as walking, lying down, or sitting), in a variety of environments (such as at the beach, in a forest, in a park, at home, in an office, in a venue around people or in a quiet area, while cooking, or while watching TV). Indeed, Zhao does not use the exact word "predefined" in the publication, but from Zhao's disclosure of the scenario above, it would have been obvious to one having ordinary skill in the art that Zhao's two "postures" with bad and good sound quality would be predetermined initially by directly perceiving the sound quality affected by the conditions discussed above. In addition, it is well known in the art that the choice of how to express a spatial position of one object relative to another, i.e., which coordinate system to use (distance, direct line, angle/radius, etc.), is an obvious design choice and thus would not be considered inventive subject matter. Moreover, Every specifically discloses what Zhao is missing, as addressed in the response to the argument below.
Applicant further challenged prior art reference Every and argued that "Every does not disclose that these regions are associated with an angle between a central axis of a terminal device and a central axis of a face of a user and Every also does not disclose that such regions are predefined," because Every's "spatial coefficient σ(k) indicative of a possible positional orientation of a user" is "between the primary acoustic signal c and the secondary acoustic signal f" and "Every discloses a spatial region which corresponds to a speech source and a spatial region which corresponds to a noise source," etc., as asserted in paragraph 2 of page 15 in the Remarks filed on November 26, 2025.
In response to the argument above, the Office further disagrees. Every does not only disclose a "spatial region which corresponds to a speech source," i.e., speech from the user's mouth spatially relative to the terminal device, and a region which "corresponds to a noise source," e.g., other speech as noise, in a non-Cartesian system, as indicated in the argument; Every also discloses a predefined first angle and a predefined second angle (a nominal "close-talk" usage position and an example of variations in position from this nominal usage position in fig. 3, col 5, ln 65-67, col 6, ln 1-2), and a predefined third angle (a position 1-2 cm apart and an angle between the device and a border of the region 500 to provide a tradeoff between positional robustness and noise reduction robustness, i.e., the level of speech received by the primary microphone 106 is similar to that received by the secondary microphone 108, col 48-55), and those angles are measured between the user, or a central axis of the user's mouth or face, and a central axis of the terminal device (fig. 3-5 and the discussion in the Office Action above). Therefore, under the BRI of the broadly claimed and argued "predefined" feature above, Every's disclosure covers what Zhao is missing and reads on the argued "first/second/third angles" and "predefined" features; applicant is silent on this point, and thus the argument above is not persuasive.
Applicant further referred to other prior art references Herre, Kong, Mckata, and Choisel with respect to the arguments above, see the last paragraph of page 15 and paragraph 1 of page 16 in the Remarks filed on November 26, 2025. However, as discussed above, the combination of Zhao and Every teaches all the features of claim 1, including the argued features; therefore, Herre, Kong, Mckata, and Choisel need not teach the features that the combination of Zhao and Every has taught, and thus the argument with respect to Herre, Kong, Mckata, and Choisel is also not persuasive. It is recommended that applicant consider the allowable subject matter set forth above in order to expedite prosecution.
It is further noted, in order to expedite prosecution of the application, that the Office respectfully requests that support be shown for language added to any original claims on amendment and for any new claims. That is, indicate support for newly added claim language by specifically pointing to the page(s) and line numbers in the specification and/or the drawing figure(s). This will assist the Office in prosecuting this application.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG whose telephone number is (571)270-5589. The examiner can normally be reached Monday-Friday, 6:30am-4:00pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin can be reached on 571-272-7848. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/LESHUI ZHANG/
Primary Examiner,
Art Unit 2695