DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments filed 10/29/2025 have been considered but are moot in view of the new grounds of rejection.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1, 3-9, 12, 14-17, and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Potts et al. (US 6,593,956 B1 – hereinafter Potts) and Hayashi (US 2019/0184567 A1 – hereinafter Hayashi).
Regarding claim 1, Potts discloses a control method comprising: determining position/orientation information of a target object according to an imaging position of the target object in an imaging view of a photographing device of a media apparatus (column 6, lines 24-42 – determining location of a speaker with respect to the camera, which is stored in face location memory 110 as shown in Fig. 4, according to an imaging position of the person detected by video face location 102 and face location tracking 106 shown in Fig. 4, in images captured by the camera as further described in at least column 7, lines 49-54 and column 8, lines 2-33); determining sound source position/orientation information according to ambient audio picked up by a sound pickup device of the media apparatus (Fig. 4 – by modules 114, 116, and 118 as further described at least in column 7, lines 32-38); and adjusting a photographing parameter of the photographing device and/or a sound pickup parameter of the sound pickup device according to the position/orientation information of the target object and the sound source position/orientation information, to focus an image captured by the photographing device on the target object (Fig. 4 – by module 80; column 6, lines 31-34 – panning, tilting, or zooming to focus on the speaker); in response to detecting that the target object disappears from the imaging view of the photographing device (Figs. 23A-23C; column 23, lines 43-63 – in response to a speaker moving to a new location out of FOV of the camera, e.g. 
from location A to location B, or from location B back to location A): determining target sound source position/orientation information associated with the target object according to the ambient audio picked up by the sound pickup device (column 6, lines 24-34 – locating the speaker based on the ambient audio picked up by the microphone array); and adjusting the photographing parameter of the photographing device according to the target sound source position/orientation information, to cause the target object to return to the imaging view of the photographing device (Figs. 23A-23C; column 23, lines 43-63 – adjusting the photographing parameter, e.g. changing the view, according to the target object moving to a new location, thus based on the target sound source position/orientation information determined from the ambient audio picked up by the sound pickup device).
However, Potts does not disclose “determining target sound source position/orientation information associated with the target object according to the ambient audio picked up by the sound pickup device” as “in response to one of sound sources emitting audio with preset semantic information, determining target sound source position/orientation information based on position/orientation information of the one of the sound sources”.
Hayashi discloses “determining target sound source position/orientation information associated with the target object according to the ambient audio picked up by the sound pickup device” as “in response to one of sound sources emitting audio with preset semantic information, determining target sound source position/orientation information based on position/orientation information of the one of the sound sources” ([0205] – a sound source currently out of the FOV of the camera, among other sound sources, emitting voice conceivable as specific words thus with preset semantic information, determining at least the direction of the sound source to turn the camera facing the sound source).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Hayashi into the control method taught by Potts to facilitate an accurate recognition of the sound source (Hayashi: [0007]).
Regarding claim 3, Potts in view of Hayashi also discloses the method according to claim 1, wherein adjusting the photographing parameter according to the position/orientation information of the target object includes: in response to detecting that the target object disappears from the imaging view of the photographing device: determining a first predicted position/orientation of the target object according to the imaging position of the target object in the imaging view before the target object disappears from the imaging view (Figs. 23A-23C; column 23, lines 43-63 – determining the original position); determining a second predicted position/orientation of the target object according to the sound source position/orientation information (Figs. 23A-23C; column 23, lines 43-63 – determining the new position); and adjusting the photographing parameter according to the first predicted position/orientation and the second predicted position/orientation, to cause the target object to return to the imaging view (Figs. 23A-23C; column 23, lines 43-63 – based on the original position and the new position, broadening the FOV of the camera to keep both positions in the imaging view).
Regarding claim 4, Potts in view of Hayashi also discloses the method according to claim 3, wherein determining the target sound source position/orientation information includes: obtaining audio feature information of sound sources (column 17, line 46 – column 18, line 24 – determining at least onset or beginning of a sequence of audio signals, or data associated with frequency components, or magnitude of frequency components, etc.); and determining the target sound source position/orientation information based on the audio feature information (column 6, lines 24-31; column 18, lines 25-28 – locating the speaker with respect to the camera).
Regarding claim 5, Potts in view of Hayashi also discloses the method according to claim 4, wherein determining the target sound source position/orientation information based on the audio feature information includes at least one of: in response to a frequency of audio emitted by one of the sound sources being within a target frequency band (column 17, line 57 – column 18, line 24 – at least within the audio frequency band), determining the target sound source position/orientation information based on position/orientation information of the one of the sound sources (column 6, lines 24-31; column 18, lines 25-28 – locating the speaker with respect to the camera); or in response to an amplitude of audio emitted by one of the sound sources meeting a preset amplitude condition (column 17, line 62 – column 18, line 2 – the amplitude of audio should be greater than the magnitude of audio in previous frames by a predetermined amount), determining the target sound source position/orientation information based on position/orientation information of the one of the sound sources (column 6, lines 24-31; column 18, lines 25-28 – locating the speaker with respect to the camera).
Regarding claim 6, Potts in view of Hayashi also discloses the method according to claim 3, wherein adjusting the photographing parameter according to the first predicted position/orientation and the second predicted position/orientation includes: predicting where the target object is located according to the first predicted position/orientation and the second predicted position/orientation to obtain a predicted area (column 13, lines 51-57; column 14, lines 12-49); and adjusting the photographing parameter of the photographing device according to a position/orientation of the predicted area (column 17, lines 1-33 – storing predicted location into a corresponding track file, then converted to audio coordinate system in face location memory, which is used to adjust the photographing parameter of the camera).
Regarding claim 7, Potts in view of Hayashi also discloses the method according to claim 1, wherein adjusting the photographing parameter of the photographing device and the sound pickup parameter of the sound pickup device includes at least one of: adjusting the photographing parameter to cause the target object to be in a specified area in the imaging view (column 20, lines 22-49 – adjusting the photographing parameter by correcting the offset to move the face to a desired position); adjusting the photographing parameter to cause a size of the target object in the imaging view to match a distance from the target object to the media apparatus; adjusting the sound pickup parameter to cause the audio picked up by the sound pickup device to match the distance from the target object to the media apparatus; or adjusting the sound pickup parameter to enhance an amplitude of the audio of the target object and weaken an amplitude of other audio except the audio of the target object.
Regarding claim 8, Potts in view of Hayashi also discloses the method according to claim 1, further comprising: in response to the imaging view being not synchronized with the ambient audio, determining the imaging position based on a most recently obtained imaging view including the target object (column 20, line 59 – column 22, line 5 – in response to the imaging view of a face being not synchronized with the ambient audio causing pointing errors between the ambient audio and speaker, determining the imaging position based on current frame as further described at least in column 21, lines 19-41).
Regarding claim 9, see the teachings of Potts and Hayashi as discussed in claim 1 above. Hayashi also discloses adjusting the photographing parameter and the sound pickup parameter includes at least one of: performing audio recording on the target object based on a recording mode selected by a user ([0209]-[0210] – a recording mode set by the user to record voice based on similarity), and, in the recording mode, adjusting the sound pickup parameter in real time according to the position/orientation information of the target object and the sound source position/orientation information ([0209]-[0210] – in such a recording mode based on voice similarity, adjusting at least the sound pickup parameter in real time according to the position/orientation information of the target object and the sound source position/orientation information).
The motivation for incorporating the teachings of Hayashi into the control method has been discussed in claim 1 above.
Claim 12 is rejected for the same reason as discussed in claim 1 above in view of Potts also disclosing a control device (Fig. 1) comprising: at least one processor (column 6, line 65 – column 7, line 10 – a processor or a microprocessor); and at least one memory storing at least one computer program that, when executed by the at least one processor, causes the control device to perform the recited method (column 7, lines 1-10 – a suitable memory such as ROM, RAM, etc. storing instructions to be executed by the processor for performing the method as discussed in claim 1 above).
Claim 14 is rejected for the same reason as discussed in claim 3 above.
Claim 15 is rejected for the same reason as discussed in claim 7 above.
Claim 16 is rejected for the same reason as discussed in claim 8 above.
Claim 17 is rejected for the same reason as discussed in claim 9 above.
Claim 20 is rejected for the same reason as discussed in claim 1 above in view of Potts also disclosing a media apparatus (Fig. 1) comprising: at least one photographing device configured to collect an ambient image (Fig. 1 – camera 14); at least one sound pickup device configured to pick up ambient audio (Fig. 1 – microphone array 12); at least one processor (column 6, line 65 – column 7, line 10 – a processor or a microprocessor); and at least one memory storing at least one computer program that, when executed by the at least one processor, causes the apparatus to perform the recited method (column 7, lines 1-10 – a suitable memory such as ROM, RAM, etc. storing instructions to be executed by the processor for performing the method as discussed in claim 1 above).
Regarding claim 21, Potts in view of Hayashi also discloses the method according to claim 1, wherein adjusting the photographing parameter and/or the sound pickup parameter further includes: photographing the target object based on a photographing mode selected by the user (column 6, lines 60-64), and, in the photographing mode, adjusting the photographing parameter in real time according to the position/orientation information of the target object and the sound source position/orientation information (column 6, lines 24-33 – at least adjusting the photographing parameter in real time).
Claims 10-11, 18-19, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Potts and Hayashi as applied to claims 1, 3-9, 12, 14-17, and 20-21 above, and further in view of Zad Issa et al. (US 2015/0054943 A1 – hereinafter Zad Issa).
Regarding claim 10, see the teachings of Potts and Hayashi as discussed in claim 1 above. However, Potts and Hayashi do not disclose adjusting the photographing parameter and the sound pickup parameter includes: adjusting the sound pickup parameter according to the sound source position/orientation information to focus picked-up audio picked up by the sound pickup device on the target object; and in response to the position/orientation of the target object changing, adjusting the sound pickup parameter based on changed position/orientation information of the target object to refocus the picked-up audio on the target object.
Zad Issa discloses adjusting a photographing parameter and a sound pickup parameter includes: adjusting the sound pickup parameter according to sound source position/orientation information to focus picked-up audio picked up by a sound pickup device on a target object; and in response to the position/orientation of the target object changing, adjusting the sound pickup parameter based on changed position/orientation information of the target object to refocus the picked-up audio on the target object ([0027]; [0064]-[0065]; [0069] – adjusting the sound pickup parameter according to the location of the sound source, and in response to the sound source moving, i.e. changing position/orientation, the sound pickup device refocuses the picked-up audio on the object by readjusting the audio focus region).
One of ordinary skill in the art before the effective filing date of the claimed invention would have been motivated to incorporate the teachings of Zad Issa into the adjusting step taught by Potts and Hayashi above in order to enhance audio coming from the direction of the speaker.
Regarding claim 11, Zad Issa in view of Potts and Hayashi also discloses the method according to claim 10, wherein adjusting the sound pickup parameter based on the changed position/orientation information of the target object to refocus the picked-up audio on the target object in response to the position/orientation information of the target object changing is performed ([0027]; [0064]-[0065]; [0069] – adjusting the sound pickup parameter according to the location of the sound source, and in response to the sound source moving, i.e. changing position/orientation, the sound pickup device refocuses the picked-up audio on the object by readjusting the audio focus region) in response to at least one of the following conditions being met: at least one microphone of the sound pickup device is unavailable, or an amplitude of background noise is greater than a preset amplitude threshold ([0109]-[0113] – in response to a noisy environment, thus indicating the noise level in the environment must be greater than a minimum amplitude threshold).
The motivation for incorporating the teachings of Zad Issa into the method has been discussed in claim 10 above.
Claim 18 is rejected for the same reason as discussed in claim 10 above.
Claim 19 is rejected for the same reason as discussed in claim 11 above.
Regarding claim 22, Zad Issa in view of Potts and Hayashi also discloses the method according to claim 10, wherein adjusting the sound pickup parameter based on the changed position/orientation information of the target object to refocus the picked-up audio on the target object in response to the position/orientation information of the target object changing is performed ([0027]; [0064]-[0065]; [0069] – adjusting the sound pickup parameter according to the location of the sound source, and in response to the sound source moving, i.e. changing position/orientation, the sound pickup device refocuses the picked-up audio on the object by readjusting the audio focus region) in response to an amplitude of background noise being greater than a preset amplitude threshold ([0109]-[0113] – in response to a noisy environment, thus indicating the noise level in the environment must be greater than a minimum amplitude threshold).
The motivation for incorporating the teachings of Zad Issa into the method has been discussed in claim 10 above.
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUNG Q DANG whose telephone number is (571)270-1116. The examiner can normally be reached IFT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Thai Q Tran, can be reached at 571-272-7382. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/HUNG Q DANG/Primary Examiner, Art Unit 2484