DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
All objections/rejections not mentioned in this Office Action have been withdrawn by the Examiner.
Response to Amendments
Applicant’s amendment filed on 30 October 2025 has been entered.
The amendments to claim(s) 1, 3-5, 9, 11, 13, 15, and 20 have been acknowledged and entered.
After entry of the amendments, claims 1-9, 11-18, and 20 remain pending.
In view of the amendments to the claims, the rejections of claims 1-9, 11-18, and 20 under 35 U.S.C. §102 and 35 U.S.C. §103 are withdrawn.
In light of the amended claims, new grounds of rejection under 35 U.S.C. §102 and 35 U.S.C. §103 are set forth below.
Response to Arguments
Applicant’s arguments regarding the prior art rejections under 35 U.S.C. §102 and 35 U.S.C. §103, set forth on pages 9-12 of the Response, received 30 October 2025, to the Non-Final Office Action dated 30 July 2025 (hereinafter the Response and the Office Action, respectively), have been fully considered.
With respect to the rejection(s) of claim(s) 1, 11, and 20 under 35 U.S.C. 102(a)(1) and 35 U.S.C. 102(a)(2) as anticipated by Yang (U.S. Pat. No. 11,683,634, hereinafter Yang), Applicant asserts that Yang fails to teach, at least, (1) “transmitting, by a local device in a local environment, media content to a remote device, while the local device operates in a shared playback mode with the remote device such that the remote device plays back the media content in a remote environment of the remote device and the media content is played back by a playback device in the local environment; and transmitting, by the local device, the output audio signal [which is generated from a received input audio signal that is captured by a microphone of the local device] to the remote device for playback” and (2) “determining a target audio area surrounding the local device using the side information” where the side information “is determined at least partially on an input audio signal,” as recited in claims 1, 11, and 20 as amended. Applicant’s arguments are addressed individually below.
Regarding the first argument, this argument is not persuasive. Yang discloses all indicated limitations. Specifically, Yang discloses transmitting, by a local device in a local environment, media content to a remote device. Yang discloses “an artificial reality system” which “may include completely generated content or generated content combined with captured (e.g., real-world) content” where “the artificial reality content may be implemented on” a “hardware platform capable of providing artificial reality content to one or more viewers.” (Yang, Col. 3, lines 46-53; Col. 7, line 64-Col. 8, line 14). An artificial reality environment shared among multiple viewers is transmitted among all local devices (where the local device and the local environment are defined from the perspective of the local user). The local user is in their local environment with the local device, and the “artificial reality content” is “generated content combined with captured (e.g., real-world) content” which can “include video, audio, haptic feedback, or some combination thereof,” thereby creating the media content that is shared with the remaining viewers of the “one or more viewers,” who are remote with respect to each individual local user/local device. (Yang, Col. 3, lines 27-52).
Yang further describes performing the above functions while the local device operates in a shared playback mode with the remote device. As explained in Yang, the “hardware platform”, as integrated into the first headset {local device}, can share the “artificial reality content to one or more viewers {shared playback mode}”; thus, the viewers, including the first viewer through the first headset {a local device} and a second viewer through the second headset {remote device}, each receive the content at their respective headset. In the context of shared virtual reality/augmented reality, the users each receive the same “artificial reality content”, which includes the incorporated “captured (e.g., real-world) content,” as delivered to the local device in the local environment of each user. Further, as “local” is a term relative to each user, each of the “one or more users” is understood to be contributing to the captured content, as received from the perspective of that user, where each user is a local user from their own perspective and each local user has a local device in a local environment. (Yang, Col. 3, lines 27-52). Thus, the local device (e.g., the VR headset serving as the local device for the local user) is playing the same media content (e.g., the shared virtual experience). (Id.) The distinction of each user being a local user from their own perspective, and remote from the perspective of other users, is applied throughout the rejections below without further explicit recitation.
Yang further describes “such that the remote device plays back the media content in a remote environment of the remote device and the media content is played back by a playback device in the local environment.” As previously discussed, the media content is the artificial reality content, which Yang explicitly recites may be provided to the “one or more viewers” through “various platforms, including a wearable device (e.g., headset) connected to a host computer system, a standalone wearable device (e.g., headset), a mobile device or computing system, or any other hardware platform.” (Yang, Col. 3, lines 27-52). By providing the artificial reality content, including “generated content combined with captured (e.g., real-world) content,” to a first viewer, the artificial reality content {media content} is played back by one of the “various platforms” {playback device}, which is referred to, for clarity, as a headset, in the environment corresponding to the local environment of the local user. Further, as previously discussed, the artificial reality content is provided to each of the “one or more viewers”, where the phrase “more viewers” necessarily includes two (2) or more viewers, and with consideration of local and remote viewers, the second viewer and/or third viewer can be remote with respect to the first viewer. Yang further considers multiple headsets in the same local area, as sound sources can be defined based on “a person wearing a headset” and the model of the local area is further defined based on “each person… wearing a headset in the local area”. (Yang, Col. 8, lines 29-56). The second viewer would receive, at a remote headset {remote device} in a remote environment from the perspective of the local user/device/environment, the same artificial reality content as received by the first viewer. The artificial reality content received by the third viewer through the third headset, being in the local area of the first viewer, would be a local nonstationary interference source from the perspective of the first headset of the first viewer.
Yang further teaches “and transmitting, by the local device, the output audio signal [which is generated from a received input audio signal that is captured by a microphone of the local device] to the remote device for playback.” As explained in Yang, the “generated content” can be “combined with captured (e.g., real-world) content” including “video, audio, haptic feedback, or some combination thereof” which is presented to the viewers. (Yang, Col. 3, lines 27-41). The incorporated “captured (e.g., real-world) content” is understood to include any or all captured content described as being content captured from the real world. As such, captured content includes “capture[d] visual information for the local area surrounding the headset” and “microphone array [which] detects sounds within the local area of the headset 100” to “capture sounds emitted from one or more real-world sound sources in the local area (e.g., a room),” such as those emitted by the third headset based on receipt of the artificial reality content at the third headset producing audio received by the microphone array of the first headset, where this received audio would then be incorporated into the artificial reality content propagated to all headsets. (Yang, Col. 4, lines 64-66; Col. 6, lines 49-53). When read in combination, Yang describes providing, to “one or more viewers,” “generated [video and audio]” which is “combined with captured (e.g., real-world) [video and audio]” where the captured audio is captured by the “microphone array” of the “headset 100” from “sounds within the local area of the headset 100” where the sounds are “emitted from one or more real-world sound sources in the local area.” Further, as each of the users has a local device with respect to themselves, each of the users contributes to the “captured (e.g., real-world) [video and audio]” which is “combined with” the “generated [video and audio]” to generate the artificial reality content which is then provided to each of the “one or more viewers.” The media content includes the captured real-world audio content “which may be presented in a single channel or in multiple channels,” which is understood to mean that the captured audio can be included, either in an integrated fashion (e.g., a single audio channel incorporating elements of generated audio and captured audio) or delivered together but separately (e.g., one or more first audio channels delivering generated audio content and one or more channels delivering captured audio content) as components of the shared artificial reality content. Therefore, the first argument is not persuasive and the rejection is maintained in light of the above explanation and the explanations provided in the rejection below.
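For purposes of illustration only, and not as a characterization of Yang’s actual implementation, the following sketch shows the two delivery options discussed above (captured audio mixed into a single integrated channel, or carried alongside the generated audio as a separate channel); the function and variable names are assumptions made solely for this example.

```python
# Hypothetical sketch (not Yang's implementation): captured audio may be mixed with
# generated audio into a single integrated channel, or carried alongside it as a
# separate channel of the shared content. Names here are illustrative assumptions.
import numpy as np

def package_audio(generated: np.ndarray, captured: np.ndarray, single_channel: bool) -> dict:
    if single_channel:
        # Integrated delivery: one channel incorporating generated and captured audio.
        return {"mixed": generated + captured}
    # Delivered together but separately: distinct generated and captured channels.
    return {"generated": generated, "captured": captured}
```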
Regarding the second argument, Applicant’s arguments as applied to the specific cited embodiment of Yang are persuasive. As they were not relevant to the claims as previously presented, the further embodiments of Yang which incorporate audio information in the determination of the local area were not cited. However, as set forth in the rejections below, Yang is not limited solely to establishing the local area based on visual data. As explained in Yang, the system can determine locational characteristics (e.g., “Reverberation interference may be present during indoor applications”, etc.) and acoustic parameters (e.g., reverberation time) from the captured sound, where “acoustic parameters describe one or more acoustic properties (e.g., room impulse response, a reverberation time, a reverberation level, etc.) of the local area,” each of which is determined based at least in part on the input audio signal and would be “side information” within the meaning of the instant application. (Yang, Col. 24, lines 17-44). As such, in light of the amended claims, the rejections of claims 1, 11, and 20 under 35 U.S.C. 102(a)(1) and 35 U.S.C. 102(a)(2), as previously presented, are withdrawn.
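For purposes of illustration only, the following sketch shows one conventional way an acoustic parameter of the kind identified above (a reverberation time) can be estimated from captured audio via Schroeder backward integration; it is not drawn from Yang or the instant application, and the function name and fitting range are assumptions made for this example.

```python
# Illustrative sketch only (not Yang's or Applicant's implementation): estimating a
# reverberation time (RT60) from a measured room impulse response via Schroeder
# backward integration, as one example of acoustic "side information" derived from audio.
import numpy as np

def estimate_rt60(impulse_response: np.ndarray, sample_rate: int) -> float:
    """Estimate RT60 by fitting the -5 dB to -25 dB span of the Schroeder decay curve."""
    energy = impulse_response.astype(np.float64) ** 2
    # Schroeder backward integration, normalized and expressed in dB.
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)
    t = np.arange(len(edc_db)) / sample_rate
    # Fit the -5 dB to -25 dB portion (a T20-style fit) and extrapolate to -60 dB.
    fit = (edc_db <= -5.0) & (edc_db >= -25.0)
    slope, _ = np.polyfit(t[fit], edc_db[fit], 1)  # decay slope in dB per second (negative)
    return -60.0 / slope
```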
Applicant further argues that the rejection(s) of dependent claims 2-9 and 12-18 should be withdrawn for at least the same reasons as independent claims 1, 11, and 20. Applicant’s arguments in light of the amended claims are persuasive. As such, the rejections of claims 2-9 and 12-18 under 35 U.S.C. §102 and 35 U.S.C. §103 are withdrawn.
However, upon further consideration, new ground(s) of rejection under 35 U.S.C. §103 are made in light of combinations of Yang, McElveen (U.S. Pat. App. Pub. No. 2021/0092548, hereinafter McElveen), Zheng (U.S. Pat. App. Pub. No. 2021/0005214, hereinafter Zheng), Guo (CN103580704A, hereinafter Guo), Holman (U.S. Pat. App. Pub. No. 2014/0003626, hereinafter Holman), and Paxinos (U.S. Pat. App. Pub. No. 2020/0045095, hereinafter Paxinos).
The Applicant has not provided any further statement; therefore, the Examiner directs the Applicant to the rationale below.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
Claim(s) 1, 11, and 20 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yang.
Regarding claim 1, Yang discloses A method of suppressing audio interference (Systems and methods described with reference to the “artificial reality system” as implemented through a “headset 100.” Of note, though described for convenience in the context of a headset, Yang expressly considers implementation in “a video calling device, a digital assistant device, a computer, a laptop, a mobile phone, a wearable device (e.g., a hearing aid, a smartwatch, etc.), and any other suitable electronic device” including distribution of embodiments among a plurality of the above.; Yang, ¶ Col. 3, lines 46-53; Col. 7, line 64-Col. 8, line 14.), the method comprising: transmitting, by a local device in a local environment, media content to a remote device (Discloses “an artificial reality system” which “may include completely generated content or generated content combined with captured (e.g., real-world) content,” the “generated content” being “combined with captured (e.g., real-world) content” including “video, audio, haptic feedback, or some combination thereof” which is presented to the viewers, where “the artificial reality content may be implemented on” a “hardware platform capable of providing artificial reality content to one or more viewers.” One or more viewers includes three viewers, each of which may have “a headset 100” (thus a first headset, a second headset, and a third headset), which “may be worn on the face of a user such that content (e.g., media content) is presented using a display assembly and/or an audio system.” Further, the “hardware platform” may be implemented locally (e.g., as part of the first headset).; Yang, ¶ Col. 3, lines 27-63, Col. 4, lines 64-66; Col. 6, lines 49-53), while the local device operates in a shared playback mode with the remote device (The “hardware platform”, as integrated into the first headset {local device}, can share the “artificial reality content to one or more viewers {shared playback mode}”, thus the viewers, including the first viewer through the first headset {a local device} and a second viewer through the second headset {remote device}, each receive the content at their respective headset.; Yang, ¶ Col. 3, lines 27-63) such that the remote device plays back the media content in a remote environment of the remote device (The hardware platform can deliver the content to the second headset {remote device} in the environment of said second headset {remote environment}. The second headset is not restricted to a specific location, and can be remote from the first headset.; Yang, ¶ Col. 3, lines 27-63) and the media content is played back by a playback device in the local environment (The system can “update the model of the local area” based on each “person wearing a headset in the local area.” As such, the content delivered to “one or more viewers” can include a third headset {playback device} in the local environment of said first headset. As local in local environment is a comparative term, the environment of the first headset and the third headset may be understood as the local environment by comparison to the second device being in the remote environment.; Yang, ¶ Col. 3, lines 27-63; Col. 
6, lines 1-3); receiving, by the local device, an input audio signal captured by at least one microphone of the local device (The first headset {local device}, as with all other headsets, can include a microphone array, where “The microphone array 210 may capture sounds emitted by one or more real-world sound sources (including a user of the electronic device) within the local area,” and the “local area is the area surrounding the headset 100,” where the system “may capture sounds emitted by one or more real-world sound sources (including a user of the electronic device) within the local area. The captured sounds may include a plurality of interferences of different types” and “The captured sounds may include a plurality of interferences of different types” such as “nonstationary interference” which is “any sound with an intensity, spectrum shape, mean, variance, or other characteristic that changes over time” including “a plurality of people talking in the local area, phones ringing in the local area, music playing in the local area, televisions playing in the local area, airplane flying overhead in the local area, car horns honking in the local area, etc.”; Yang, ¶ Col. 6, lines 39-50; Col. 4, lines 37-41; Col. 8, lines 29-56), the input audio signal including speech of a user of the local device and the media content played back by the playback device (“The microphones 180 capture sounds emitted from one or more real-world sound sources in the local area (e.g., a room). The sounds may include a plurality of interferences of different types (e.g., a wind interference, an echo interference, a stationary interference, etc.).” and a “particular sound (e.g., speech emitted by the user)”. In a shared artificial reality context, the first viewer at the first headset {local device} in a local environment with respect to the first viewer, captures speech of the third viewer (e.g., the third viewer communicating with the first viewer and the second viewer as part of the artificial reality content). As the third viewer and the third headset may be in the local environment of the first viewer and the first headset, the input audio received at the first headset would include speech from the first viewer {a user} of the first headset {the local device} and audio from the artificial reality content {media content} produced from the third headset {the playback device} in the same local environment.; Yang, ¶ Col. 6, lines 49-56; Col. 16, line 4-6); determining side information that is associated with an environment of the local device and the playback device (The system through the “microphone array 210 may capture sounds emitted by one or more real-world sound sources (including a user of the electronic device) within the local area {associated with an environment of the local device and the playback device}” including “a plurality of interferences of different types” such as “a wind interference, an echo interference, a reverberation interference, a stationary interference, and a nonstationary interference” and further the system determines locational characteristics (e.g., “Reverberation interference may be present during indoor applications”, etc.) and acoustic parameters (e.g., reverberation time) from the captured sound, where “acoustic parameters describe one or more acoustic properties (e.g., room impulse response, a reverberation time, a reverberation level, etc.) 
of the local area,” which is based on the input audio signal, and each of which would be “side information” within the meaning of the instant application.; Yang, ¶ Col. 5, lines 35-56; Col. 8, line 29-62; Col. 24, lines 17-44) based at least partially on the input audio signal (The plurality of interferences and related local area attributes are derived from the input audio signal, which are incorporated into the model of the local area, thus the model of the local area is based at least partially on the input audio signal.; Yang, ¶ Col. 8, line 29-62): determining, by the local device, a target audio area surrounding the local device using the side information (“the headset 100 may include one or more imaging devices 130” to determine “the local area surrounding the headset 100,” where “[t]he microphone array detects sounds within the local area of the headset 100,” where the system updates the model of the local area “as one or more sound sources change position {based on the input audio signal}” based on positional information and where the “model of the local area” can further include “acoustic parameters (e.g., reverberation time) that describe acoustic properties of the local area and a location characteristic (e.g., indoor or outdoor) that describes the local area,” where describing acoustic parameters of the local area and determining location characteristics based on the audio (e.g., “room impulse response, a reverberation time, a reverberation level, etc.” of the local area) is part of determining the local area {target audio area}.; Yang, ¶ Col. 4, lines 64-66; Col. 5, lines 44-50; Col. 6, lines 49-50; Col. 10, lines 3-23; Col. 24, lines 17-44), the target audio area distinguishing between the user of the local device estimated to be within the target audio area and the playback device estimated to be outside of the target audio area (“The coefficient generation module 260 may perform an additional pre-processing operation which includes separating the audio signal (or each of the M-band audio signals) into two separate signals (e.g., a first separate audio signal and a second separate audio signal) according to direction of arrival estimates determined by the DOA estimation module 245 and the model of the local area” where the system separates “a particular sound (e.g., speech emitted by the user) and near-by interferences... into the first separate audio signal”, being the first viewer {the user} which is “located within a threshold distance of the user {...estimated to be within the target audio area}” and “interferences {a source of the interference signal} emitted by various sound sources located beyond a threshold distance of the user {...estimated to be outside of the target audio area} are separated from the audio signal into the second separate audio signal”; Yang, ¶ Col. 15, line 65-Col. 16, line 16); generating, by the local device, an output audio signal from the input audio signal according to the target audio area such that the output audio signal preserves the speech of the user within the target audio area and suppresses the media content played back by the playback device that is outside the target audio area (“The coefficient generation module 260 generates an attenuation coefficient for each of the plurality of interferences (e.g., of different types)... 
based on the location characteristic of the local area stored in the model of the local area” and “The sound filter module 265 may suppress the plurality of interferences from the audio signal by multiplying the final attenuation coefficient with the audio signal {suppresses the media content played back by the playback device that is outside the target audio area}” where the “particular sound” is not an interference and, as such, is not suppressed by the attenuation coefficient, where the particular sound is within the local area {preserves the speech from the user within the target audio area}.; Yang, ¶ Col. 16, lines 5-8 and 17-24; Col. 18, lines 31-41); and transmitting, by the local device, the output audio signal to the remote device for playback (“The artificial reality system” can include a “hardware platform capable of providing artificial reality content to one or more viewers” where the “Artificial reality content may include... generated content combined with captured (e.g., real-world) content” and the captured content can include “video, audio, haptic feedback, or some combination thereof” and where the “audio content” is captured by the user via the “microphone array” to “capture sounds emitted from one or more real-world sound sources in the local area.” In embodiments with two viewers, each viewer having a headset, each headset is understood to perform the same functions described in Yang including capture of audio and the respective audio processing steps described previously for capture of audio for their respective local areas. This creates a first audio signal from the first headset {local device} at the first viewer, in a local environment to the first viewer {the local environment}, and a second audio signal from the second headset at the second viewer in a local environment to the second viewer {the remote environment}. The first audio signal and the second audio signal are then combined into the artificial reality content as part of the “captured (e.g., real-world) content” which is then “provid[ed]...to one or more viewers”. As such, the first audio signal {output audio signal}, the second audio signal as captured by the second headset, and any further content as captured and processed at any further headsets, can be provided {transmitted} by the hardware platform {local device} to both the first headset {playback device} and the second headset {remote device}.; Yang, ¶ Col. 3, lines 28-52; Col. 6, lines 49-56).
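For purposes of illustration only, the following sketch reflects the general pattern relied upon above: sources estimated within a threshold distance of the user (the target audio area) are preserved, while sources estimated beyond that distance are scaled by an attenuation coefficient before the output signal is formed for transmission. All names and values are hypothetical and are not drawn from Yang or from the claims.

```python
# Hypothetical sketch of the general pattern discussed above: preserve sources estimated
# inside a target audio area (within a distance threshold of the user) and attenuate
# sources estimated outside it, then form the output signal for transmission.
import numpy as np

def suppress_outside_target_area(source_signals, source_distances_m,
                                 target_radius_m=1.0, attenuation_coefficient=0.1):
    """Mix per-source signals, scaling far sources by an attenuation coefficient.

    source_signals: list of 1-D numpy arrays, one estimated signal per sound source.
    source_distances_m: estimated distance of each source from the local device.
    """
    output = np.zeros_like(source_signals[0], dtype=np.float64)
    for signal, distance in zip(source_signals, source_distances_m):
        if distance <= target_radius_m:
            output += signal                             # inside the target audio area: preserved
        else:
            output += attenuation_coefficient * signal   # outside the target audio area: suppressed
    return output

# Example: user speech at 0.5 m is preserved; playback-device audio at 3.0 m is attenuated.
rng = np.random.default_rng(0)
speech, playback = rng.standard_normal(480), rng.standard_normal(480)
out = suppress_outside_target_area([speech, playback], [0.5, 3.0])
```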
Regarding claim 11, Yang discloses A local device comprising (Systems and methods described with reference to the “artificial reality system” as implemented through a “headset 100.” Of note, though described for convenience in the context of a headset, Yang expressly considers implementation in “a video calling device, a digital assistant device, a computer, a laptop, a mobile phone, a wearable device (e.g., a hearing aid, a smartwatch, etc.), and any other suitable electronic device” including distribution of embodiments among a plurality of the above.; Yang, ¶ Col. 3, lines 46-53; Col. 7, line 64-Col. 8, line 14.): at least one microphone (The headset 100 includes a microphone array; Yang, ¶ Col. 6, lines 39-41); at least one processor (The headset further includes an audio controller, where “The audio controller 150 may comprise a processor”; Yang, ¶ Col. 6, lines 39-41; Col. 7, lines 22-23); and a memory coupled to the at least one processor to store instructions, which when executed by the at least one processor, cause the local device to (“Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor for performing any or all the steps, operations, or processes described.”; Yang, ¶ Col. 25, lines 54-62): transmit media content to a remote device (Discloses “an artificial reality system” which “may include completely generated content or generated content combined with captured (e.g., real-world) content,” the “generated content” being “combined with captured (e.g., real-world) content” including “video, audio, haptic feedback, or some combination thereof” which is presented to the viewers, where “the artificial reality content may be implemented on” a “hardware platform capable of providing artificial reality content to one or more viewers.” One or more viewers includes three viewers, each of which may have “a headset 100” (thus a first headset, a second headset, and a third headset), which “may be worn on the face of a user such that content (e.g., media content) is presented using a display assembly and/or an audio system.” Further, the “hardware platform” may be implemented locally (e.g., as part of the first headset).; Yang, ¶ Col. 3, lines 27-63, Col. 4, lines 64-66; Col. 6, lines 49-53), while the local device operates in a shared playback mode with the remote device (The “hardware platform”, as integrated into the first headset {local device}, can share the “artificial reality content to one or more viewers {shared playback mode}”, thus the viewers, including the first viewer through the first headset {a local device} and a second viewer through the second headset {remote device}, each receive the content at their respective headset.; Yang, ¶ Col. 3, lines 27-63) such that the remote device plays back the media content in a remote environment of the remote device (The hardware platform can deliver the content to the second headset {remote device} in the environment of said second headset {remote environment}. The second headset is not restricted to a specific location, and can be remote from the first headset.; Yang, ¶ Col. 
3, lines 27-63) and the media content is played back by a playback device in a local environment of the local device (The system can “update the model of the local area” based on each “person wearing a headset in the local area.” As such, the content delivered to “one or more viewers” can include a third headset {playback device} in the local environment of said first headset. As local in local environment is a comparative term, the environment of the first headset and the third headset may be understood as the local environment by comparison to the second device being in the remote environment.; Yang, ¶ Col. 3, lines 27-63; Col. 6, lines 1-3); receive an input audio signal captured by the at least one microphone (The first headset {local device}, as with all other headsets, can include a microphone array, where “The microphone array 210 may capture sounds emitted by one or more real-world sound sources (including a user of the electronic device) within the local area,” and the “local area is the area surrounding the headset 100,” where the system “may capture sounds emitted by one or more real-world sound sources (including a user of the electronic device) within the local area. The captured sounds may include a plurality of interferences of different types” and “The captured sounds may include a plurality of interferences of different types” such as “nonstationary interference” which is “any sound with an intensity, spectrum shape, mean, variance, or other characteristic that changes over time” including “a plurality of people talking in the local area, phones ringing in the local area, music playing in the local area, televisions playing in the local area, airplane flying overhead in the local area, car horns honking in the local area, etc.”; Yang, ¶ Col. 6, lines 39-50; Col. 4, lines 37-41; Col. 8, lines 29-56), the input audio signal includes speech of a user of the local device and the media content played back by the playback device (“The microphones 180 capture sounds emitted from one or more real-world sound sources in the local area (e.g., a room). The sounds may include a plurality of interferences of different types (e.g., a wind interference, an echo interference, a stationary interference, etc.).” and a “particular sound (e.g., speech emitted by the user)”. In a shared artificial reality context, the first viewer at the first headset {local device} in a local environment with respect to the first viewer, captures speech of the third viewer (e.g., the third viewer communicating with the first viewer and the second viewer as part of the artificial reality content). As the third viewer and the third headset may be in the local environment of the first viewer and the first headset, the input audio received at the first headset would include speech from the first viewer {a user} of the first headset {the local device} and audio from the artificial reality content {media content} produced from the third headset {the playback device} in the same local environment.; Yang, ¶ Col. 6, lines 49-56; Col. 
16, line 4-6); determining side information that is associated with an environment of the local device and the playback device (The system through the “microphone array 210 may capture sounds emitted by one or more real-world sound sources (including a user of the electronic device) within the local area {associated with an environment of the local device and the playback device}” including “a plurality of interferences of different types” such as “a wind interference, an echo interference, a reverberation interference, a stationary interference, and a nonstationary interference” and further “determine[s] positional information about one or more sound sources in the local area (i.e. where a sound source is located within the local area)” each of which further define the local area (e.g., “Reverberation interference may be present during indoor applications”, etc.) and each of which would be “side information” within the meaning of the instant application.; Yang, ¶ Col. 5, lines 35-56; Col. 8, line 29-62) based at least partially on the input audio signal (The plurality of interferences and related local area attributes are derived from the input audio signal.; Yang, ¶ Col. 8, line 29-62); determine a target audio area surrounding the local device using the side information, (“the headset 100 may include one or more imaging devices 130” to determine “the local area surrounding the headset 100,” where “[t]he microphone array detects sounds within the local area of the headset 100,” where the system updates the model of the local area “as one or more sound sources change position {based on the input audio signal}” based on positional information and where the local area can further be defined by “acoustic parameters (e.g., reverberation time) that describe acoustic properties of the local area and a location characteristic (e.g., indoor or outdoor) that describes the local area,” where describing acoustic parameters of the local area and determining location characteristics based on the audio (e.g., “room impulse response, a reverberation time, a reverberation level, etc.” of the local area) is part of determining the local area {target audio area}.; Yang, ¶ Col. 4, lines 64-66; Col. 5, lines 44-50; Col. 6, lines 49-50; Col. 10, lines 3-23; Col. 24, lines 37-40) wherein the target audio area distinguishes between the user of the local device estimated to be within the target audio area and the playback device estimated to be outside of the target audio area (“The coefficient generation module 260 may perform an additional pre-processing operation which includes separating the audio signal (or each of the M-band audio signals) into two separate signals (e.g., a first separate audio signal and a second separate audio signal) according to direction of arrival estimates determined by the DOA estimation module 245 and the model of the local area” where the system separates “a particular sound (e.g., speech emitted by the user) and near-by interferences... into the first separate audio signal”, being the first viewer {the user} which is “located within a threshold distance of the user {...estimated to be within the target audio area}” and “interferences {a source of the interference signal} emitted by various sound sources located beyond a threshold distance of the user {...estimated to be outside of the target audio area} are separated from the audio signal into the second separate audio signal”; Yang, ¶ Col. 15, line 65-Col. 
16, line 16); generate an output audio signal from the input audio signal according to the target audio area such that the output audio signal preserves the speech of the user within the target audio area and suppresses the media content played back by the playback device that is outside the target audio area (“The coefficient generation module 260 generates an attenuation coefficient for each of the plurality of interferences (e.g., of different types)... based on the location characteristic of the local area stored in the model of the local area” and “The sound filter module 265 may suppress the plurality of interferences from the audio signal by multiplying the final attenuation coefficient with the audio signal {suppresses the interference signal outside the target audio area}” where the “particular sound” is not an interference and, as such, is not suppressed by the attenuation coefficient, where the particular sound is within the local area {preserves the target signal within the target audio area}.; Yang, ¶ Col. 16, lines 5-8 and 17-24; Col. 18, lines 31-41); and transmit the output audio signal to the remote device for playback (“The artificial reality system” can include a “hardware platform capable of providing artificial reality content to one or more viewers” where the “Artificial reality content may include... generated content combined with captured (e.g., real-world) content” and the captured content can include “video, audio, haptic feedback, or some combination thereof” and where the “audio content” is captured by the user via the “microphone array” to “capture sounds emitted from one or more real-world sound sources in the local area.” In embodiments with two viewers, each viewer having a headset, each headset is understood to perform the same functions described in Yang including capture of audio and the respective audio processing steps described previously for capture of audio for their respective local areas. This creates a first audio signal from the first headset {local device} at the first viewer, in a local environment to the first viewer {the local environment}, and a second audio signal from the second headset at the second viewer in a local environment to the second viewer {the remote environment}. The first audio signal and the second audio signal are then combined into the artificial reality content as part of the “captured (e.g., real-world) content” which is then “provid[ed]...to one or more viewers”. As such, the first audio signal {output audio signal}, the second audio signal as captured by the second headset, and any further content as captured and processed at any further headsets, can be provided {transmitted} by the hardware platform {local device} to both the first headset {playback device} and the second headset {remote device}.; Yang, ¶ Col. 3, lines 28-52; Col. 6, lines 49-56).
Regarding claim 20, Yang discloses A non-transitory computer-readable medium having instructions stored therein, which when executed by at least one processor of a local device, causes the local device to (Systems and methods described with reference to the “artificial reality system” as implemented through a “headset 100.” Of note, though described for convenience in the context of a headset, Yang expressly considers implementation in “a video calling device, a digital assistant device, a computer, a laptop, a mobile phone, a wearable device (e.g., a hearing aid, a smartwatch, etc.), and any other suitable electronic device” including distribution of embodiments among a plurality of the above.; Yang, ¶ Col. 3, lines 46-53; Col. 7, line 64-Col. 8, line 14.): transmit media content to a remote device (Discloses “an artificial reality system” which “may include completely generated content or generated content combined with captured (e.g., real-world) content,” the “generated content” being “combined with captured (e.g., real-world) content” including “video, audio, haptic feedback, or some combination thereof” which is presented to the viewers, where “the artificial reality content may be implemented on” a “hardware platform capable of providing artificial reality content to one or more viewers.” One or more viewers includes three viewers, each of which may have “a headset 100” (thus a first headset, a second headset, and a third headset), which “may be worn on the face of a user such that content (e.g., media content) is presented using a display assembly and/or an audio system.” Further, the “hardware platform” may be implemented locally (e.g., as part of the first headset).; Yang, ¶ Col. 3, lines 27-63, Col. 4, lines 64-66; Col. 6, lines 49-53), while the local device operates in a shared playback mode with the remote device (The “hardware platform”, as integrated into the first headset {local device}, can share the “artificial reality content to one or more viewers {shared playback mode}”, thus the viewers, including the first viewer through the first headset {a local device} and a second viewer through the second headset {remote device}, each receive the content at their respective headset.; Yang, ¶ Col. 3, lines 27-63) such that the remote device plays back the media content in a remote environment of the remote device (The hardware platform can deliver the content to the second headset {remote device} in the environment of said second headset {remote environment}. The second headset is not restricted to a specific location, and can be remote from the first headset.; Yang, ¶ Col. 3, lines 27-63) and the media content is played back by a playback device in a local environment of the local device (The system can “update the model of the local area” based on each “person wearing a headset in the local area.” As such, the content delivered to “one or more viewers” can include a third headset {playback device} in the local environment of said first headset. As local in local environment is a comparative term, the environment of the first headset and the third headset may be understood as the local environment by comparison to the second device being in the remote environment.; Yang, ¶ Col. 3, lines 27-63; Col. 
6, lines 1-3); receive an input audio signal captured by at least one microphone, (The first headset {local device}, as with all other headsets, can include a microphone array, where “The microphone array 210 may capture sounds emitted by one or more real-world sound sources (including a user of the electronic device) within the local area,” and the “local area is the area surrounding the headset 100,” where the system “may capture sounds emitted by one or more real-world sound sources (including a user of the electronic device) within the local area. The captured sounds may include a plurality of interferences of different types” and “The captured sounds may include a plurality of interferences of different types” such as “nonstationary interference” which is “any sound with an intensity, spectrum shape, mean, variance, or other characteristic that changes over time” including “a plurality of people talking in the local area, phones ringing in the local area, music playing in the local area, televisions playing in the local area, airplane flying overhead in the local area, car horns honking in the local area, etc.”; Yang, ¶ Col. 6, lines 39-50; Col. 4, lines 37-41; Col. 8, lines 29-56) wherein the input audio signal includes speech of a user of the local device and the media content played back by the playback device (“The microphones 180 capture sounds emitted from one or more real-world sound sources in the local area (e.g., a room). The sounds may include a plurality of interferences of different types (e.g., a wind interference, an echo interference, a stationary interference, etc.).” and a “particular sound (e.g., speech emitted by the user)”. In a shared artificial reality context, the first viewer at the first headset {local device} in a local environment with respect to the first viewer, captures speech of the third viewer (e.g., the third viewer communicating with the first viewer and the second viewer as part of the artificial reality content). As the third viewer and the third headset may be in the local environment of the first viewer and the first headset, the input audio received at the first headset would include speech from the first viewer {a user} of the first headset {the local device} and audio from the artificial reality content {media content} produced from the third headset {the playback device} in the same local environment.; Yang, ¶ Col. 6, lines 49-56; Col. 16, line 4-6); determining side information that is associated with an environment of the local device and the playback device (The system through the “microphone array 210 may capture sounds emitted by one or more real-world sound sources (including a user of the electronic device) within the local area {associated with an environment of the local device and the playback device}” including “a plurality of interferences of different types” such as “a wind interference, an echo interference, a reverberation interference, a stationary interference, and a nonstationary interference” and further “determine[s] positional information about one or more sound sources in the local area (i.e. where a sound source is located within the local area)” each of which further define the local area (e.g., “Reverberation interference may be present during indoor applications”, etc.) and each of which would be “side information” within the meaning of the instant application.; Yang, ¶ Col. 5, lines 35-56; Col. 
8, line 29-62) based at least partially on the input audio signal (The plurality of interferences and related local area attributes are derived from the input audio signal.; Yang, ¶ Col. 8, line 29-62); determine a target audio area surrounding the local device using the side information (“the headset 100 may include one or more imaging devices 130” to determine “the local area surrounding the headset 100,” where “[t]he microphone array detects sounds within the local area of the headset 100,” where the system updates the model of the local area “as one or more sound sources change position {based on the input audio signal}” based on positional information and where the local area can further be defined by “acoustic parameters (e.g., reverberation time) that describe acoustic properties of the local area and a location characteristic (e.g., indoor or outdoor) that describes the local area,” where describing acoustic parameters of the local area and determining location characteristics based on the audio (e.g., “room impulse response, a reverberation time, a reverberation level, etc.” of the local area) is part of determining the local area {target audio area}.; Yang, ¶ Col. 4, lines 64-66; Col. 5, lines 44-50; Col. 6, lines 49-50; Col. 10, lines 3-23; Col. 24, lines 37-40), wherein the target audio area distinguishes between the user of the local device estimated to be within the target audio area and the playback device estimated to be outside of the target audio area (“The coefficient generation module 260 may perform an additional pre-processing operation which includes separating the audio signal (or each of the M-band audio signals) into two separate signals (e.g., a first separate audio signal and a second separate audio signal) according to direction of arrival estimates determined by the DOA estimation module 245 and the model of the local area” where the system separates “a particular sound (e.g., speech emitted by the user) and near-by interferences... into the first separate audio signal”, being the first viewer {the user} which is “located within a threshold distance of the user {...estimated to be within the target audio area}” and “interferences {a source of the interference signal} emitted by various sound sources located beyond a threshold distance of the user {...estimated to be outside of the target audio area} are separated from the audio signal into the second separate audio signal”; Yang, ¶ Col. 15, line 65-Col. 16, line 16); generate an output audio signal from the input audio signal according to the target audio area such that the speech of the user within the target audio area is preserved and the media content played back by the playback device that is outside the target audio area is suppressed (“The coefficient generation module 260 generates an attenuation coefficient for each of the plurality of interferences (e.g., of different types)... based on the location characteristic of the local area stored in the model of the local area” and “The sound filter module 265 may suppress the plurality of interferences from the audio signal by multiplying the final attenuation coefficient with the audio signal {suppresses the interference signal outside the target audio area}” where the “particular sound” is not an interference and, as such, is not suppressed by the attenuation coefficient, where the particular sound is within the local area {preserves the target signal within the target audio area}.; Yang, ¶ Col. 16, lines 5-8 and 17-24; Col. 
18, lines 31-41); and transmit the output audio signal to the remote device for playback (“The artificial reality system” can include a “hardware platform capable of providing artificial reality content to one or more viewers” where the “Artificial reality content may include... generated content combined with captured (e.g., real-world) content” and the captured content can include “video, audio, haptic feedback, or some combination thereof” and where the “audio content” is captured by the user via the “microphone array” to “capture sounds emitted from one or more real-world sound sources in the local area.” In embodiments with two viewers, each viewer having a headset, each headset is understood to perform the same functions described in Yang including capture of audio and the respective audio processing steps described previously for capture of audio for their respective local areas. This creates a first audio signal from the first headset {local device} at the first viewer, in a local environment to the first viewer {the local environment}, and a second audio signal from the second headset at the second viewer in a local environment to the second viewer {the remote environment}. The first audio signal and the second audio signal are then combined into the artificial reality content as part of the “captured (e.g., real-world) content” which is then “provid[ed]...to one or more viewers”. As such, the first audio signal {output audio signal}, the second audio signal as captured by the second headset, and any further content as captured and processed at any further headsets, can be provided {transmitted} by the hardware platform {local device} to both the first headset {playback device} and the second headset {remote device}.; Yang, ¶ Col. 3, lines 28-52; Col. 6, lines 49-56).
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 2, 6, 8, 16, and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yang as applied to claim(s) 1 and 11, and further in view of Zheng.
Regarding claim 2, the rejection of claim 1 is incorporated. Yang discloses all of the elements of the current invention as stated above. Yang further discloses wherein the target audio area identifies a distance boundary (The “coefficient generation module 260” separates speech emitted by the user and interferences based on whether the sound source is located within or outside of “a threshold distance of the user”; Yang, ¶ Col. 16, lines 4-16). However, Yang fails to expressly recite wherein the target audio area identifies a distance boundary, and wherein the user is determined as closer to the local device than the distance boundary and the playback device is determined as farther away from the local device than the distance boundary.
Zheng teaches “a sound insulation method, a device, a system, an electronic device and a storage medium to realize intelligent sound insulation in audio calls and/or video calls.” (Zheng, ¶ [0026]). Regarding claim 2, Zheng teaches wherein the target audio area identifies a distance boundary (The system “zones the space into a public area 311 and a private area 312” where “[i]n the public area 311, each speaker can participate in a video conference through a video calling device 320,” and where the zoning establishes a distance boundary for each speaker (or audio source); Zheng, ¶ [0044]), and wherein the user is determined as closer to the local device than the distance boundary and the playback device is determined as farther away from the local device than the distance boundary (the “location of each sound source relative to the video calling device 320 can be located according to the identified voice data” and the unintentionally collected audio data is from a “sound source... separated from the sound source collection module by a distance greater than or equivalent to a distance threshold”; Zheng, ¶ [0044], [0046]-[0047]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang to incorporate the teachings of Zheng to include wherein the target audio area identifies a distance boundary, and wherein the user is determined as closer to the local device than the distance boundary and the playback devices determined as farther away from the local device than the distance boundary. The system of Zheng zones “a public area and a private area …to send out the voice data in the public area and filter a private voice data in the private area, such that intelligent sound insulation can be performed, and the problems of total sound insulation and inconvenient operation of manual mute button can be resolved,” as recognized by Zheng. (Zheng, ¶ [0030]).
Regarding claim 6, the rejection of claim 1 is incorporated. Yang discloses all of the elements of the current invention as stated above. Yang further discloses wherein generating the output audio signal further comprises: generating a filtered audio signal that attenuates the media content based on the playback device estimated to be outside of the target audio area to reduce interference on the speech (“The coefficient generation module 260 generates an attenuation coefficient for each of the plurality of interferences (e.g., of different types)... based on the location characteristic of the local area stored in the model of the local area” and “The sound filter module 265 may suppress the plurality of interferences from the audio signal by multiplying the final attenuation coefficient with the audio signal {suppresses the media content played back by the playback device that is outside the target audio area}” where the “particular sound” is not an interference and, as such, is not suppressed by the attenuation coefficient, where the particular sound is within the local area {preserves the speech from the user within the target audio area}.; Yang, ¶ Col. 16, lines 5-8 and 17-24; Col. 18, lines 31-41); adjusting a gain of the filtered audio signal to generate a gain-adjusted output audio signal (“In some embodiments, the sound filter module 265 increases the count of combined attenuation coefficients by using a band gain for each bin inside that band.”; Yang, ¶ Col. 18, lines 27-30); determining an audio level threshold (“The interference detector module 240 may calculate the energy of the obtained low-band audio signal and the energy of the audio signal” and compute “the energy ratio between the energy of the low-band audio signal to the energy of the audio signal.”; Yang, ¶ Col. 10, lines 47-52). However, Yang fails to expressly recite suppressing the gain-adjusted output audio signal when an audio level of the gain-adjusted output audio signal falls below the audio level threshold; and preserving the gain-adjusted output audio signal when the audio level of the gain-adjusted output audio signal satisfies the audio level threshold.
The relevance of Zheng is described above with relation to claim 2. Regarding claim 6, Zheng teaches determining an audio level threshold (The system discloses a first audio level threshold; Zheng, ¶ [0046]-[0047]); suppressing the gain-adjusted output audio signal when an audio level of the gain-adjusted output audio signal falls below the audio level threshold (“The sound source having an audio intensity less than or equivalent to a first intensity threshold and separated from the sound source collection module by a distance greater than or equivalent to a distance threshold is filtered”; Zheng, ¶ [0046]-[0047]); and preserving the gain-adjusted output audio signal when the audio level of the gain-adjusted output audio signal satisfies the audio level threshold (“only the normal speech and/or the loud speech,” which is speech greater than the first intensity threshold, “at a farther distance are transmitted”; Zheng, ¶ [0046]-[0047]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang to incorporate the teachings of Zheng to include suppressing the gain-adjusted output audio signal when an audio level of the gain-adjusted output audio signal falls below the audio level threshold; and preserving the gain-adjusted output audio signal when the audio level of the gain-adjusted output audio signal satisfies the audio level threshold. The system of Zheng zones “a public area and a private area …to send out the voice data in the public area and filter a private voice data in the private area, such that intelligent sound insulation can be performed, and the problems of total sound insulation and inconvenient operation of manual mute button can be resolved,” as recognized by Zheng. (Zheng, ¶ [0030]).
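For illustrative context only, the attenuate-then-adjust-then-threshold sequence addressed in the claim 6 rejection can be sketched as follows; the attenuation coefficient, gain, and threshold values are hypothetical and are not taken from either reference.

```python
# Illustrative sketch only: attenuate estimated interference with a coefficient,
# apply a gain, then gate the gain-adjusted frame against an audio level threshold.
import numpy as np

def process_frame(frame, attenuation_coeff, gain, level_threshold):
    filtered = frame * attenuation_coeff          # attenuate estimated interference
    gain_adjusted = filtered * gain               # gain adjustment
    level = np.sqrt(np.mean(gain_adjusted ** 2))  # RMS level of the frame
    if level < level_threshold:
        return np.zeros_like(gain_adjusted)       # suppress below-threshold frames
    return gain_adjusted                          # preserve frames satisfying the threshold

rng = np.random.default_rng(0)
frame = rng.standard_normal(256) * 0.01           # a quiet frame
out = process_frame(frame, attenuation_coeff=0.3, gain=1.5, level_threshold=0.02)
print("suppressed" if not out.any() else "preserved")
```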
Regarding claim 8, the rejection of claim 6 is incorporated. Yang discloses all of the elements of the current invention as stated above. However, Yang fails to expressly recite wherein determining the audio level threshold comprises: estimating a distance of the user or the playback device to adjust the audio level threshold; detecting a face of the user to adjust the audio level threshold when the input audio signal includes the speech; estimating an audio level of the speech or the media content to adjust the audio level threshold; or estimating acoustic characteristics of the local environment of the local device to adjust the audio level threshold.
The relevance of Zheng is described above with relation to claim 2. Regarding claim 8, Zheng teaches wherein determining the audio level threshold comprises: estimating a distance of the user or the playback device to adjust the audio level threshold; detecting a face of the user to adjust the audio level threshold when the input audio signal includes the speech; estimating an audio level of the speech or the media content to adjust the audio level threshold; or estimating acoustic characteristics of the local environment of the local device to adjust the audio level threshold (The system discloses two audio intensity thresholds, where “The sound source having an audio intensity less than or equivalent to a first intensity threshold and separated from the sound source collection module by a distance greater than or equivalent to a distance threshold is filtered, and/or the sound source having an audio intensity less than or equivalent to a second intensity threshold and separated from the sound source collection module which collects the voice data of a sound source by a distance less than or equivalent to a distance threshold is filtered, and only the normal speech and/or the loud speech at a farther distance are transmitted.”; Zheng, ¶ [0047]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang to incorporate the teachings of Zheng to include wherein determining the audio level threshold comprises: estimating a distance of the user or the playback device to adjust the audio level threshold; detecting a face of the user to adjust the audio level threshold when the input audio signal includes the speech; estimating an audio level of the speech or the media content to adjust the audio level threshold; or estimating acoustic characteristics of the local environment of the local device to adjust the audio level threshold. The system of Zheng zones “a public area and a private area …to send out the voice data in the public area and filter a private voice data in the private area, such that intelligent sound insulation can be performed, and the problems of total sound insulation and inconvenient operation of manual mute button can be resolved,” as recognized by Zheng. (Zheng, ¶ [0030]).
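For illustrative context only, the threshold-adjustment alternatives addressed in the claim 8 rejection might be sketched with hypothetical heuristics; the specific weights and conditions below are assumptions, not teachings of Yang or Zheng.

```python
# Illustrative sketch only: adjusting an audio level threshold from side
# information such as estimated user distance, face detection, estimated
# speech level, and room reverberation.
def adjust_threshold(base_threshold, user_distance_m=None, speech_level=None,
                     face_detected=False, reverb_time_s=None):
    t = base_threshold
    if user_distance_m is not None:
        t *= max(0.5, 1.0 - 0.1 * user_distance_m)  # farther user -> quieter speech -> lower threshold
    if face_detected:
        t *= 0.8                                     # user likely speaking -> be less aggressive
    if speech_level is not None:
        t = min(t, 0.5 * speech_level)               # keep threshold safely below estimated speech level
    if reverb_time_s is not None and reverb_time_s > 0.6:
        t *= 1.2                                     # reverberant room -> more residual interference
    return t

print(adjust_threshold(0.02, user_distance_m=2.0, face_detected=True, speech_level=0.05))
```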
Regarding claim 16, the rejection of claim 11 is incorporated. Claim 16 is substantially the same as claim 6 and is therefore rejected under the same rationale as above.
Regarding claim 18, the rejection of claim 16 is incorporated. Claim 18 is substantially the same as claim 8 and is therefore rejected under the same rationale as above.
Claims 3 and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yang as applied to claims 1 and 11 above, and further in view of McElveen and Zheng.
Regarding claim 3, the rejection of claim 1 is incorporated. Yang discloses all of the elements of the current invention as stated above. However, Yang fails to expressly recite wherein determining the side information comprises: receiving by a machine learning model an estimated distance of the user or the playback device from the local device to adjust the target audio area; receiving by the machine learning model a detected face of the user to adjust the target audio area when the user is speaking; receiving by the machine learning model an estimated audio level of the speech or the media content to adjust the target audio area; or receiving estimated acoustic characteristics of the local environment of the local device to adjust the target audio area.
McElveen teaches systems and methods for spatial audio array processing “to enable audio signals to be received from, or transmitted to, selected locations in an acoustic space.” (McElveen, ¶ [0002]). Regarding claim 3, McElveen teaches wherein determining the side information comprises: receiving by a machine learning model an estimated distance of the user or the playback device from the local device to adjust the target audio area (“an exemplary system and method according to the principles herein may process audio input data to calculate/estimate, and/or use one or more machine learning techniques to learn, an acoustic propagation model between a target location of a sound source relative to one or more array elements within an acoustic space”; McElveen, ¶ [0068]-[0069]); receiving by the machine learning model a detected face of the user to adjust the target audio area when the user is speaking (“In certain embodiments where visual triggering is employed, a spatial audio processing system (e.g. as shown and described in FIG. 1) may include a video camera or motion sensor configured to identify activity or sound source location as a trigger for designating the audio segment.”; McElveen, ¶ [0097]); or receiving estimated acoustic characteristics of the local environment of the local device to adjust the target audio area (the system can further “accommodate for suboptimal acoustic propagation environments (e.g., large reflective surfaces, objects located between the target acoustic location and the transducers that interfere with the free-space propagation, and the like) by processing audio input data according to a data processing framework in which one or more boundary conditions are estimates within a Green’s Function algorithm to derive an acoustic propagation model for a target acoustic location.”; McElveen, ¶ [0068]-[0069]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang to incorporate the teachings of McElveen to include wherein determining the side information comprises: receiving by a machine learning model an estimated distance of the user or the playback device from the local device to adjust the target audio area; receiving by the machine learning model a detected face of the user to adjust the target audio area when the user is speaking; receiving by the machine learning model an estimated audio level of the speech or the media content to adjust the target audio area; or receiving estimated acoustic characteristics of the local environment of the local device to adjust the target audio area. The spatial audio processing systems of McElveen can process “audio input data” to determine “an acoustic propagation model between a target location of a sound source relative to one or more array elements within an acoustic space” which provides significant “signal-to-noise ratio (SNR) improvement compared to prior art beamforming techniques, while using far fewer transducers,” and which “does not require knowledge of the array configuration, location, or orientation for improving SNR, regardless of whether the array transducers are co-located or distributed around an acoustic space,” allowing for improvements to SNR with wider applicability to a broad variety of configurations, without necessitating significant research or planning based on device configuration and/or spatial pose, as recognized by McElveen. (McElveen, ¶ [0018]-[0019], [0068]). However, Yang and McElveen fail to expressly recite receiving by the machine learning model an estimated audio level of the speech or the media content to adjust the target audio area.
The relevance of Zheng is described above with relation to claim 2. Regarding claim 3, Zheng teaches receiving by the machine learning model an estimated audio level of the speech or the media content to adjust the target audio area (“the present embodiment can recognize each speaker from the first video data in the collection space of the video calling device 320 using the facial recognition technology, and can further obtain the location of each speaker relative to the video calling device 320 according to the location of each speaker in the image of the first identified video data. According to the above embodiments, the location of each sound source relative to the video calling device 320 can be located according to the identified voice data.”; Zheng, ¶ [0044]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang, as modified by the spatial audio processing systems of McElveen, to incorporate the teachings of Zheng to include receiving by the machine learning model an estimated audio level of the speech or the media content to adjust the target audio area. The system of Zheng zones “a public area and a private area …to send out the voice data in the public area and filter a private voice data in the private area, such that intelligent sound insulation can be performed, and the problems of total sound insulation and inconvenient operation of manual mute button can be resolved,” as recognized by Zheng. (Zheng, ¶ [0030]).
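For illustrative context only, feeding side information into a learned model to adjust the target audio area, as addressed in the claim 3 rejection, might be sketched as follows; a linear model with made-up weights stands in for any machine learning model, and the feature names and ranges are assumptions.

```python
# Illustrative sketch only: a small learned model that maps side information
# (estimated distance, detected face, audio levels, room acoustics) to an
# adjusted target-audio-area radius.
import numpy as np

class AreaAdjuster:
    def __init__(self, weights, bias):
        self.w = np.asarray(weights, dtype=float)   # pretend these were learned offline
        self.b = float(bias)

    def adjust(self, user_distance_m, face_detected, speech_level, playback_level, rt60_s):
        features = np.array([user_distance_m, float(face_detected),
                             speech_level, playback_level, rt60_s])
        radius = self.w @ features + self.b         # linear model stands in for any ML regressor
        return float(np.clip(radius, 0.3, 5.0))     # keep the target area within a sane range

adjuster = AreaAdjuster(weights=[0.8, -0.2, -1.0, 2.0, 0.5], bias=1.0)
print(adjuster.adjust(user_distance_m=1.2, face_detected=True,
                      speech_level=0.05, playback_level=0.1, rt60_s=0.4))
```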
Regarding claim 13, the rejection of claim 11 is incorporated. Yang discloses all of the elements of the current invention as stated above. However, Yang fails to expressly recite wherein determining the target audio area comprises: receiving by a machine learning model an estimated distance of the source of the target signal or the source of the interference signal from the electronic device to adjust the target audio area; receiving by the machine learning model a detected face of a speaker of the target signal to adjust the target audio area when the target signal includes speech from the speaker; receiving by the machine learning model an estimated audio level of the target signal or the interference signal to adjust the target audio area; or receiving estimated acoustic characteristics of an environment of the electronic device to adjust the target audio area.
The relevance of McElveen is described above with relation to claim 3. Regarding claim 13, McElveen teaches wherein to determine the target audio area, the processor further executes the instructions to: receive an estimated distance of the source of the target signal or the source of the interference signal from the electronic device to adjust the target audio area (“an exemplary system and method according to the principles herein may process audio input data to calculate/estimate, and/or use one or more machine learning techniques to learn, an acoustic propagation model between a target location of a sound source relative to one or more array elements within an acoustic space”; McElveen, ¶ [0068]-[0069]); receive a detected face of a speaker of the target signal to adjust the target audio area when the target signal includes speech from the speaker (“In certain embodiments where visual triggering is employed, a spatial audio processing system (e.g. as shown and described in FIG. 1) may include a video camera or motion sensor configured to identify activity or sound source location as a trigger for designating the audio segment.”; McElveen, ¶ [0097]); or receive estimated acoustic characteristics of an environment of the electronic device to adjust the target audio area (the system can further “accommodate for suboptimal acoustic propagation environments (e.g., large reflective surfaces, objects located between the target acoustic location and the transducers that interfere with the free-space propagation, and the like) by processing audio input data according to a data processing framework in which one or more boundary conditions are estimates within a Green’s Function algorithm to derive an acoustic propagation model for a target acoustic location.”; McElveen, ¶ [0068]-[0069]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang to incorporate the teachings of McElveen to include wherein to determine the target audio area, the processor further executes the instructions to: receive an estimated distance of the source of the target signal or the source of the interference signal from the device to adjust the target audio area; receive a detected face of a speaker of the target signal to adjust the target audio area when the target signal includes speech from the speaker; or receive estimated acoustic characteristics of an environment of the electronic device to adjust the target audio area. The spatial audio processing systems of McElveen can process “audio input data” to determine “an acoustic propagation model between a target location of a sound source relative to one or more array elements within an acoustic space” which provides significant “signal-to-noise ratio (SNR) improvement compared to prior art beamforming techniques, while using far fewer transducers,” and which “does not require knowledge of the array configuration, location, or orientation for improving SNR, regardless of whether the array transducers are co-located or distributed around an acoustic space,” allowing for improvements to SNR with wider applicability to a broad variety of configurations, without necessitating significant research or planning based on device configuration and/or spatial pose, as recognized by McElveen. (McElveen, ¶ [0018]-[0019], [0068]). However, Yang and McElveen fail to expressly recite receive an estimated audio level of the target signal or the interference signal to adjust the target audio area.
The relevance of Zheng is described above with relation to claim 2. Regarding claim 13, Zheng teaches receive an estimated audio level of the target signal or the interference signal to adjust the target audio area (“the present embodiment can recognize each speaker from the first video data in the collection space of the video calling device 320 using the facial recognition technology, and can further obtain the location of each speaker relative to the video calling device 320 according to the location of each speaker in the image of the first identified video data. According to the above embodiments, the location of each sound source relative to the video calling device 320 can be located according to the identified voice data.”; Zheng, ¶ [0044]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang, as modified by the spatial audio processing systems of McElveen, to incorporate the teachings of Zheng to include receiving by the machine learning model an estimated audio level of the target signal or the interference signal to adjust the target audio area. The system of Zheng zones “a public area and a private area …to send out the voice data in the public area and filter a private voice data in the private area, such that intelligent sound insulation can be performed, and the problems of total sound insulation and inconvenient operation of manual mute button can be resolved,” as recognized by Zheng. (Zheng, ¶ [0030]).
Claims 4 and 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yang as applied to claim 1 above, and further in view of Paxinos.
Regarding claim 4, the rejection of claim 1 is incorporated. Yang discloses all of the elements of the current invention as stated above. Yang further discloses wherein the electronic device is a local device, (“the headset 100 {electronic device}” exists in “the local area surrounding the headset 100 {is a local device}”; Yang, Col. 4, lines 64-66, Col. 6, line 49, Col. 5, lines 44-50) wherein the method further comprises: initiating, by the local device and with a remote device, an audio communication session (“The system 500” may include “multiple headsets each having an associated I/O interface 510, with each headset and I/O interface 510 communicating with the console 515” of another headset which can include communication in “an artificial reality environment (e.g., a virtual reality environment, an augmented reality environment, a mixed reality environment, or some combination thereof) {an audio communication session}”; Yang, Col. 22, lines 27-42); wherein determining the target audio area and generating the output audio signal are performed by an echo cancellation process in which echo of… [a] remote sound is canceled from the output audio signal (“the interference detector module 240 may determine... an echo interference is present (e.g., based on detected double talk in the audio signal)” where double talk is understood in the art as the detection of audio from a non-proximate speaker while a proximate speaker is speaking. Here, “the coefficient generation module 260 may determine that an echo path change takes place based on the model of the local area” and “When the echo path change is detected by the coefficient generation module 260 and double talk is detected by the interference detector module 240, the coefficient generation module 260 applies an adaptive filter that is changing, and the linear portion of the echo signal is subtracted from the audio signal.”; Yang, Col. 10, lines 26-33; Col. 15, lines 51-64). However, Yang fails to expressly recite wherein the method further comprises: initiating, by the local device and with a remote device, an audio communication session in which remote sound captured by the remote device is played back by a playback device that is local to the local device as the interference signal and the output audio signal is to be transmitted to the remote device for playback.
Paxinos teaches systems and methods for a user “to make audio and/or audio-video calls simultaneously while watching (co-viewing) the same provider video (e.g. program, movie or sporting event).” (Paxinos, ¶ [0006]). Regarding claim 4, Paxinos teaches wherein the method further comprises: initiating, by the local device and with a remote device, an audio communication session (“a first user requesting {initiating by the local device} a coviewing phone call {an audio communication session} connection with a second user at a specified or otherwise specific location (such as phone number, email address, or other unique identifier) on the first user’s device interface” the second user interacting through a second device interface {with a remote device}.; Paxinos, ¶ [0033]) in which remote sound captured by the remote device is played back by a playback device that is local to the local device as the interference signal and the output audio signal is to be transmitted to the remote device for playback, (“In actuating a coviewing session, the user’s device interfaces {playback device that is local to the local device} begin to display or otherwise play {is played back by...} the provider video content and phone call communications {remote sound captured by the remote device} as directed by the coviewing control system and transmit capture phone call related audio and video information to the coviewing control system {and the output audio signal is to be transmitted to the remote device for playback}.”; Paxinos, ¶ [0034]) … [and] an echo cancellation process in which echo of the remote sound is canceled from the output audio signal (“The echo canceller 108 receives audio data from a plurality of sources relating to noise which is being generated around the set top box 11” said noise affecting a “co-viewing call” and “generates a signal which enables echo cancellation and unwanted noise suppression.”; Paxinos, ¶ [0030]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang to incorporate the teachings of Paxinos to include wherein the method further comprises: initiating, by the local device and with a remote device, an audio communication session in which remote sound captured by the remote device is played back by a playback device that is local to the local device as the interference signal and the output audio signal is to be transmitted to the remote device for playback. The system of Paxinos provides an echo cancelling system which addresses the situation of a “point to point audio call, audio/visual call, or co-viewing call” receiving interference from an “audio signal that would emit from the television or display device” by receiving “the audio signal… and generate a cancelling signal that would mix with the signal …thus cancelling the unwanted audio signal and preventing unwanted feedback,” which addresses the same field of endeavor of the problem described in Yang under circumstances which were not expressly considered by Yang, and provides the additional benefit of reducing interference with received audio within the target audio area by local interactive devices, as recognized by Paxinos. (Paxinos, ¶ [0030]).
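For illustrative context only, subtracting the linear portion of an echo with an adaptive filter, as discussed in the claim 4 rejection, might be sketched with a standard NLMS update; the filter length, step size, and simulated echo path are hypothetical and are not the specific implementations of Yang or Paxinos.

```python
# Illustrative sketch only: a basic NLMS adaptive filter that estimates the
# linear echo of a far-end (remote) signal and subtracts it from the microphone signal.
import numpy as np

def nlms_echo_cancel(mic, far_end, filter_len=64, mu=0.5, eps=1e-8):
    w = np.zeros(filter_len)
    out = np.zeros_like(mic)
    for n in range(len(mic)):
        x = far_end[max(0, n - filter_len + 1):n + 1][::-1]   # most recent far-end samples
        x = np.pad(x, (0, filter_len - len(x)))
        echo_est = w @ x                      # linear echo estimate
        e = mic[n] - echo_est                 # subtract estimate from the microphone signal
        w += (mu / (x @ x + eps)) * e * x     # NLMS weight update
        out[n] = e
    return out

rng = np.random.default_rng(1)
far = rng.standard_normal(2000)
echo = np.convolve(far, [0.5, 0.3, 0.1])[:2000]   # simulated room echo path
near_speech = 0.1 * rng.standard_normal(2000)
cleaned = nlms_echo_cancel(echo + near_speech, far)
print(round(float(np.mean(cleaned[-500:] ** 2)), 4))  # residual power after adaptation
```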
Regarding claim 9, the rejection of claim 1 is incorporated. Yang discloses all of the elements of the current invention as stated above. Yang further discloses wherein the electronic device is a local device, (“the headset 100 {electronic device}” exists in “the local area surrounding the headset 100 {is a local device}”; Yang, Col. 4, lines 64-66, Col. 6, line 49, Col. 5, lines 44-50) wherein the method further comprises: initiating, by the local device, a shared playback mode (“The system 500 may operate in an artificial reality environment (e.g., a virtual reality environment, an augmented reality environment, a mixed reality environment, or some combination thereof)” which may include “multiple headsets each having an associated I/O interface 510, with each headset and I/O interface 510 communicating with the console 515” of another headset {a shared playback mode}.; Yang, Col. 22, lines 27-42), wherein determining the target audio area and generating the output audio signal are performed by an echo cancellation process in which echo of the media content is canceled from the output audio signal (“the interference detector module 240 may determine... an echo interference is present (e.g., based on detected double talk in the audio signal)” where double talk is understood in the art as the detection of audio from a non-proximate speaker while a proximate speaker is speaking. Here, “the coefficient generation module 260 may determine that an echo path change takes place based on the model of the local area” and “When the echo path change is detected by the coefficient generation module 260 and double talk is detected by the interference detector module 240, the coefficient generation module 260 applies an adaptive filter that is changing, and the linear portion of the echo signal is subtracted from the audio signal.”; Yang, Col. 10, lines 26-33; Col. 15, lines 51-64). However, Yang fails to expressly recite wherein the method further comprises: initiating, by the local device, a shared playback mode in which a remote device plays back media content and a playback device that is local to the local device plays back the media content as the interference signal.
The relevance of Paxinos is described above with relation to claim 4. Regarding claim 9, Paxinos teaches wherein the method further comprises: initiating, by the local device, a shared playback mode (“a first user requesting {initiating by the local device} a coviewing phone call {a shared playback mode} connection with a second user at a specified or otherwise specific location (such as phone number, email address, or other unique identifier) on the first user’s device interface {with a remote device}.”; Paxinos, ¶ [0033]) in which a remote device plays back media content and a playback device that is local to the local device plays back the media content as the interference signal, (“In actuating a coviewing session, the user’s device interfaces {playback device that is local to the local device} begin to display or otherwise play {is played back by...} the provider video content and phone call communications {the media content} as directed by the coviewing control system and transmit capture phone call related audio and video information to the coviewing control system” where “the outputs of the set top box 11 {local device plays back media content}” comprise “noise which is being generated around the set top box 11 {as the interference signal}”; Paxinos, ¶ [0034]) … [and] an echo cancellation process in which echo of the media content is canceled from the output audio signal (“The echo canceller 108 receives audio data from a plurality of sources relating to noise which is being generated around the set top box 11” said noise affecting a “co-viewing call” and “generates a signal which enables echo cancellation and unwanted noise suppression.”; Paxinos, ¶ [0030]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang to incorporate the teachings of Paxinos to include wherein the method further comprises: initiating, by the local device, a shared playback mode in which a remote device plays back media content and a playback device that is local to the local device plays back the media content as the interference signal. The system of Paxinos provides an echo cancelling system which addresses the situation of a “point to point audio call, audio/visual call, or co-viewing call” receiving interference from an “audio signal that would emit from the television or display device” by receiving “the audio signal… and generate a cancelling signal that would mix with the signal …thus cancelling the unwanted audio signal and preventing unwanted feedback,” which addresses the same field of endeavor of the problem described in Yang under circumstances which were not expressly considered by Yang, and provides the additional benefit of reducing interference with received audio within the target audio area by local interactive devices, as recognized by Paxinos. (Paxinos, ¶ [0030]).
Claims 5 and 14-15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yang as applied to claims 1 and 11 above, and further in view of McElveen.
Regarding claim 5, the rejection of claim 1 is incorporated. Yang discloses all of the elements of the current invention as stated above. Yang further discloses further comprising: determining that the input audio signal comprises non-speech originating from within the target audio area (the system separates {determining...} “a particular sound (e.g., speech emitted by the user) and near-by interferences {interference signal comprising non-speech}... into the first separate audio signal” which is “located within a threshold distance of the user {...originating from within the target audio area}” from “interferences emitted by various sound sources located beyond a threshold distance of the user” which “are separated from the audio signal into the second separate audio signal”; Yang, Col. 15, line 65-Col. 16, line 16). However, Yang fails to expressly recite generating the output audio signal that preserves the speech while suppressing the non-speech without using a reference copy of the non-speech.
The relevance of McElveen is described above with relation to claim 3. Regarding claim 5, McElveen teaches further comprising: determining that the input audio signal comprises non-speech originating from within the target audio area (“Certain embodiments enable projection of cancelled sound to a target location for noise control applications, as well as remote determination of residue to use in adaptively canceling sound in a target location.”; McElveen, ¶ [0070]); wherein the output audio signal is generated to preserve the speech while suppressing the non-speech without using a reference copy of the non-speech (“Processing routine 700 proceeds by calculating a whitening filter using inverse noise spatial correlation matrix 704 and applying the Green's Function estimate and whitening filter {generating, by the device based on the target audio area...} to the audio input within the frequency domain 706 {generating...an audio output signal...} to extract the target audio frequencies/signals {...that preserves the target signal within the target audio area} and suppress the non-target frequencies/signals (i.e., noise) {...and suppresses the interference signal outside the target audio area} from the live or recorded audio input.” Further, as stated above, the whitening filter is calculated using an “inverse noise spatial correlation matrix,” which is a statistical model which describes the properties of the noise based on the incoming audio {without using a reference copy of the non-speech}; McElveen, ¶ [0099]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang to incorporate the teachings of McElveen to include generating the output audio signal that preserves the speech while suppressing the non-speech without using a reference copy of the non-speech. The spatial audio processing systems of McElveen can process “audio input data” to determine “an acoustic propagation model between a target location of a sound source relative to one or more array elements within an acoustic space” which provides significant “signal-to-noise ratio (SNR) improvement compared to prior art beamforming techniques, while using far fewer transducers,” and which “does not require knowledge of the array configuration, location, or orientation for improving SNR, regardless of whether the array transducers are co-located or distributed around an acoustic space,” allowing for improvements to SNR with wider applicability to a broad variety of configurations, without necessitating significant research or planning based on device configuration and/or spatial pose, as recognized by McElveen. (McElveen, ¶ [0018]-[0019], [0068]).
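For illustrative context only, suppressing interference without a reference copy by relying on a noise spatial correlation matrix, as discussed in the claim 5 rejection, might be sketched with an MVDR-style weight for a single frequency bin; this is a simplification of the whitening-filter approach quoted from McElveen, and the two-microphone setup, steering vector, and data below are hypothetical.

```python
# Illustrative sketch only: estimate a noise spatial correlation matrix from
# noise-only STFT frames, then apply its inverse in an MVDR-style weight so the
# target direction is preserved while interference is suppressed statistically.
import numpy as np

def mvdr_weights(noise_frames, steering):
    """noise_frames: (num_frames, num_mics) complex STFT bins of noise-only data.
    steering: (num_mics,) complex steering vector toward the target area."""
    R = np.einsum('tm,tn->mn', noise_frames, noise_frames.conj()) / len(noise_frames)
    R += 1e-6 * np.eye(R.shape[0])                     # diagonal loading for stability
    Rinv = np.linalg.inv(R)
    w = Rinv @ steering / (steering.conj() @ Rinv @ steering)
    return w

rng = np.random.default_rng(2)
noise = rng.standard_normal((200, 2)) + 1j * rng.standard_normal((200, 2))
d = np.array([1.0, np.exp(-1j * 0.3)])                 # assumed steering toward the user
w = mvdr_weights(noise, d)
mixture = np.array([0.8 + 0.1j, 0.7 - 0.05j])          # one frequency bin of the mixture
print(np.conj(w) @ mixture)                            # beamformed output for that bin
```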
Regarding claim 14, the rejection of claim 11 is incorporated. Yang discloses all of the elements of the current invention as stated above. However, Yang fails to expressly recite wherein the processor further executes the instructions to: determine that the target signal comprises live speech and the interference signal comprises recorded speech.
The relevance of McElveen is described above with relation to claim 3. Regarding claim 14, McElveen teaches wherein the processor further executes the instructions to: determine that the target signal comprises live speech and the interference signal comprises recorded speech (“Signals captured at each of the two or more spatially distributed transducers may comprise a live and/or recorded audio input for use in processing” where the system distinguishes between live audio input and recorded audio input. Therefore, the system determines whether live and/or recorded audio input is received from either source.; McElveen, ¶ [0064], [0097]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang to incorporate the teachings of McElveen to include wherein the processor further executes the instructions to: determine that the target signal comprises live speech and the interference signal comprises recorded speech. The spatial audio processing systems of McElveen can process “audio input data” to determine “an acoustic propagation model between a target location of a sound source relative to one or more array elements within an acoustic space” which provides significant “signal-to-noise ratio (SNR) improvement compared to prior art beamforming techniques, while using far fewer transducers,” and which “does not require knowledge of the array configuration, location, or orientation for improving SNR, regardless of whether the array transducers are co-located or distributed around an acoustic space,” allowing for improvements to SNR with wider applicability to a broad variety of configurations, without necessitating significant research or planning based on device configuration and/or spatial pose, as recognized by McElveen. (McElveen, ¶ [0018]-[0019], [0068]).
Regarding claim 15, the rejection of claim 11 is incorporated. Claim 15 is substantially the same as claim 5 and is therefore rejected under the same rationale as above.
Claims 7 and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yang, McElveen, and Zheng as applied to claims 6 and 16 above, and further in view of Guo.
Regarding claim 7, the rejection of claim 6 is incorporated. Yang and Zheng disclose all of the elements of the current invention as stated above. However, Yang fails to expressly recite wherein adjusting the gain of the filtered audio signal comprises: determining an indication of a level of attenuation of the filtered audio signal relative to the input audio signal; determining whether the speech is also attenuated based on the indication; increasing the gain of the filtered audio signal to recover the speech in response to the speech being attenuated; and decreasing the gain of the filtered audio signal to further attenuate the media content responsive to the speech not being attenuated.
The relevance of McElveen is described above with relation to claim 3. Regarding claim 7, McElveen further teaches wherein adjusting the gain of the filtered audio signal comprises: determining an indication of a level of attenuation of the filtered audio signal relative to the input audio signal (“whitening filter 910” which may be a machine learning model, “may be updated in response to a trigger condition, such as by a source activity detector indicating ‘false,’ i.e. an indication that only noise is present to be used in the noise estimate.”; McElveen, ¶ [0101], [0068]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang, as modified by the sound insulation systems of Zheng, to incorporate the teachings of McElveen to include wherein adjusting the gain of the filtered audio signal comprises: determining an indication of a level of attenuation of the filtered audio signal relative to the input audio signal. The spatial audio processing systems of McElveen can process “audio input data” to determine “an acoustic propagation model between a target location of a sound source relative to one or more array elements within an acoustic space” which provides significant “signal-to-noise ratio (SNR) improvement compared to prior art beamforming techniques, while using far fewer transducers,” and which “does not require knowledge of the array configuration, location, or orientation for improving SNR, regardless of whether the array transducers are co-located or distributed around an acoustic space,” allowing for improvements to SNR with wider applicability to a broad variety of configurations, without necessitating significant research or planning based on device configuration and/or spatial pose, as recognized by McElveen. (McElveen, ¶ [0018]-[0019], [0068]). However, Yang, McElveen, and Zheng fail to expressly recite determining whether the speech is also attenuated based on the indication; increasing the gain of the filtered audio signal to recover the speech in response to the speech being determined as attenuated; and decreasing the gain of the filtered audio signal to further attenuate the media content responsive to the speech being determined as not attenuated.
Guo teaches systems and methods of using automatic gain control in a receiver. (Guo, ¶ [0001]). Regarding claim 7, Guo teaches determining whether the speech is also attenuated based on the indication (“the strength of the received signal is compared with target signal power,” thus attenuated or not with respect to the target signal power; Guo, Abstract); increasing the gain of the filtered audio signal to recover the speech in response to the speech being attenuated (“a radio frequency automatic gain amplifier and a base band automatic gain amplifier are controlled respectively to amplify... the received signal on a signal link, and (e) the power of the amplified or attenuated signal is made to reach the target signal power.”; Guo, Abstract); and decreasing the gain of the filtered audio signal to further attenuate the media content responsive to the speech not being attenuated (“a radio frequency automatic gain amplifier and a base band automatic gain amplifier are controlled respectively to... attenuate the received signal on a signal link, and (e) the power of the amplified or attenuated signal is made to reach the target signal power.”; Guo, Abstract).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang, as modified by the spatial audio processing systems of McElveen, as modified by the sound insulation systems of Zheng, to incorporate the teachings of Guo to include determining whether the speech is also attenuated based on the indication; increasing the gain of the filtered audio signal to recover the speech in response to the speech being determined as attenuated; and decreasing the gain of the filtered audio signal to further attenuate the media content responsive to the speech being determined as not attenuated. By “adopting the automatic gain control and the method for controlling the receiver by means of the automatic gain, errors… of the performance of the receiver can be reduced, the range of sensitivity of the receiver is widened, continuity of signals of the receiver is guaranteed, false-alarm signals caused by interference signals are eliminated, and stability of the signals of the receiver is guaranteed,” as recognized by Guo. (Guo, Abstract).
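For illustrative context only, the automatic gain control reasoning addressed in the claim 7 rejection might be sketched as follows; the target power and step size are hypothetical and this is not the specific receiver of Guo.

```python
# Illustrative sketch only: compare measured signal power against a target power
# and raise or lower the gain accordingly on each frame.
import numpy as np

def agc_step(frame, gain, target_power, step_db=1.0):
    power = np.mean((gain * frame) ** 2)
    if power < target_power:
        gain *= 10 ** (step_db / 20)       # signal too weak -> increase gain
    elif power > target_power:
        gain *= 10 ** (-step_db / 20)      # signal too strong -> attenuate
    return gain

rng = np.random.default_rng(3)
gain = 1.0
for _ in range(50):
    gain = agc_step(0.05 * rng.standard_normal(256), gain, target_power=0.01)
print(round(gain, 2))                      # converges toward the target power level
```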
Regarding claim 17, the rejection of claim 16 is incorporated. Claim 17 is substantially the same as claim 7 and is therefore rejected under the same rationale as above.
Claim 12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Yang as applied to claim 11 above, and further in view of Holman.
Regarding claim 12, the rejection of claim 11 is incorporated. Yang discloses all of the elements of the current invention as stated above. However, Yang fails to expressly recite wherein the at least one processor further executes instructions to: determine whether the local device is in a hand-held mode in which the local device is held by the user or in a coffee table mode in which the local device is on a surface; and adjust the target audio area based on a mode of the local device.
Holman teaches systems and methods for “mobile communications… that adjusts audio equalization of an audio signal based on detecting whether the device is being held in the hand of a user or is not being held in the hand of a user (e.g. lying flat on a surface).” (Holman, ¶ [0001]). Regarding claim 12, Holman teaches wherein the at least one processor further executes instructions to: determine whether the local device is in a hand-held mode in which the local device is held by the user (Discloses a “context detector 8… that determines or detects whether the portable audio device 1 is being held in the hand of a user {a hand-held mode in which the device is held by a user} or lying still on a fixed surface” where each detected configuration may correspond to a different device mode.; Holman, ¶ [0025]-[0026]) or in a coffee table mode in which the local device is on a surface (Though not described in the context of a “coffee table mode,” the system defines a mode in which “the device 1 is lying steady or still on a fixed surface” where the implementation shown in FIG. 1B is a device lying steady or still on a table which is understood to be a coffee table based on the configuration displayed {a coffee table mode} and where “the context detector module 8 makes a determination that the device 1 is lying steady or still on a fixed surface”; Holman, ¶ [0031]; FIG. 1B); and adjust the target audio area based on a mode of the local device (“Based on the determination by the context detector module 8” of the device being in one of the two modes “either upward or downward tilt equalization is applied to an audio signal to be output through the speaker 2,” where a change in equalization of audio received from the target audio area is an adjustment of the target audio area.; Holman, ¶ [0032]).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the interference suppression systems of Yang to incorporate the teachings of Holman to include wherein the at least one processor further executes instructions to: determine whether the local device is in a hand-held mode in which the local device is held by the user or in a coffee table mode in which the local device is on a surface; and adjust the target audio area based on a mode of the local device. The system of Holman can recognize when the device transitions between a first mode for when the device is being held by a user and a second mode when the device is resting on a table top, and respond by “applying downward tilt equalization to an audio signal” when the portable device is lying flat on a table, which “eliminates or abates an apparent ‘brightness’ in the sound emitted by the speaker” and avoids “undesired brightness… caused by sound wave diffraction, reflection off nearby surfaces, reverberation, and similar deformations off the surface of the table,” thereby improving audio quality and reducing further echo in the target audio area, as recognized by Holman. (Holman, ¶ [0035]).
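For illustrative context only, the mode detection and adjustment addressed in the claim 12 rejection might be sketched as follows; the motion threshold, radii, and equalization tilts are assumptions, not teachings of Yang or Holman.

```python
# Illustrative sketch only: decide between a hand-held mode and a coffee-table
# mode from recent accelerometer motion, then pick a target-area radius and an
# equalization tilt for that mode.
import numpy as np

def detect_mode(accel_samples, motion_threshold=0.02):
    """accel_samples: (N, 3) recent accelerometer readings in g."""
    motion = float(np.mean(np.var(accel_samples, axis=0)))
    return "hand_held" if motion > motion_threshold else "coffee_table"

def configure_for_mode(mode):
    if mode == "hand_held":
        return {"target_area_radius_m": 0.8, "eq_tilt_db_per_octave": +1.0}
    return {"target_area_radius_m": 1.5, "eq_tilt_db_per_octave": -1.0}  # downward tilt on a table

still = np.tile([0.0, 0.0, 1.0], (100, 1)) + 0.001 * np.random.default_rng(4).standard_normal((100, 3))
mode = detect_mode(still)
print(mode, configure_for_mode(mode))
```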
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Brimijoin (U.S. Pat. App. Pub. No. 2022/0021972) discloses sound profiles that are individualized to respective users, where the system further comprises an audio controller which identifies one or more sound sources in the local area based on the captured sound, determines a target sound source of the one or more sound sources, and determines one or more filters to apply to a sound signal associated with the target sound source in the captured sound.
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sean E. Serraguard whose telephone number is (313)446-6627. The examiner can normally be reached 07:00-17:00 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel C. Washburn can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Sean E Serraguard/Patent Examiner, Art Unit 2657