Last updated: May 29, 2026
Application No. 18/095,663
DUAL AUDIO STREAM PROCESSING AND TRANSMISSION

Non-Final OA §103
Filed: Jan 11, 2023
Examiner: MOHAMMED, ASSAD
Art Unit: 2691
Tech Center: 2600 (Communications)
Assignee: Zoom Video Communications, Inc.
OA Round: 4 (Non-Final)
Interview Optional

The interview lift of +11.2% is below the 15.0% threshold. A written response is recommended.
Based on 592 resolved cases, 2023–2026
Examiner Intelligence

MOHAMMED, ASSAD
Career Allowance Rate: 73% (434 granted / 592 resolved), above average, +11.3% vs TC avg
Interview Lift: +11.2% (moderate), based on resolved cases with interview
Avg Prosecution (typical timeline): 3y 1m
Currently pending: 18
Total Applications (career history): 612, across all art units
Statute-Specific Performance

§101: 0.4% (-39.6% vs TC avg)
§103: 96.9% (+56.9% vs TC avg)
§102: 0.9% (-39.1% vs TC avg)
§112: 0.9% (-39.1% vs TC avg)
Based on career data from 592 resolved cases
Office Action

§103
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
1.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
2.	Claim(s) 1, 8, 15 are rejected under 35 U.S.C. 103 as being unpatentable over Davis (US 10,110,994) in view of Johnston et al. (US 2016/0142462) in further view of Saifee et al. (US 2024/0194189).
	Regarding claim 1, Davis teaches a method comprising: establishing, by a video conference provider, a virtual event having a plurality of participants, each participant of the plurality of participants exchanging one or more audio or video streams via the virtual event using a plurality of client devices (see fig. 1, col. 6, line 46-col. 7, line 42. The participants are in a video and audio bi-directional connection with one another.).
Davis discloses a conferencing room in which a first device captures audio from multiple participants, using one or more audio capture devices to capture audio signals emanating from various audio sources, such as the participants, at the first location.
However Davis is vague on receiving, by the video conference provider, a first audio stream from a first client device of the plurality of client devices; receiving, by the video conference provider and concurrently with the first audio stream, a second audio stream from the first client device; processing the first audio stream via a first audio process to generate a first processed stream; processing the second audio stream via a second audio process to generate a second processed stream, the second audio process using different audio processing than the first audio process; transmitting, by the video conference provider, the first processed stream to a second client device of the plurality of client devices; and transmitting, by the video conference provider and concurrently with the first processed stream, the second processed stream to the second client device.
Johnston teaches receiving, by the video conference provider, a first audio stream from a first client device of the plurality of client devices; receiving, by the video conference provider and concurrently with the first audio stream, a second audio stream from the first client device; processing the first audio stream via a first audio process to generate a first processed stream; processing the second audio stream via a second audio process to generate a second processed stream; transmitting, by the video conference provider, the first processed stream to a second client device of the plurality of client devices; and transmitting, by the video conference provider and concurrently with the first processed stream, the second processed stream to the second client device (see fig. 1, ¶ 0038-0039, 0049. The audio streams are captured by the devices in the conferencing room. If multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), then more than one audio stream may be transmitted to all other conference attendees. These streams are processed and transmitted by the conferencing server. The conferencing server 101 processes both audio and video streams that are transmitted from one location to another location. This is done concurrently or simultaneously, so that the remote attendees hear all participants talking at the same time).
The combination of Johnston with Davis provides an AV conferencing server that is able to transmit all audio streams: when more than one person in the room is talking at the same time, more than one audio stream may be transmitted to all other conference attendees. These streams are captured by the microphones in the conferencing room to be transmitted to a remote location.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis to incorporate the capture of audio streams by the devices in the conferencing room, such that if multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. The modification provides an AV conferencing server that is able to transmit all audio streams to all other conference attendees when more than one person in the room is talking at the same time.
Saifee teaches processing the first audio stream via a first audio process to generate a first processed stream; processing the second audio stream via a second audio process to generate a second processed stream, the second audio process using different audio processing than the first audio process (see fig. 1-2, ¶ 0012-0023. The system obtains audio streams and separates the audio signals into speech and other, different sound sources; the system then processes the streams to determine the source selection. The system evaluates the probability of one or more characteristics in an audio stream and determines the relevance of one or more audio streams. The source separation module receives and separates the audio signals into a first audio stream and a second audio stream. Through a de-mixing, weighting, and/or beamforming process provided by the source separation module, the audio streams contain audio signals from each source but with higher clarity from one of the sources (e.g., Source A or Source B)).
The combination of Saifee with Davis and Johnston incorporates sound source collection and processing of the audio streams.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis and Johnston to incorporate audio streams from different audio sources that are processed by the system. The modification provides identification of the sound sources as either speech or other sounds.
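For orientation only, the sequence of steps recited in claim 1 (receive two concurrent streams from one client device, apply a different audio process to each, transmit both processed streams concurrently) can be sketched roughly as follows. This is an illustrative sketch and not the applicant's or any cited reference's implementation: the names `noise_gate`, `apply_gain`, `process_stream`, and `relay`, the frame representation, and the choice of a noise gate and a gain stage as the two "different" processes are all assumptions made for the example.

```python
import asyncio
from typing import Callable, List, Tuple

Frame = List[float]  # hypothetical: one frame of PCM samples in [-1.0, 1.0]


def noise_gate(frame: Frame, threshold: float = 0.05) -> Frame:
    """Assumed first audio process: zero out samples below a threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in frame]


def apply_gain(frame: Frame, gain: float = 2.0) -> Frame:
    """Assumed second, different audio process: amplify and clip."""
    return [max(-1.0, min(1.0, s * gain)) for s in frame]


async def process_stream(
    frames: List[Frame],
    process: Callable[[Frame], Frame],
    outbox: asyncio.Queue,
    label: str,
) -> None:
    """Process one incoming stream frame by frame and queue it for sending."""
    for frame in frames:
        await outbox.put((label, process(frame)))
        await asyncio.sleep(0)  # yield so the two streams interleave


async def relay(first: List[Frame], second: List[Frame]) -> List[Tuple[str, Frame]]:
    """Stand-in for the provider: handle both streams concurrently."""
    outbox: asyncio.Queue = asyncio.Queue()
    await asyncio.gather(
        process_stream(first, noise_gate, outbox, "first"),
        process_stream(second, apply_gain, outbox, "second"),
    )
    sent = []
    while not outbox.empty():
        sent.append(outbox.get_nowait())  # what the second client would receive
    return sent
```

Calling `asyncio.run(relay(first_frames, second_frames))` returns the labeled, processed frames in the order they were queued; the queue here merely stands in for a network transport to the second client device.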

Regarding claim 8, Davis teaches a system comprising: a non-transitory computer-readable medium; a communications interface; and a processor communicatively coupled to the non-transitory computer-readable medium and the communications interface, the processor configured to execute processor-executable instructions stored in the non-transitory computer-readable medium to: establish, by a video conference provider, a virtual event having a plurality of participants, each participant of the plurality of participants exchanging one or more audio or video streams via the virtual event using a plurality of client devices (see fig. 1, col. 6, line 46-col. 7, line 42. The participants are in a video and audio bi-directional connection with one another.).
Davis discloses a conferencing room in which a first device captures audio from multiple participants, using one or more audio capture devices to capture audio signals emanating from various audio sources, such as the participants, at the first location.
However Davis is vague on receive, by the video conference provider, a first audio stream from a first client device of the plurality of client devices; receive, by the video conference provider and concurrently with the first audio stream, a second audio stream from a first client device of the plurality of client devices; process, by the video conference provider, the first audio stream via a first audio process to generate a first processed stream; process, by the video conference provider, the second audio stream via a second audio process to generate a second processed stream, the second audio process using different audio processing than the first audio process; transmit, by the video conference provider, the first processed stream to a second client device of the plurality of client devices; and transmit, by the video conference provider and concurrently with the first processed stream, the second processed stream to the second client device.
Johnston teaches receive, by the video conference provider, a first audio stream from a first client device of the plurality of client devices; receive, by the video conference provider and concurrently with the first audio stream, a second audio stream from a first client device of the plurality of client devices; process, by the video conference provider, the first audio stream via a first audio process to generate a first processed stream; process, by the video conference provider, the second audio stream via a second audio process to generate a second processed stream; transmit, by the video conference provider, the first processed stream to a second client device of the plurality of client devices; and transmit, by the video conference provider and concurrently with the first processed stream, the second processed stream to the second client device (see fig. 1, ¶ 0038-0039, 0049. The audio streams are captured by the devices in the conferencing room. If multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), then more than one audio stream may be transmitted to all other conference attendees. These streams are processed and transmitted by the conferencing server. The conferencing server 101 processes both audio and video streams that are transmitted from one location to another location. This is done concurrently or simultaneously, so that the remote attendees hear all participants talking at the same time).
The combination of Johnston with Davis provides an AV conferencing server that is able to transmit all audio streams: when more than one person in the room is talking at the same time, more than one audio stream may be transmitted to all other conference attendees. These streams are captured by the microphones in the conferencing room to be transmitted to a remote location.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis to incorporate the capture of audio streams by the devices in the conferencing room, such that if multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. The modification provides an AV conferencing server that is able to transmit all audio streams to all other conference attendees when more than one person in the room is talking at the same time.
Saifee teaches process, by the video conference provider, the first audio stream via a first audio process to generate a first processed stream; process, by the video conference provider, the second audio stream via a second audio process to generate a second processed stream, the second audio process using different audio processing than the first audio process (see fig. 1-2, ¶ 0012-0023. The system obtains audio streams and separates the audio signals into speech and other, different sound sources; the system then processes the streams to determine the source selection. The system evaluates the probability of one or more characteristics in an audio stream and determines the relevance of one or more audio streams. The source separation module receives and separates the audio signals into a first audio stream and a second audio stream. Through a de-mixing, weighting, and/or beamforming process provided by the source separation module, the audio streams contain audio signals from each source but with higher clarity from one of the sources (e.g., Source A or Source B)).
The combination of Saifee with Davis and Johnston incorporates sound source collection and processing of the audio streams.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis and Johnston to incorporate audio streams from different audio sources that are processed by the system. The modification provides identification of the sound sources as either speech or other sounds.

Regarding claim 15, Davis teaches a non-transitory computer-readable medium comprising processor-executable instructions configured to cause one or more processors to: establish, by a video conference provider, a virtual event having a plurality of participants, each participant of the plurality of participants exchanging one or more audio or video streams via the virtual event using a plurality of client devices; receive, by the video conference provider, a first audio stream from a first client device of the plurality of client devices (see fig. 1, col. 6, line 46-col. 7, line 42. The participants are in a video and audio bi-directional connection with one another.).
Davis discloses a conferencing room in which a first device captures audio from multiple participants, using one or more audio capture devices to capture audio signals emanating from various audio sources, such as the participants, at the first location.
However Davis is vague on receive, by the video conference provider and concurrently with the first audio stream, a second audio stream from a first client device of the plurality of client devices; process, by the video conference provider, the first audio stream via a first audio process to generate a first processed stream; process, by the video conference provider, the second audio stream via a second audio process to generate a second processed stream, the second audio process using different audio processing than the first audio process; transmit, by the video conference provider, the first processed stream to a second client device of the plurality of client devices; and transmit, by the video conference provider and concurrently with the first processed stream, the second processed stream to the second client device.
Johnston teaches receive, by the video conference provider and concurrently with the first audio stream, a second audio stream from a first client device of the plurality of client devices; process, by the video conference provider, the first audio stream via a first audio process to generate a first processed stream; process, by the video conference provider, the second audio stream via a second audio process to generate a second processed stream; transmit, by the video conference provider, the first processed stream to a second client device of the plurality of client devices; and transmit, by the video conference provider and concurrently with the first processed stream, the second processed stream to the second client device (see fig. 1, ¶ 0038-0039, 0049. The audio streams are captured by the devices in the conferencing room. If multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), then more than one audio stream may be transmitted to all other conference attendees. These streams are processed and transmitted by the conferencing server. The conferencing server 101 processes both audio and video streams that are transmitted from one location to another location. This is done concurrently or simultaneously, so that the remote attendees hear all participants talking at the same time).
The combination of Johnston with Davis provides an AV conferencing server that is able to transmit all audio streams: when more than one person in the room is talking at the same time, more than one audio stream may be transmitted to all other conference attendees. These streams are captured by the microphones in the conferencing room to be transmitted to a remote location.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis to incorporate the capture of audio streams by the devices in the conferencing room, such that if multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. The modification provides an AV conferencing server that is able to transmit all audio streams to all other conference attendees when more than one person in the room is talking at the same time.
Saifee teaches process, by the video conference provider, the first audio stream via a first audio process to generate a first processed stream; process, by the video conference provider, the second audio stream via a second audio process to generate a second processed stream, the second audio process using different audio processing than the first audio process (see fig. 1-2, ¶ 0012-0023. The system obtains audio streams and separates the audio signals into speech and other, different sound sources; the system then processes the streams to determine the source selection. The system evaluates the probability of one or more characteristics in an audio stream and determines the relevance of one or more audio streams. The source separation module receives and separates the audio signals into a first audio stream and a second audio stream. Through a de-mixing, weighting, and/or beamforming process provided by the source separation module, the audio streams contain audio signals from each source but with higher clarity from one of the sources (e.g., Source A or Source B)).
The combination of Saifee with Davis and Johnston incorporates sound source collection and processing of the audio streams.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis and Johnston to incorporate audio streams from different audio sources that are processed by the system. The modification provides identification of the sound sources as either speech or other sounds.

3.	Claim(s) 2, 4, 9, 10, 13, 16, 17, 19, 20 are rejected under 35 U.S.C. 103 as being unpatentable over Davis (US 10,110,994) in view of Johnston et al. (US 2016/0142462) in further view of Saifee et al. (US 2024/0194189).
Regarding claim 2, Davis and Saifee do not teach the method of claim 1, wherein the first processed stream and the second processed stream are transmitted by the video conference provider to the second client device simultaneously.  
Johnston teaches wherein the first processed stream and the second processed stream are transmitted by the video conference provider to the second client device simultaneously (see fig. 1, ¶ 0038-0039, 0049. The audio streams are captured by the devices in the conferencing room. If multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), then more than one audio stream may be transmitted to all other conference attendees. These streams are processed and transmitted by the conferencing server. The conferencing server 101 processes both audio and video streams that are transmitted from one location to another location. This is done concurrently or simultaneously, so that the remote attendees hear all participants talking at the same time).
The combination of Johnston with Davis and Saifee provides an AV conferencing server that is able to transmit all audio streams: when more than one person in the room is talking at the same time, more than one audio stream may be transmitted to all other conference attendees. These streams are captured by the microphones in the conferencing room to be transmitted to a remote location.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis and Saifee to incorporate the capture of audio streams by the devices in the conferencing room, such that if multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. The modification provides an AV conferencing server that is able to transmit all audio streams to all other conference attendees when more than one person in the room is talking at the same time.

Regarding claim 4, Davis and Saifee do not teach the method of claim 1, wherein the first audio process is different than the second audio process.  
Johnston teaches wherein the first audio process is different than the second audio process (see fig. 1-2, ¶ 0023, 0038-0039, 0049. The audio streams are captured by the devices in the conferencing room. If multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), then more than one audio stream may be transmitted to all other conference attendees. These streams are processed and transmitted by the conferencing server. The conferencing server 101 processes both audio and video streams that are transmitted from one location to another location. This is done concurrently or simultaneously, so that the remote attendees hear all participants talking at the same time. The multiple participant streams provide different speaker profiles to be displayed visually, indicating who is speaking at the same time as others in a room. The captured audio streams are different voice streams from the plurality of participants in the room).
The combination of Johnston with Davis and Saifee provides an AV conferencing server that is able to transmit all audio streams: when more than one person in the room is talking at the same time, more than one audio stream may be transmitted to all other conference attendees. These streams are captured by the microphones in the conferencing room to be transmitted to a remote location.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis and Saifee to incorporate the capture of audio streams by the devices in the conferencing room, such that if multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. The modification provides an AV conferencing server that is able to transmit all audio streams to all other conference attendees when more than one person in the room is talking at the same time.

Regarding claim 9, Davis and Saifee do not teach the system of claim 8, wherein the first audio stream and the second audio stream are received simultaneously by the video conference provider.  
Johnston teaches wherein the first audio stream and the second audio stream are received simultaneously by the video conference provider (see fig. 1, ¶ 0038-0039, 0049. The audio streams are captured by the devices in the conferencing room. If multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), then more than one audio stream may be transmitted to all other conference attendees. These streams are processed and transmitted by the conferencing server. The conferencing server 101 processes both audio and video streams that are transmitted from one location to another location. This is done concurrently or simultaneously, so that the remote attendees hear all participants talking at the same time).
The combination of Johnston with Davis and Saifee provides an AV conferencing server that is able to transmit all audio streams: when more than one person in the room is talking at the same time, more than one audio stream may be transmitted to all other conference attendees. These streams are captured by the microphones in the conferencing room to be transmitted to a remote location.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis and Saifee to incorporate the capture of audio streams by the devices in the conferencing room, such that if multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. The modification provides an AV conferencing server that is able to transmit all audio streams to all other conference attendees when more than one person in the room is talking at the same time.

Regarding claim 10, Davis and Saifee do not teach the system of claim 9, wherein the first audio process is different than the second audio process.  
Johnston teaches wherein the first audio process is different than the second audio process (see fig. 1-2, ¶ 0023, 0038-0039, 0049. The audio streams are captured by the devices in the conferencing room. If multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), then more than one audio stream may be transmitted to all other conference attendees. These streams are processed and transmitted by the conferencing server. The conferencing server 101 processes both audio and video streams that are transmitted from one location to another location. This is done concurrently or simultaneously, so that the remote attendees hear all participants talking at the same time. The multiple participant streams provide different speaker profiles to be displayed visually, indicating who is speaking at the same time as others in a room. The captured audio streams are different voice streams from the plurality of participants in the room).
The combination of Johnston with Davis and Saifee provides an AV conferencing server that is able to transmit all audio streams: when more than one person in the room is talking at the same time, more than one audio stream may be transmitted to all other conference attendees. These streams are captured by the microphones in the conferencing room to be transmitted to a remote location.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis and Saifee to incorporate the capture of audio streams by the devices in the conferencing room, such that if multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. The modification provides an AV conferencing server that is able to transmit all audio streams to all other conference attendees when more than one person in the room is talking at the same time.

Regarding claim 13, Davis and Saifee do not teach the system of claim 8, wherein the processor is further configured to execute processor-executable instructions stored in the non-transitory computer-readable medium to: identify, by the video conference provider, a first audio-capturing device associated with the first audio stream; identify, by the video conference provider, a second audio-capturing device associated with the second audio stream; and transmit, by the video conference provider, a prompt to select a first audio profile associated with the first audio-capturing device and a second audio profile associated with the second audio-capturing device.  
	Johnston teaches wherein the processor is further configured to execute processor-executable instructions stored in the non-transitory computer-readable medium to: identify, by the video conference provider, a first audio-capturing device associated with the first audio stream; identify, by the video conference provider, a second audio-capturing device associated with the second audio stream; and transmit, by the video conference provider, a prompt to select a first audio profile associated with the first audio-capturing device and a second audio profile associated with the second audio-capturing device (see fig. 1-2, ¶ 0018, 0021-0024. The collected audio data is used to determine each audio stream as well as the participants who are speaking from the same room).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis and Saifee to incorporate identifying speakers from the voices captured by the microphones to be displayed. The modification provides for determining who is speaking in the conferencing session.  

Regarding claim 16, Davis and Saifee do not teach the non-transitory computer-readable medium of claim 15, wherein the processor is further configured to execute processor-executable instructions stored in the non-transitory computer-readable medium to: identify, by the video conference provider, a first audio-capturing device associated with the first audio stream; identify, by the video conference provider, a second audio-capturing device associated with the second audio stream; and determine, by the video conference provider, a first audio profile for processing the first audio stream via the first audio process; and determine, by the video conference provider, a second audio profile for processing the second audio stream via the second audio process.  
	Johnston teaches wherein the processor is further configured to execute processor-executable instructions stored in the non-transitory computer-readable medium to: identify, by the video conference provider, a first audio-capturing device associated with the first audio stream; identify, by the video conference provider, a second audio-capturing device associated with the second audio stream; and determine, by the video conference provider, a first audio profile for processing the first audio stream via the first audio process; and determine, by the video conference provider, a second audio profile for processing the second audio stream via the second audio process (see fig. 1-2, ¶ 0018, 0021-0024. The collected audio data is used to determine each audio stream as well as the participants that are speaking from the same room.). 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis and Saifee to incorporate identifying speakers from the voices captured by the microphones to be displayed. The modification provides for determining who is speaking in the conferencing session.  

Regarding claim 17, Davis and Saifee do not teach the non-transitory computer-readable medium of claim 16, wherein: the instructions to determine, by the video conference provider, the first audio profile for processing the first audio stream via the first audio process cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to receive, from the first client device, an indication of the first audio profile for processing the first audio stream via the first audio process; the instructions to determine, by the video conference provider, the second audio profile for processing the second audio stream via the second audio process cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to receive, from the first client device, an indication of the second audio profile for processing the second audio stream via the second audio process; and the first audio profile is different than the second audio profile.  
Johnston teaches the instructions to determine, by the video conference provider, the first audio profile for processing the first audio stream via the first audio process cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to receive, from the first client device, an indication of the first audio profile for processing the first audio stream via the first audio process; the instructions to determine, by the video conference provider, the second audio profile for processing the second audio stream via the second audio process cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to receive, from the first client device, an indication of the second audio profile for processing the second audio stream via the second audio process; and the first audio profile is different than the second audio profile (see fig. 1-2, ¶ 0018, 0021-0023, 0038-0039, 0049. The audio streams are captured by the devices in the conferencing room. When multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. These streams are processed and transmitted by the conferencing server. The conferencing server 101 processes both audio and video streams that are transmitted from one location to another location. This is done concurrently or simultaneously, so that the remote attendees hear all participants talking at the same time. The multiple participant streams provide different speaker profiles to be displayed visually, which shows who is speaking at the same time as others in a room. The captured audio streams are different voice streams from the plurality of participants in the room. 
The collected audio data is used to determine each audio stream as well as the participants that are speaking from the same room.).
Combining Johnston with Davis and Saifee provides an AV conferencing server that is able to transmit all audio streams: when multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. These streams are captured by the microphones in the conferencing room and transmitted to a remote location.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis and Saifee to incorporate the audio streams that are captured by the devices in the conferencing room. When multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. The modification provides an AV conferencing server that is able to transmit all of these audio streams to all other conference attendees.

Regarding claim 19, Davis and Saifee do not teach the non-transitory computer-readable medium of claim 15, wherein the first processed stream and the second processed stream are transmitted by the video conference provider to the second client device simultaneously.  
Johnston teaches wherein the first processed stream and the second processed stream are transmitted by the video conference provider to the second client device simultaneously (see fig. 1, ¶ 0038-0039, 0049. The audio streams are captured by the devices in the conferencing room. When multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. These streams are processed and transmitted by the conferencing server. The conferencing server 101 processes both audio and video streams that are transmitted from one location to another location. This is done concurrently or simultaneously, so that the remote attendees hear all participants talking at the same time).  
Combining Johnston with Davis and Saifee provides an AV conferencing server that is able to transmit all audio streams: when multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. These streams are captured by the microphones in the conferencing room and transmitted to a remote location.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis and Saifee to incorporate the audio streams that are captured by the devices in the conferencing room. When multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. The modification provides an AV conferencing server that is able to transmit all of these audio streams to all other conference attendees.

Regarding claim 20, Davis and Saifee do not teach the non-transitory computer-readable medium of claim 19, wherein the first audio stream and the second audio stream are received from the first client device by the video conference provider simultaneously.  
Johnston teaches wherein the first audio stream and the second audio stream are received from the first client device by the video conference provider simultaneously (see fig. 1, ¶ 0038-0039, 0049. The audio streams are captured by the devices in the conferencing room. When multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. These streams are processed and transmitted by the conferencing server. The conferencing server 101 processes both audio and video streams that are transmitted from one location to another location. This is done concurrently or simultaneously, so that the remote attendees hear all participants talking at the same time).  
Combining Johnston with Davis and Saifee provides an AV conferencing server that is able to transmit all audio streams: when multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. These streams are captured by the microphones in the conferencing room and transmitted to a remote location.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis and Saifee to incorporate the audio streams that are captured by the devices in the conferencing room. When multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. The modification provides an AV conferencing server that is able to transmit all of these audio streams to all other conference attendees.

4.	Claim(s) 3, 14 are rejected under 35 U.S.C. 103 as being unpatentable over Davis (US 10,110,994) in view of Johnston et al. (US 2016/0142462) in further view of Saifee et al. (US 2024/0194189) in further view of Fortuna et al. (US 11,579,839).
	Regarding claim 3, Davis, Johnston and Saifee do not teach the method of claim 1, the method further comprising: receiving, from the first client device, an indication of a first audio profile for processing the first audio stream via the first audio process; and receiving, from the first client device, an indication of a second audio profile for processing the second audio stream via the second audio process.  
Fortuna teaches receiving, from the first client device, an indication of a first audio profile for processing the first audio stream via the first audio process; and receiving, from the first client device, an indication of a second audio profile for processing the second audio stream via the second audio process (see col. 2, lines 44-67, col. 4, lines 23-40, col. 5, lines 31-58. The audio streams are captured by the device microphones. Multiple microphones pick up different audio streams (representing different sound sources), and modification can be provided based on an audio profile for each audio stream.).  
Combining Fortuna with Davis, Johnston and Saifee provides different streams and applies an audio profile to each stream.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis, Johnston and Saifee to incorporate the audio streams that are captured by the devices, in which multiple microphones pick up different audio streams and an audio profile is used to modify or classify the signal. The modification provides use of an audio profile for defining an audio stream modification.  

Regarding claim 14, Davis, Johnston and Saifee do not teach the system of claim 13, wherein the processor is further configured to execute processor-executable instructions stored in the non-transitory computer-readable medium to: receive, from the first client device, an indication of a first audio profile for processing the first audio stream via the first audio process; and receive, from the first client device, an indication of a second audio profile for processing the second audio stream via the second audio process.  
Fortuna teaches wherein the processor is further configured to execute processor-executable instructions stored in the non-transitory computer-readable medium to: receive, from the first client device, an indication of a first audio profile for processing the first audio stream via the first audio process; and receive, from the first client device, an indication of a second audio profile for processing the second audio stream via the second audio process (see col. 2, lines 44-67, col. 4, lines 23-40, col. 5, lines 31-58. The audio streams are captured by the device microphones. Multiple microphones pick up different audio streams (representing different sound sources), and modification can be provided based on an audio profile for each audio stream.).  
Combining Fortuna with Davis, Johnston and Saifee provides different streams and applies an audio profile to each stream.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis, Johnston and Saifee to incorporate the audio streams that are captured by the devices, in which multiple microphones pick up different audio streams and an audio profile is used to modify or classify the signal. The modification provides use of an audio profile for defining an audio stream modification.  

5.	Claim(s) 5, 6, 7, 11 are rejected under 35 U.S.C. 103 as being unpatentable over Davis (US 10,110,994) in view of Johnston et al. (US 2016/0142462) in further view of Saifee et al. (US 2024/0194189) further in view of Brennan (US 2009/0038468).
	 Regarding claim 5, Davis, Johnston and Saifee do not teach the method of claim 1, wherein the first audio stream comprises a first type of audio signal and the second audio stream comprises a second type of audio signal.  
	Brennan teaches wherein the first audio stream comprises a first type of audio signal and the second audio stream comprises a second type of audio signal (see fig. 1, 3, ¶ 0049-0050, 0082. The audio data, comprising an input voice instrument and a source music input, is received by the system. The system receives two separate tracks from the user (this would be processed from a single device of the user).). 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis, Johnston and Saifee to incorporate different audio types being provided in the different streams. The modification provides two separate audio tracks to the server for output by a network system. This provides for a virtual concert over a virtual network for collaboration. 

Regarding claim 6, Davis, Johnston and Saifee do not teach the method of claim 5, wherein the first type of audio signal comprises speech content and the second type of audio signals comprise musical content.  
	Brennan teaches wherein the first type of audio signal comprises speech content and the second type of audio signals comprise musical content (see fig. 1, 3, ¶ 0049-0050, 0082. The audio data, comprising an input voice instrument and a source music input, is received by the system. The system receives two separate tracks from the user (this would be processed from a single device of the user).). 
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis, Johnston and Saifee to incorporate different audio types being provided in the different streams. The modification provides two separate audio tracks to the server for output by a network system. This provides for a virtual concert over a virtual network for collaboration. 

Regarding claim 7, Davis, Johnston and Saifee do not teach the method of claim 5, wherein processing, by the video conference provider, the first audio stream via the first audio process to generate the first processed stream comprises: processing, by the video conference provider, the first audio stream via a denoise process to generate the first processed stream, wherein the first processed stream comprises a denoised version of the first audio stream.  
Brennan teaches wherein processing, by the video conference provider, the first audio stream via the first audio process to generate the first processed stream comprises: processing, by the video conference provider, the first audio stream via a denoise process to generate the first processed stream, wherein the first processed stream comprises a denoised version of the first audio stream (see fig. 1, 3, ¶ 0045, 0049-0050, 0055-0056, 0082. The audio data, comprising an input voice instrument and a source music input, is received by the system. The system receives two separate tracks from the user (this would be processed from a single device of the user). The system then sends the received dual audio data to the network for output. The audio data is then presented to the audience in the virtual environment. This is processed in the network environment for remote attendees. This would be processed simultaneously in order to provide audio streams in sync. The processing of the first and second audio data differs with regard to suppression (denoise). The first audio is processed normally and the second audio is processed with suppression in order to suppress selected content from the music source.).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis, Johnston and Saifee to incorporate suppression of the audio signals in the virtual session, yielding the original first audio and a cleaned-up version of the second audio source. The modification processes the first and second audio data with different audio suppression (denoise): the first audio is processed normally and the second audio is processed with suppression in order to suppress selected content from the music source. This provides for a virtual concert over a virtual network for collaboration. 

Regarding claim 11, Davis, Johnston and Saifee do not teach the system of claim 10, wherein: the processor-executable instructions to process, by the video conference provider, the first audio stream via the first audio process to generate the first processed stream further cause the processor to execute processor-executable instructions stored in the non-transitory computer-readable medium to process the first audio stream via a denoise process; and the processor-executable instructions to process, by the video conference provider, the second audio stream via the second audio process to generate the second processed stream further cause the processor to execute processor-executable instructions stored in the non-transitory computer-readable medium to process the second audio stream to maintain original audio properties.  
Brennan teaches wherein: the processor-executable instructions to process, by the video conference provider, the first audio stream via the first audio process to generate the first processed stream further cause the processor to execute processor-executable instructions stored in the non-transitory computer-readable medium to process the first audio stream via a denoise process; and the processor-executable instructions to process, by the video conference provider, the second audio stream via the second audio process to generate the second processed stream further cause the processor to execute processor-executable instructions stored in the non-transitory computer-readable medium to process the second audio stream to maintain original audio properties (see fig. 1, 3, ¶ 0045, 0049-0050, 0055-0056, 0082. The audio data, comprising an input voice instrument and a source music input, is received by the system. The system receives two separate tracks from the user (this would be processed from a single device of the user). The system then sends the received dual audio data to the network for output. The audio data is then presented to the audience in the virtual environment. This is processed in the network environment for remote attendees. This would be processed simultaneously in order to provide audio streams in sync. The processing of the first and second audio data differs with regard to suppression (denoise). The first audio is processed normally and the second audio is processed with suppression in order to suppress selected content from the music source.).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis, Johnston and Saifee to incorporate suppression of the audio signals in the virtual session, yielding the original first audio and a cleaned-up version of the second audio source. The modification processes the first and second audio data with different audio suppression (denoise): the first audio is processed normally and the second audio is processed with suppression in order to suppress selected content from the music source. This provides for a virtual concert over a virtual network for collaboration. 

6.	Claim(s) 12 is rejected under 35 U.S.C. 103 as being unpatentable over Davis (US 10,110,994) in view of Johnston et al. (US 2016/0142462) in further view of Saifee et al. (US 2024/0194189) further in view of Brennan (US 2009/0038468) in further view of Zhou (US 2020/0221223).
	Regarding claim 12, Davis, Johnston and Saifee do not teach the system of claim 10, wherein: the processor-executable instructions to process, by the video conference provider, the first audio stream via the first audio process to generate the first processed stream cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to process the first audio stream via a denoise process; and the processor-executable instructions to process, by the video conference provider, the second audio stream via the second audio process to generate the second processed stream cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to process the second audio via a high-fidelity quality process.
Brennan teaches wherein: the processor-executable instructions to process, by the video conference provider, the first audio stream via the first audio process to generate the first processed stream cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to process the first audio stream via a denoise process (see fig. 1, 3, ¶ 0045, 0049-0050, 0055-0056, 0082. The audio data, comprising an input voice instrument and a source music input, is received by the system. The system receives two separate tracks from the user (this would be processed from a single device of the user). The system then sends the received dual audio data to the network for output. The audio data is then presented to the audience in the virtual environment. This is processed in the network environment for remote attendees. This would be processed simultaneously in order to provide audio streams in sync. The processing of the first and second audio data differs with regard to suppression (denoise). The first audio is processed normally and the second audio is processed with suppression in order to suppress selected content from the music source.).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis, Johnston and Saifee to incorporate suppression of the audio signals in the virtual session, yielding the original first audio and a cleaned-up version of the second audio source. The modification processes the first and second audio data with different audio suppression (denoise): the first audio is processed normally and the second audio is processed with suppression in order to suppress selected content from the music source. This provides for a virtual concert over a virtual network for collaboration. 
	Zhou teaches the processor-executable instructions to process, by the video conference provider, the second audio stream via the second audio process to generate the second processed stream cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to process the second audio via a high-fidelity quality process (see ¶ 0079. The different audio signal can be enhanced or suppressed. Therefore different audio signals can be subject to suppression or enhancement based on the algorithm for spatial differentiation of sound sources.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis, Johnston, Saifee and Brennan to incorporate separate audio signals that are processed differently and can be enhanced or suppressed. The modification provides for suppression or enhancement of an audio signal.  

7.	Claim(s) 18 is rejected under 35 U.S.C. 103 as being unpatentable over Davis (US 10,110,994) in view of Johnston et al. (US 2016/0142462) in further view of Saifee et al. (US 2024/0194189) in further view of Rangarajan et al. (US 2017/0054987).
	Regarding claim 18, Davis and Saifee do not teach the non-transitory computer-readable medium of claim 15, wherein: the instructions to transmit, by the video conference provider, the first processed stream to the second client cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to transmit, by the video conference provider, the first processed stream via a first channel to the second client device; the instructions to transmit, by the video conference provider, the second processed stream to the second client cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to transmit, by the video conference provider, the second processed stream via a second channel to the second client device; and the first channel comprises a different bandwidth than the second channel.  
Johnston teaches wherein: the instructions to transmit, by the video conference provider, the first processed stream to the second client cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to transmit, by the video conference provider, the first processed stream via a first channel to the second client device; the instructions to transmit, by the video conference provider, the second processed stream to the second client cause the processor to execute further processor-executable instructions stored in the non-transitory computer-readable medium to transmit, by the video conference provider, the second processed stream via a second channel to the second client device; and the first channel comprises a different bandwidth than the second channel (see fig. 1, ¶ 0038-0039, 0049. The audio streams are captured by the devices in the conferencing room. When multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. These streams are processed and transmitted by the conferencing server. The conferencing server 101 processes both audio and video streams that are transmitted from one location to another location. This is done concurrently or simultaneously, so that the remote attendees hear all participants talking at the same time).  
Combining Johnston with Davis and Saifee provides an AV conferencing server that is able to transmit all audio streams: when multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. These streams are captured by the microphones in the conferencing room and transmitted to a remote location.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis and Saifee to incorporate the audio streams that are captured by the devices in the conferencing room. When multiple microphones pick up different audio streams (representing the fact that more than one person in the room is talking at the same time), more than one audio stream may be transmitted to all other conference attendees. The modification provides an AV conferencing server that is able to transmit all of these audio streams to all other conference attendees.
Rangarajan teaches the first channel comprises a different bandwidth than the second channel (see ¶ 0101. A client device receives two different channels. These channels would have different bandwidth allocations for transmission. The first channel would have a higher network bandwidth than the second channel.). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Davis, Johnston and Saifee to incorporate each channel having different network bandwidth allocation. The modification provides bandwidth allocation for each channel for transmission. 

Conclusion
8.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to ASSAD MOHAMMED whose telephone number is (571)270-7253. The examiner can normally be reached 9:00AM-5:00PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen can be reached at 571-272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ASSAD MOHAMMED/Examiner, Art Unit 2691   

/DUC NGUYEN/Supervisory Patent Examiner, Art Unit 2691
Prosecution Timeline

Feb 25, 2025
Response Filed
May 27, 2025
Final Rejection mailed — §103
Jul 28, 2025
Response after Non-Final Action
Aug 08, 2025
Request for Continued Examination
Aug 11, 2025
Response after Non-Final Action
Sep 04, 2025
Non-Final Rejection mailed — §103
Dec 04, 2025
Response Filed
Apr 01, 2026
Non-Final Rejection mailed — §103 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/483,358
Patent 12632613
SEGMENTED BLURRING IN A CONFERENCE ROOM
2y 7m to grant Granted May 19, 2026
18/194,485
Patent 12621413
SYSTEMS AND METHODS FOR UPDATING A SECURITY USER INTERFACE BASED ON SCHEDULED MODES
3y 1m to grant Granted May 05, 2026
18/105,074
Patent 12604149
ELECTRONIC DEVICE AND METHOD THEREOF FOR OUTPUTTING AUDIO DATA
3y 2m to grant Granted Apr 14, 2026
18/340,183
Patent 12598441
AUDIO SIGNAL PROCESSING METHOD AND AUDIO SIGNAL PROCESSING APPARATUS
2y 9m to grant Granted Apr 07, 2026
18/585,594
Patent 12587801
RE-MIXING A COMPOSITE AUDIO PROGRAM FOR PLAYBACK WITHIN A REAL-WORLD VENUE
2y 1m to grant Granted Mar 24, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

4-5
Expected OA Rounds
73%
Grant Probability
84%
With Interview (+11.2%)
3y 1m (~0m remaining)
Median Time to Grant
High
PTA Risk
Based on 592 resolved cases by this examiner. Grant probability derived from career allowance rate.