Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Claim Rejections - 35 USC § 103
1. In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
2. Claims 1-3, 5, 8-10, 12, 15, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over submitted prior art Chotai et al. (EP 2022249) in view of Centelles Martin et al. (2025/0023934).
As to claim 1, Chotai teaches a method comprising:
receiving, at a push-to-talk server (Fig. 1, PTT server 12) while audio is being played back at a push-to-talk client device, one or more audio messages ([0017] – PTT server 12 that manages or controls a logical floor 13 wherein one of a plurality of participants 16 is permitted to speak at a time via corresponding endpoint devices 16; when a participant wishes to speak during the session, he transmits, via his corresponding endpoint device 16, a talk burst request message to the server 12);
storing the one or more audio messages in a buffer at the push-to-talk server in an order determined based at least in part on an initiation time of the one or more audio messages ([0026] – the buffered talk burst floor control template, which basically allows an open floor; that is, anyone is free to speak at any time. All requests are time-stamped, buffered, and played out in the order that they were generated);
determining an importance for at least a portion of the one or more audio messages in the buffer ([0019] – “Barge-In” floor control algorithm; the idea is to grant a specially designated person(s) permission to barge-in and capture the floor from someone else anytime they want or need to speak to the group. Typically, 911 operators, command center operators/dispatchers, and the like, are persons that might appropriately be conferred with barge-in privileges. Once the barge-in floor control template is applied – either as a general policy or as an overlapping policy on top of another floor control algorithm – any participant that has been granted barge-in privileges is free to take over the floor from anyone who already has the floor and who may be in the middle of speaking);
reordering the one or more audio messages based on the importance ([0018]; [0019] – “Barge-In” floor control algorithm; the idea is to grant a specially designated person(s) permission to barge-in and capture the floor from someone else anytime they want or need to speak to the group. Typically, 911 operators, command center operators/dispatchers, and the like, are persons that might appropriately be conferred with barge-in privileges. Once the barge-in floor control template is applied – either as a general policy or as an overlapping policy on top of another floor control algorithm – any participant that has been granted barge-in privileges is free to take over the floor from anyone who already has the floor and who may be in the middle of speaking; hence the audio messages are effectively reordered based on the barge-in control algorithm, with 911 operators, dispatchers, and others holding barge-in privileges); and
transmitting the one or more audio messages for playback at the push-to-talk client device based on the order of the one or more audio messages ([0019], [0026] – all requests are time-stamped, buffered and played out in the order that they were generated).
Chotai does not explicitly discuss using an artificial intelligence engine and based on nonverbal features.
Centelles Martin teaches that the processing device 86 receives data from the monitoring module 17, wherein the communication device 12 is configured with push-to-talk (PTT) technology and the user may engage the voice command button 26 to communicate audio to other communication devices 12 of the plurality of communication devices 12 ([0023]); keyword information for each user of a channel 16a-16c and levels of usage or non-usage at a user-specific level are communicated to an artificial intelligence (AI) engine 90 on the server 62, and the AI engine 90 may process the information captured by the monitoring module 17 in one or more of the machine learning models 84 trained to adjust the turn-based communications based on the information received ([0053]); the server includes an artificial intelligence engine configured to generate at least one machine learning model trained to determine the modification ([0083]); an audio processing unit is configured to recognize the voice by encoding phonetic information and nonverbal vocalizations (e.g., laughter, cries, screams, grunts) ([0041], [0051]); and the monitoring module 17 detects attributes that indicate a level of importance, urgency, or value associated with the communication, and the processing routines may apply one or more learning algorithms (e.g., a machine learning algorithm, neural network, etc.) to determine a modification for the at least one channel 16a-16c. The modification may be any modification to the turn-based communications and/or the at least one channel 16a-16c to prioritize important communications over less important communications. For example, the modification may be an adjustment to membership of the at least one channel 16a-16c, an adjustment to allotted talking time for participants on the at least one channel 16a-16c, and/or an adjustment to the priority of one of the at least one channel 16a-16c over another of the at least one channel 16a-16c.
In this way, the control circuitry 18 may determine the modification based on the at least one communication attribute and adjust a messaging feature or delivery priority of the at least one channel 16a-16c in response to the modification. The messaging feature or delivery priority includes at least one of a timer for push-to-talk (PTT) messaging, a membership of the user identity 14a-14d to the at least one channel 16a-16c, and a priority level for the at least one channel 16a-16c ([0018-0019]).
It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Centelles Martin into the teachings of Chotai for the purpose of applying one or more learning algorithms, such as a neural network, to determine a modification that prioritizes important communications over less important communications.
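Purely as an illustration of the combined teachings discussed above (this sketch is not part of the record; the class and method names are invented for the example), the receive/buffer/reorder/transmit sequence recited in claim 1 can be modeled as a buffer keyed on importance, falling back to initiation time:

```python
import heapq
import itertools

class PttBuffer:
    """Sketch of a server-side message buffer: messages are queued with their
    initiation times and reordered so that higher-importance messages are
    transmitted for playback first."""

    def __init__(self):
        self._heap = []                # entries: (-importance, initiation_time, seq, msg)
        self._seq = itertools.count()  # tie-breaker so equal keys stay first-in, first-out

    def add(self, msg, importance, initiation_time):
        # Negate importance so the min-heap pops the most important
        # message first; ties fall back to initiation time.
        heapq.heappush(self._heap, (-importance, initiation_time, next(self._seq), msg))

    def next_for_playback(self):
        """Return the message that should be transmitted next, or None."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[3]

buf = PttBuffer()
buf.add("routine status report", importance=1, initiation_time=100.0)
buf.add("dispatcher: clear the channel", importance=9, initiation_time=101.0)
buf.add("second status report", importance=1, initiation_time=102.0)
print(buf.next_for_playback())  # the later but more important dispatcher message plays first
```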
As to claims 2 and 9, Centelles Martin teaches the method of claim 1 and the non-transitory computer readable medium of claim 8, wherein the nonverbal features comprise at least one of a voice tonality, volume, or a pitch ([0041], [0051] – audio processing unit configured to recognize the voice by encoding phonetic information and nonverbal vocalizations (e.g., laughter, cries, screams, grunts)).
As to claims 3, 10, and 17, Centelles Martin teaches the method of claim 1, the non-transitory computer readable medium of claim 8, and the system of claim 15, wherein the importance is determined based on verbal features comprising natural language words ([0018] – the monitoring module 17 may monitor communications and detect attributes that indicate a level of importance, urgency, or value of the communication based on other attributes that may be associated with messages communicated among the communication devices 12).
As to claims 5, 12, and 19, Chotai teaches the method of claim 1, the non-transitory computer readable medium of claim 8, and the system of claim 15, further comprising: stopping playback of the audio at the push-to-talk client device before completion of the audio; and immediately starting playback of an audio message from the buffer, based on the importance of the audio message being in a highly important range ([0018] – in a “Priority-based” floor control, different subscribers or participants are assigned different weights or priority values. Participants having higher weights assigned to them, i.e., a higher priority, therefore have a better chance of capturing the floor in an arbitration contest with another participant having a lower priority weight. For example, in an emergency response or natural disaster situation, a Police Chief or Fire Chief is granted the highest priority such that he will gain access to the floor every time he wants to communicate instructions to his subordinates, hence stopping playback of audio before completion of the other audio; [0019] – grant a specially designated person(s) permission to barge-in and capture the floor from someone anytime they want or need to speak to the group; 911 operators, command center operators/dispatchers, and the like, are persons that might appropriately be conferred with barge-in privileges. Once the barge-in floor control template is applied, any participant that has been granted barge-in privileges is free to take over the floor from anyone who already has the floor and who may be in the middle of speaking).
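As an illustrative sketch only (not part of the record; the Player class and the numeric threshold are assumptions invented for this example), the barge-in behavior addressed for claims 5, 12, and 19 — stopping current playback and immediately starting a highly important message — might look like:

```python
HIGH_IMPORTANCE_THRESHOLD = 8  # assumed lower bound of the "highly important" range

class Player:
    """Minimal stand-in for a client-side audio player."""
    def __init__(self):
        self.now_playing = None
        self.log = []

    def stop(self):
        if self.now_playing is not None:
            self.log.append(("stopped", self.now_playing))
            self.now_playing = None

    def play(self, msg):
        self.now_playing = msg
        self.log.append(("playing", msg))

def handle_incoming(msg, importance, player, buffer):
    """Barge-in: a highly important message stops the audio currently
    playing and starts immediately; others are queued for ordered playback."""
    if importance >= HIGH_IMPORTANCE_THRESHOLD:
        player.stop()       # stop playback before completion
        player.play(msg)    # immediate playback of the barge-in message
    else:
        buffer.append((importance, msg))

player, queued = Player(), []
player.play("routine traffic")
handle_incoming("fire chief: evacuate now", importance=9, player=player, buffer=queued)
print(player.now_playing)  # the barge-in message has preempted the routine traffic
```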
Claims 8 and 15 are rejected for the same reasons discussed above with respect to claim 1. Furthermore, Chotai teaches a non-transitory computer readable medium storing instructions operable to cause one or more processors to perform operations, a memory subsystem storing instructions, and processing circuitry to execute the instructions ([0038]). Centelles Martin teaches that control circuitry 18 may be communicatively coupled with the monitoring module 17 and configured to execute one or more processing routines to evaluate the attributes of the communications and associate the communications with a corresponding value, importance, or priority ([0019]), as well as memory 50 and processor 48.
3. Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over submitted prior art Chotai and Centelles Martin in view of Jabara et al. (20213/0231088).
As to claim 4, Chotai and Centelles Martin do not explicitly discuss the method of claim 1, further comprising: removing at least one audio message from the buffer based on the importance score being in an unimportant range.
Jabara teaches when the threshold is exceeded, the controller 182 begins to delete the oldest messages first. In another alternative embodiment, messages may be deleted on the basis of message type. For example, business messages may have a lower priority and be deleted first. In contrast, emergency messages may not be deleted until a specific instruction is received to delete the emergency message or until a Message Read Receipt is received ([0134]).
It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Jabara into the teachings of Chotai and Centelles Martin for the purpose of, when the data storage has a certain capacity, reserving at least a portion of that capacity for emergency messages, status messages, and the like.
As to claim 11, Jabara teaches the non-transitory computer readable medium of claim 8, the operations further comprising removing messages: when the threshold is exceeded, the controller 182 begins to delete the oldest messages first. In another alternative embodiment, messages may be deleted on the basis of message type. For example, business messages may have a lower priority and be deleted first. In contrast, emergency messages may not be deleted until a specific instruction is received to delete the emergency message or until a Message Read Receipt is received ([0134]).
As to claim 18, Jabara teaches the system of claim 15, the processing circuitry further configured to execute the instructions to: remove at least one audio message from the buffer based on the importance score of the at least one audio message ([0134] - when the threshold is exceeded, the controller 182 begins to delete the oldest messages first. In another alternative embodiment, messages may be deleted on the basis of message type. For example, business messages may have a lower priority and be deleted first. In contrast, emergency messages may not be deleted until a specific instruction is received to delete the emergency message or until a Message Read Receipt is received).
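For illustration only (not part of the record; the threshold constant and the message layout are assumptions for this sketch), Jabara's deletion policy — lowest-priority and oldest messages removed first, with emergency messages retained — might be sketched as:

```python
EMERGENCY_THRESHOLD = 8  # assumed importance at or above which messages are never pruned

def prune(buffer, capacity):
    """When the buffer exceeds capacity, remove low-importance messages first
    and, among equally unimportant messages, the oldest first.
    Emergency-range messages are never removed by this routine."""
    # Candidates for deletion, ordered lowest importance first, then oldest first.
    removable = sorted(
        (m for m in buffer if m["importance"] < EMERGENCY_THRESHOLD),
        key=lambda m: (m["importance"], m["timestamp"]),
    )
    while len(buffer) > capacity and removable:
        buffer.remove(removable.pop(0))
    return buffer

msgs = [
    {"timestamp": 1, "importance": 2},  # oldest business message: pruned first
    {"timestamp": 2, "importance": 9},  # emergency: retained
    {"timestamp": 3, "importance": 2},
    {"timestamp": 4, "importance": 5},
]
print(prune(msgs, capacity=2))
```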
4. Claims 6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over submitted prior art Chotai and Centelles Martin in view of Du et al. (2024/0265605).
As to claim 6, Chotai and Centelles Martin do not explicitly discuss the method of claim 1, wherein the artificial intelligence engine comprises a convolutional neural network for processing the nonverbal features and a transformer based engine for processing verbal features.
Du teaches that Encoder 312 is configured to categorize verbal communication information 302 and non-verbal communication information 304. Encoder 312 may be a neural network (e.g., deep-learning, a two-dimensional (2D) convolutional neural network (CNN), LSTM, Transformer, etc.) trained (e.g., pretrained) to generate the embeddings, including being trained (e.g., pretrained) to identify the verbal and non-verbal communication and categorize the identified verbal communication information 302 and non-verbal communication information 304 ([0065]); and Transformer model 316 may work with any combination of verbal communication information 302 and non-verbal communication information 304 ([0073]).
It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Du into the teachings of Chotai and Centelles Martin for the purpose of training the neural network (of the encoder 312) using multiple sources of information, such that transformer model 316 may make more accurate predictions and understand the relationships between different types of information.
As to claim 13, Chotai and Centelles Martin do not explicitly discuss the non-transitory computer readable medium of claim 8, wherein the artificial intelligence engine comprises a first sub-engine for processing the nonverbal features and a second sub-engine for processing verbal features.
Du teaches that Encoder 312 is configured to categorize verbal communication information 302 and non-verbal communication information 304. Encoder 312 may be a neural network (e.g., deep-learning, a two-dimensional (2D) convolutional neural network (CNN), LSTM, Transformer, etc.) trained (e.g., pretrained) to generate the embeddings, including being trained (e.g., pretrained) to identify the verbal and non-verbal communication and categorize the identified verbal communication information 302 and non-verbal communication information 304 ([0065]); and Transformer model 316 may work with any combination of verbal communication information 302 and non-verbal communication information 304. Encoder 312 may combine multiple inputs into embeddings 313 to form a complete understanding of the input. Multi-head attention 315 weighs the importance of each type of information and makes predictions based on the combined information. By using multiple sources of information, transformer model 316 may make more accurate predictions and understand the relationships between different types of information ([0073]).
It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Du into the teachings of Chotai and Centelles Martin for the purpose of training the neural network (of the encoder 312) using multiple sources of information, such that transformer model 316 may make more accurate predictions and understand the relationships between different types of information.
As to claim 20, Chotai and Centelles Martin do not explicitly discuss the system of claim 15, wherein the artificial intelligence engine comprises a first artificial neural network for processing the nonverbal features and a second artificial neural network for processing verbal features.
Du teaches that Encoder 312 is configured to categorize verbal communication information 302 and non-verbal communication information 304. Encoder 312 may be a neural network (e.g., deep-learning, a two-dimensional (2D) convolutional neural network (CNN), LSTM, Transformer, etc.) trained (e.g., pretrained) to generate the embeddings, including being trained (e.g., pretrained) to identify the verbal and non-verbal communication and categorize the identified verbal communication information 302 and non-verbal communication information 304 ([0065]); and Transformer model 316 may work with any combination of verbal communication information 302 and non-verbal communication information 304. Encoder 312 may combine multiple inputs into embeddings 313 to form a complete understanding of the input. Multi-head attention 315 weighs the importance of each type of information and makes predictions based on the combined information. By using multiple sources of information, transformer model 316 may make more accurate predictions and understand the relationships between different types of information ([0073]).
It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Du into the teachings of Chotai and Centelles Martin for the purpose of training the neural network (of the encoder 312) using multiple sources of information, such that transformer model 316 may make more accurate predictions and understand the relationships between different types of information.
5. Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over submitted prior art Chotai and Centelles Martin in view of Yuksel et al. (2022/0318619).
As to claim 7, Chotai teaches determining an importance for at least a portion of the one or more audio messages in the buffer ([0019] – “Barge-In” floor control algorithm; the idea is to grant a specially designated person(s) permission to barge-in and capture the floor from someone else anytime they want or need to speak to the group. Typically, 911 operators, command center operators/dispatchers, and the like, are persons that might appropriately be conferred with barge-in privileges. Once the barge-in floor control template is applied – either as a general policy or as an overlapping policy on top of another floor control algorithm – any participant that has been granted barge-in privileges is free to take over the floor from anyone who already has the floor and who may be in the middle of speaking). Chotai and Centelles Martin do not explicitly discuss combining outputs of a first portion of the plurality of artificial neural networks by a second portion of the artificial neural networks.
Yuksel teaches types of models may include artificial neural networks ([0019]). Generating the AI-based solution the processing logic may: identify a second machine learning model in a first database within the marketplace platform, wherein the second machine learning model is a first portion of the AI-based solution; identify a third machine learning model in a second database external to the marketplace platform, wherein the third machine learning model is a second portion of the AI-based solution; and generate the AI-based solution by combining the second machine learning model and the third machine learning model ([0078]).
It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Yuksel into the teachings of Chotai and Centelles Martin for the purpose of generating the AI-based solution by combining machine learning models and/or artificial neural networks.
As to claim 14, Chotai teaches determining an importance for at least a portion of the one or more audio messages in the buffer ([0019] – “Barge-In” floor control algorithm; the idea is to grant a specially designated person(s) permission to barge-in and capture the floor from someone else anytime they want or need to speak to the group. Typically, 911 operators, command center operators/dispatchers, and the like, are persons that might appropriately be conferred with barge-in privileges. Once the barge-in floor control template is applied – either as a general policy or as an overlapping policy on top of another floor control algorithm – any participant that has been granted barge-in privileges is free to take over the floor from anyone who already has the floor and who may be in the middle of speaking). Chotai and Centelles Martin do not explicitly discuss combining outputs of a first portion of the plurality of artificial intelligence sub-engines by a second portion of the artificial intelligence sub-engines.
Yuksel teaches types of models may include artificial neural networks ([0019]). Generating the AI-based solution the processing logic may: identify a second machine learning model in a first database within the marketplace platform, wherein the second machine learning model is a first portion of the AI-based solution; identify a third machine learning model in a second database external to the marketplace platform, wherein the third machine learning model is a second portion of the AI-based solution; and generate the AI-based solution by combining the second machine learning model and the third machine learning model ([0078]).
It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Yuksel into the teachings of Chotai and Centelles Martin for the purpose of generating the AI-based solution by combining machine learning models and/or artificial neural networks.
6. Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over submitted prior art Chotai and Centelles Martin in view of Hu et al. (2021/0369163).
As to claim 16, Chotai and Centelles Martin do not explicitly discuss the system of claim 15, wherein the nonverbal features comprise at least one of a rate of speech or a pause duration.
Hu teaches that audio data input 102 can include speech from multiple speakers (e.g., clinicians, patients). The audio data input also includes verbal and non-verbal information… the audio data input 102 can include non-verbal information, such as varying speech rates and energy levels, silences, and pauses ([0024]).
It would have been obvious before the effective filing date of the claimed invention to incorporate the teachings of Hu into the teachings of Chotai and Centelles Martin for the purpose of deriving non-verbal cues from the audio characteristics of the recording (e.g., volume, pitch) and determining whether the speech rate of the speaker is higher than the average speech rate of the speaker during the conversation.
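As one simple way such cues might be computed (an illustrative sketch only, not part of the record; the feature names and the per-word (start, end) input format are assumptions), a rate-of-speech and a pause-duration feature can be derived from word timing alone:

```python
def nonverbal_features(word_times):
    """Derive a rate-of-speech feature (words per second) and a
    pause-duration feature from per-word (start, end) times in seconds."""
    if not word_times:
        return {"rate_wps": 0.0, "max_pause_s": 0.0}
    total = word_times[-1][1] - word_times[0][0]   # span of the utterance
    rate = len(word_times) / total if total > 0 else 0.0
    # Gaps between the end of one word and the start of the next.
    pauses = [b[0] - a[1] for a, b in zip(word_times, word_times[1:])]
    return {"rate_wps": round(rate, 2),
            "max_pause_s": round(max(pauses, default=0.0), 2)}

# Three words over 2.3 s with a long mid-utterance pause:
print(nonverbal_features([(0.0, 0.4), (0.5, 0.9), (2.0, 2.3)]))
```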
Conclusion
7. Any inquiry concerning this communication or earlier communications from the examiner should be directed to QUYNH H NGUYEN whose telephone number is (571)272-7489. The examiner can normally be reached Monday-Thursday 7:30AM-5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ahmad Matar can be reached on 571-272-7488. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/QUYNH H NGUYEN/Primary Examiner, Art Unit 2693