Prosecution Insights
Last updated: April 19, 2026
Application No. 18/772,852

SEMI-DELEGATED CALLING BY AN AUTOMATED ASSISTANT ON BEHALF OF HUMAN PARTICIPANT

Non-Final OA (§102, §103)
Filed: Jul 15, 2024
Examiner: BOGGS JR., JAMES
Art Unit: 2657
Tech Center: 2600 — Communications
Assignee: Google LLC
OA Round: 1 (Non-Final)
Grant Probability: 60% (Moderate)
Expected OA Rounds: 1-2
Time to Grant: 3y 3m
With Interview: 99%

Examiner Intelligence

Career Allow Rate: 60% (64 granted / 107 resolved; -2.2% vs TC avg)
Interview Lift: +38.8% (allowance rate for resolved cases with an interview vs. without)
Avg Prosecution: 3y 3m (typical timeline)
Currently Pending: 28
Total Applications: 135 (across all art units)
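
How the 99% "With Interview" projection relates to these two stats isn't spelled out, but a minimal sketch reproduces it, assuming the lift is additive in percentage points and the display rounds to the nearest whole percent (both are assumptions, not documented behavior of the tool):

```python
# Reproducing the with-interview projection from the examiner stats above.
# Assumptions (not stated by the tool): the lift is additive in percentage
# points, and the displayed figure is rounded to a whole percent.
base_allow_rate = 60.0  # career allow rate, %
interview_lift = 38.8   # interview lift, percentage points

with_interview = base_allow_rate + interview_lift    # 98.8
print(f"With interview: {round(with_interview)}%")   # prints "With interview: 99%"
```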

Statute-Specific Performance

§101: 12.4% (-27.6% vs TC avg)
§103: 48.5% (+8.5% vs TC avg)
§102: 16.2% (-23.8% vs TC avg)
§112: 18.1% (-21.9% vs TC avg)
Tech Center averages are estimates. Based on career data from 107 resolved cases.
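
A quick sanity check on these deltas: assuming each delta is simply the examiner's rate minus the estimated Tech Center average, the implied TC baseline can be backed out per statute. Notably, all four back out to the same 40.0%, which suggests the tool uses a single TC-wide estimate rather than per-statute averages:

```python
# Backing out the implied Tech Center average from each statute delta.
# Assumption: delta = examiner allowance rate - TC average, in percentage points.
rates = {"§101": 12.4, "§103": 48.5, "§102": 16.2, "§112": 18.1}     # examiner, %
deltas = {"§101": -27.6, "§103": 8.5, "§102": -23.8, "§112": -21.9}  # vs TC avg

for statute, rate in rates.items():
    implied_tc = rate - deltas[statute]
    print(f"{statute}: implied TC average {implied_tc:.1f}%")  # 40.0% for all four
```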

Office Action

Rejection bases: §102, §103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification

The disclosure is objected to because of the following informalities:

In paragraph 0043, line 7, “automated assistant 155” should read “automated assistant 115”.

In paragraph 0043, line 8, “automated assistant 155” should read “automated assistant 115”.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-5, 7-11, and 13-17 are rejected under 35 U.S.C. 102(a)(1) and 102(a)(2) as being anticipated by Woolsey et al. (US Patent No. 9,462,112), hereinafter Woolsey.

Regarding claim 1, Woolsey discloses a method implemented by one or more processors (Column 18, lines 31-32, "The illustrated device 110 can include a controller or processor"), the method comprising:

detecting, at a client device, an ongoing call between a given user of the client device and an additional user of an additional client device (Column 1, lines 40-44, "A digital assistant supported on a device such as a smartphone, personal computer, or game console is configured to be engaged as an active participant in communications between local and remote parties by listening to voice and video calls and participating in messaging sessions."; A digital assistant supported on a device listening to voice calls reads on detecting an ongoing call between a given user of the client device and an additional user of an additional client device.);

processing a stream of audio data, that captures at least one spoken utterance during the ongoing call, to generate recognized text, wherein the at least one spoken utterance is of the given user or the additional user (Column 5, lines 60-65, "As shown in FIG. 4, the digital assistant 350 can employ a natural language user interface (UI) 405 that can take voice commands 410 as inputs from the user 105. The voice commands 410 can be used to invoke various actions, features, and functions on a device 110, provide inputs to the systems and applications, and the like."; Column 7, lines 34-36, "As shown, the UI includes a box 810 that is configured for showing a textual representation of a received voice command or other user input."; A natural language user interface that takes voice commands as inputs from the user and shows a textual representation of a received voice command reads on processing a stream of audio data that captures at least one spoken utterance during the ongoing call to generate recognized text, where the at least one spoken utterance is of the given user or the additional user.);

identifying, based on processing the recognized text, that the at least one spoken utterance requests information for a parameter (Column 1, lines 44-50, "The digital assistant typically can be initiated by voice using a key word or phrase and then be requested to perform tasks, provide information and services, etc., using voice commands, natural language requests, or gestures in some cases. The digital assistant can respond to the request and take appropriate actions."; Column 8, lines 56-61, "After the local user initiates the digital assistant with the key phrase in this example, the user requests that the digital assistant send contact information for a restaurant to the remote user. The digital assistant responds at point 2 in the call at block 1210 in FIG. 12 by saying that the contact information will be sent to the remote user as a message."; The digital assistant responding to a request and taking appropriate actions reads on identifying that the at least one spoken utterance requests information for a parameter, and the example of the user requesting that the digital assistant send contact information demonstrates requesting information for a parameter, where the contact information reads on the information.);

determining, for the parameter and using access-restricted data that is personal to the given user, that a value, for the parameter, is resolvable (Column 6, lines 27-55, "FIG. 6 shows an illustrative taxonomy of functions 600 that may typically be supported by the digital assistant 350. Inputs to the digital assistant 350 typically can include user input 605 (in which such user input can include input from either or both the local and remote parties to a given communication), data from internal sources 610, and data from external sources 615. For example, data from internal sources 610 could include the current geolocation of the device 110 that is reported by a GPS (Global Positioning System) component on the device, or some other location-aware component. The externally sourced data 615 includes data provided, for example, by external systems, databases, services, and the like such as the service provider 130 (FIG. 1). The various inputs can be used alone or in various combinations to enable the digital assistant 350 to utilize contextual data 620 when it operates. Contextual data can include, for example, time/date, the user's location, language, schedule, applications installed on the device, the user's preferences, the user's behaviors (in which such behaviors are monitored/tracked with notice to the user and the user's consent), stored contacts (including, in some cases, links to a local user's or remote user's social graph such as those maintained by external social networking services), call history, messaging history, browsing history, device type, device capabilities, communication network type and/or features/functionalities provided therein, mobile data plan restrictions/limitations, data associated with other parties to a communication (e.g., their schedules, preferences, etc.), and the like."; Column 9, lines 12-23, "FIG. 15 depicts a screen capture of a UI 1500 that is displayed on the device of the remote user at point 3 in the call at block 1215 in FIG. 12. Here, the contact information sent by the digital assistant comes in as new message notification 1505 which is displayed at the top of the UI on the remote user's device. In this illustrative example, the notification shows the sender and a snippet of the content that is contained in the message. Typically, the remote user can launch the texting application to see the full content of the message which can include various kinds of contact information such as street address, link to website, phone number, map, etc."; The digital assistant performing functions using contextual data including the user's stored contacts reads on using access-restricted data that is personal to the given user, and the example of the user requesting that the digital assistant send contact information demonstrates determining that a parameter is resolvable using data that is personal to the given user, where providing the contact information reads on determining that a parameter is resolvable.);

and in response to determining that the value is resolvable and without receiving any user input from the given user or the additional user: automatically resolving the value for the parameter (Column 8, lines 56-61, "After the local user initiates the digital assistant with the key phrase in this example, the user requests that the digital assistant send contact information for a restaurant to the remote user. The digital assistant responds at point 2 in the call at block 1210 in FIG. 12 by saying that the contact information will be sent to the remote user as a message."; Column 9, lines 12-23, "FIG. 15 depicts a screen capture of a UI 1500 that is displayed on the device of the remote user at point 3 in the call at block 1215 in FIG. 12. Here, the contact information sent by the digital assistant comes in as new message notification 1505 which is displayed at the top of the UI on the remote user's device. In this illustrative example, the notification shows the sender and a snippet of the content that is contained in the message. Typically, the remote user can launch the texting application to see the full content of the message which can include various kinds of contact information such as street address, link to website, phone number, map, etc."; The digital assistant sending contact information to the remote user reads on automatically resolving the value for the parameter in response to determining that the value is resolvable and without receiving any user input from the given user or the additional user.);

and automatically rendering, during the ongoing call, synthesized speech audio data that is based on the value (Column 8, lines 69-64, "The digital assistant responds at point 2 in the call at block 1210 in FIG. 12 by saying that the contact information will be sent to the remote user as a message. The generated audio in the digital assistant's response to the user's request can be heard by both the local and remote parties."; Column 13, lines 46-51, "Audio is injected into the stream of the call so that the local and remote users can hear the digital assistant acknowledge the user's request and announce the action it is taking in response to the request (i.e., whether it be sharing contact information, taking a note, adding someone to the call, etc.) in step 3030."; Injecting audio into the stream of the call so that the local and remote users can hear the digital assistant acknowledge the user's request and announce the action it is taking in response to the request reads on automatically rendering, during the ongoing call, synthesized speech audio data that is based on the value, and the example of the digital assistant responding in the call by saying that the contact information will be sent to the remote user as a message demonstrates audio data that is based on the value, where the contact information reads on the value.).

Regarding claim 2, Woolsey discloses the method as claimed in claim 1. Woolsey further discloses: wherein automatically rendering, during the ongoing call, the synthesized speech audio data that is based on the value is further in response to automatically resolving the value for the parameter (Column 8, lines 69-64, "The digital assistant responds at point 2 in the call at block 1210 in FIG. 12 by saying that the contact information will be sent to the remote user as a message. The generated audio in the digital assistant's response to the user's request can be heard by both the local and remote parties."; Column 13, lines 46-51, "Audio is injected into the stream of the call so that the local and remote users can hear the digital assistant acknowledge the user's request and announce the action it is taking in response to the request (i.e., whether it be sharing contact information, taking a note, adding someone to the call, etc.) in step 3030."; Injecting audio into the stream of the call so that the local and remote users can hear the digital assistant acknowledge the user's request and announce the action it is taking in response to the request reads on automatically rendering, during the ongoing call, synthesized speech audio data that is based on the value, and the example of the digital assistant responding in the call by saying that the contact information will be sent to the remote user as a message demonstrates audio data that is based on the value and in response to automatically resolving the value for the parameter, where the contact information reads on the value and sending the contact information reads on automatically resolving the value.).
Regarding claim 3, Woolsey discloses the method as claimed in claim 2. Woolsey further discloses: wherein automatically resolving the value for the parameter comprises:

analyzing metadata of the ongoing call between the given user and the additional user (Column 6, lines 27-55, "FIG. 6 shows an illustrative taxonomy of functions 600 that may typically be supported by the digital assistant 350. Inputs to the digital assistant 350 typically can include user input 605 (in which such user input can include input from either or both the local and remote parties to a given communication), data from internal sources 610, and data from external sources 615. For example, data from internal sources 610 could include the current geolocation of the device 110 that is reported by a GPS (Global Positioning System) component on the device, or some other location-aware component. The externally sourced data 615 includes data provided, for example, by external systems, databases, services, and the like such as the service provider 130 (FIG. 1). The various inputs can be used alone or in various combinations to enable the digital assistant 350 to utilize contextual data 620 when it operates. Contextual data can include, for example, time/date, the user's location, language, schedule, applications installed on the device, the user's preferences, the user's behaviors (in which such behaviors are monitored/tracked with notice to the user and the user's consent), stored contacts (including, in some cases, links to a local user's or remote user's social graph such as those maintained by external social networking services), call history, messaging history, browsing history, device type, device capabilities, communication network type and/or features/functionalities provided therein, mobile data plan restrictions/limitations, data associated with other parties to a communication (e.g., their schedules, preferences, etc.), and the like."; Contextual data including time/date, the user's location, language, schedule, applications installed on the device, the user's preferences, the user's behaviors, stored contacts, call history, messaging history, browsing history, device type, device capabilities, communication network type, and data associated with other parties to a communication reads on metadata of the ongoing call.);

identifying, based on the analyzing, an entity associated with the additional user (Column 8, lines 56-61, "After the local user initiates the digital assistant with the key phrase in this example, the user requests that the digital assistant send contact information for a restaurant to the remote user. The digital assistant responds at point 2 in the call at block 1210 in FIG. 12 by saying that the contact information will be sent to the remote user as a message."; Column 9, lines 12-23, "FIG. 15 depicts a screen capture of a UI 1500 that is displayed on the device of the remote user at point 3 in the call at block 1215 in FIG. 12. Here, the contact information sent by the digital assistant comes in as new message notification 1505 which is displayed at the top of the UI on the remote user's device. In this illustrative example, the notification shows the sender and a snippet of the content that is contained in the message. Typically, the remote user can launch the texting application to see the full content of the message which can include various kinds of contact information such as street address, link to website, phone number, map, etc."; The digital assistant sending contact information to the remote user in response to a user request that the digital assistant send contact information to the remote user reads on identifying an entity associated with a user, where the user's contact information reads on an entity associated with the user.);

and resolving the value based on the value being stored in association with the entity and the parameter (Column 8, lines 56-61, "After the local user initiates the digital assistant with the key phrase in this example, the user requests that the digital assistant send contact information for a restaurant to the remote user. The digital assistant responds at point 2 in the call at block 1210 in FIG. 12 by saying that the contact information will be sent to the remote user as a message."; Column 9, lines 12-23, "FIG. 15 depicts a screen capture of a UI 1500 that is displayed on the device of the remote user at point 3 in the call at block 1215 in FIG. 12. Here, the contact information sent by the digital assistant comes in as new message notification 1505 which is displayed at the top of the UI on the remote user's device. In this illustrative example, the notification shows the sender and a snippet of the content that is contained in the message. Typically, the remote user can launch the texting application to see the full content of the message which can include various kinds of contact information such as street address, link to website, phone number, map, etc."; The digital assistant sending contact information for a restaurant reads on resolving the value based on the value being stored in association with the entity and the parameter, where the user's contact information reads on an entity associated with the user and sending the contact information for a restaurant reads on resolving the value based on the value being stored in association with the entity.).

Regarding claim 4, Woolsey discloses the method as claimed in claim 1. Woolsey further discloses: wherein automatically rendering, during the ongoing call, the synthesized speech audio data that is based on the value comprises: rendering the synthesized speech as part of the ongoing call (Column 8, lines 69-64, "The digital assistant responds at point 2 in the call at block 1210 in FIG. 12 by saying that the contact information will be sent to the remote user as a message. The generated audio in the digital assistant's response to the user's request can be heard by both the local and remote parties."; Injecting audio into the stream of the call so that the local and remote users can hear the digital assistant acknowledge the user's request and announce the action it is taking in response to the request reads on rendering the synthesized speech as part of the ongoing call.).

Regarding claim 5, Woolsey discloses the method as claimed in claim 4. Woolsey further discloses: further comprising: prior to rendering the synthesized speech audio data as part of the ongoing call:

receiving, from the given user, user input to activate assistance during the ongoing call (Column 7, line 55 - Column 8, line 1, "When the user is involved in a voice or video communication with one or more remote parties, the digital assistant can be configured to be a part of the communication and perform tasks as needed. As shown in FIG. 10, the audio from the microphone 320 is split into two streams at a split point 1005 so that both the phone and video call apps 335 and 345 as well as the digital assistant 350 can receive audio signals from the user 105. Audio from the apps is combined with audio generated by the digital assistant to create a combined audio stream 1010 so that the remote user at the far end of the communication can hear what both the local user and the digital assistant say. The digital assistant exposes a listener 1015 that listens for a keyword or phrase from the user that is used to invoke the digital assistant."; Receiving a keyword or phrase from the user that is used to invoke the digital assistant during a voice communication reads on receiving user input to activate assistance during the ongoing call.);

and wherein automatically rendering the synthesized speech audio data is further in response to receiving the user input to activate the assistance (Column 13, lines 25-35, "In step 3005 a voice call is established between devices used by local and remote parties. The digital assistant sets up a listener so that during the call the local user can invoke the digital assistant by saying a key word or phrase in step 3010. Typically, as shown in step 3015, the digital assistant greets each of the parties on the call. As the digital assistant maintains an awareness of call context, including the identities of the parties, the greeting can be personalized by name in some cases. The greeting lets everybody know that the digital assistant is a party to the call and is ready to perform tasks and provide services."; The digital assistant greeting each of the parties on the call when invoked reads on the synthesized speech audio data being automatically rendered in response to receiving the user input to activate the assistance.).

Regarding claim 7, arguments analogous to claim 1 are applicable. In addition, Woolsey discloses a system comprising: at least one processor; and memory storing instructions that, when executed, cause the at least one processor to be operable (Column 14, line 65 - Column 15, line 1, "Computer system 3300 includes a processor 3305, a system memory 3311, and a system bus 3314 that couples various system components including the system memory 3311 to the processor 3305."; Column 17, lines 34-37, "More specifically, the CPU 3402 may operate as a finite-state machine, in response to executable instructions contained within the software modules disclosed herein.") to perform the steps of claim 1.

Regarding claim 8, arguments analogous to claim 2 are applicable.

Regarding claim 9, arguments analogous to claim 3 are applicable.

Regarding claim 10, arguments analogous to claim 4 are applicable.

Regarding claim 11, arguments analogous to claim 5 are applicable.

Regarding claim 13, arguments analogous to claim 1 are applicable. In addition, Woolsey discloses a non-transitory computer-readable storage medium storing instructions that, when executed, cause at least one processor to be operable to perform operations (Column 18, lines 61-65, "The memory 3520 may also be arranged as, or include, one or more computer-readable storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data."), the operations comprising the steps of claim 1.

Regarding claim 14, arguments analogous to claim 2 are applicable.

Regarding claim 15, arguments analogous to claim 3 are applicable.
Regarding claim 16, arguments analogous to claim 4 are applicable.

Regarding claim 17, arguments analogous to claim 5 are applicable.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6, 12, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Woolsey in view of Segalis et al. (US Patent Application Publication No. 2017/0358296), hereinafter Segalis.

Regarding claim 6, Woolsey discloses the method as claimed in claim 1, but does not specifically disclose further comprising: determining, based on processing the stream of audio data for a threshold duration of time after the at least one spoken utterance that requests information for the parameter, whether any additional spoken utterance, of the given user and received within the threshold duration, includes the value, and wherein automatically rendering, during the ongoing call, the synthesized speech audio data that is based on the value is further in response to determining that no additional spoken utterance, of the given user, is received within the threshold duration of time.

Segalis teaches: determining, based on processing the stream of audio data for a threshold duration of time after the at least one spoken utterance that requests information for the parameter, whether any additional spoken utterance, of the given user and received within the threshold duration, includes the value, and wherein automatically rendering, during the ongoing call, the synthesized speech audio data that is based on the value is further in response to determining that no additional spoken utterance, of the given user, is received within the threshold duration of time (Paragraph 0183, lines 1-13, "In other implementations, the system hands off the phone conversation to the human user who requested the task. The system can alert the user of the in-progress phone call. The system can let the user know when there is a problem with completing the task or when the bot has been asked a question to which the bot does not know the answer. The bot may text, email, or in some other way communicate the details of the conversation for which the bot needs user input. In some implementations, the bot will wait a threshold amount of time, i.e., 5 seconds, for the user to respond before continuing the conversation without user input. Since the conversation is happening in real-time, the bot cannot wait a long period of time for user response."; The bot waiting a threshold amount of time for the user to respond when the bot has been asked a question to which the bot does not know the answer reads on determining whether an utterance is spoken for a threshold duration of time after a spoken utterance that requests information for a parameter, and continuing the conversation without user input when the user does not respond in the threshold amount of time reads on rendering synthesized speech audio data in response to determining that no additional spoken utterance is received within the threshold duration of time.).

Segalis is considered to be analogous to the claimed invention because it is in the same field of call assistance systems. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Woolsey to incorporate the teachings of Segalis to implement a bot waiting a threshold amount of time for the user to respond when the bot has been asked a question to which the bot does not know the answer and continuing the conversation without user input when the user does not respond in the threshold amount of time. Doing so would allow for a semi-automated system to independently conduct conversations with a human during calls (Segalis; Paragraph 0039, lines 1-12).

Regarding claim 12, arguments analogous to claim 6 are applicable.

Regarding claim 18, arguments analogous to claim 6 are applicable.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

Liu et al. (US Patent No. 11,238,239)
Raanani et al. (US Patent No. 10,586,539)
Vuskovic et al. (US Patent No. 9,865,260)
Jablokov et al. (US Patent No. 8,140,632)

Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571) 272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Washburn, can be reached at (571) 272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JAMES BOGGS/
Examiner, Art Unit 2657
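
For readers skimming the rejection, the pipeline the examiner maps onto claim 1, together with the claim 6 threshold-wait taught by Segalis, can be summarized in a short sketch. This is a hypothetical illustration only: every name below is invented, the real claim operates on call audio rather than pre-recognized text, and nothing here is code from the application, Woolsey, or Segalis.

```python
# Hypothetical sketch of the claim 1 pipeline as the Office Action maps it,
# with the claim 6 / Segalis threshold-wait folded in. All names are invented.
from dataclasses import dataclass

RESPONSE_WAIT_TURNS = 1  # stand-in for Segalis's ~5-second wait for the human to answer

@dataclass
class Turn:
    speaker: str  # "local" (the given user) or "remote" (the additional user)
    text: str     # recognized text; stands in for ASR over the call's audio stream

def requested_parameter(text: str) -> str | None:
    """Identify that an utterance requests information for a parameter."""
    for param in ("email address", "phone number"):
        if param in text.lower() and "?" in text:
            return param
    return None

def run_call(turns: list[Turn], personal_data: dict[str, str]) -> list[str]:
    """Detect requests, resolve values from access-restricted personal data,
    and 'render synthesized speech' (here, a string) into the ongoing call."""
    rendered = []
    for i, turn in enumerate(turns):
        param = requested_parameter(turn.text)
        if param is None:
            continue
        value = personal_data.get(param)  # access-restricted data, personal to the user
        if value is None:
            continue  # value not resolvable: the assistant stays silent
        # Claim 6 wrinkle (Segalis): give the given user a window to answer first.
        window = turns[i + 1 : i + 1 + RESPONSE_WAIT_TURNS]
        if any(t.speaker == "local" for t in window):
            continue  # the human answered within the threshold window
        rendered.append(f"[TTS into call] The {param} is {value}.")
    return rendered

if __name__ == "__main__":
    call = [Turn("remote", "Quick question: what's your email address?")]
    print(run_call(call, {"email address": "user@example.com"}))
    # -> ['[TTS into call] The email address is user@example.com.']
```

The one-turn response window stands in for Segalis's wall-clock wait purely to keep the sketch deterministic and runnable.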

Prosecution Timeline

Jul 15, 2024: Application Filed
Mar 09, 2026: Non-Final Rejection — §102, §103 (current)

Precedent Cases

Applications granted by the same examiner for similar technology

Patent 12586600: Streaming Vocoder
2y 5m to grant; granted Mar 24, 2026
Patent 12573406: VOICE AUTHENTICATION BASED ON ACOUSTIC AND LINGUISTIC MACHINE LEARNING MODELS
2y 5m to grant; granted Mar 10, 2026
Patent 12572752: DYNAMIC CONTENT GENERATION METHOD
2y 5m to grant; granted Mar 10, 2026
Patent 12562170: BIOMETRIC AUTHENTICATION DEVICE, BIOMETRIC AUTHENTICATION METHOD, AND RECORDING MEDIUM
2y 5m to grant; granted Feb 24, 2026
Patent 12554931: Method and System of Improving Communication Skills for High Client Conversation Rate
2y 5m to grant; granted Feb 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 60%
With Interview: 99% (+38.8%)
Median Time to Grant: 3y 3m
PTA Risk: Low
Based on 107 resolved cases by this examiner. Grant probability derived from career allow rate.
