Prosecution Insights
Last updated: April 19, 2026
Application No. 18/603,683

TIME-BASED CONTEXT FOR VOICE USER INTERFACE

Non-Final OA: §102, §103, §112
Filed: Mar 13, 2024
Examiner: NEWAY, SAMUEL G
Art Unit: 2657
Tech Center: 2600 — Communications
Assignee: Amazon Technologies, Inc.
OA Round: 1 (Non-Final)
Grant Probability: 75% (Favorable)
Expected OA Rounds: 1-2
Time to Grant: 3y 0m
With Interview: 83%

Examiner Intelligence

Career Allow Rate: 75% (517 granted / 686 resolved) • +13.4% vs TC avg • above average
Interview Lift: +7.6% (moderate) • based on resolved cases with interview
Typical Timeline: 3y 0m avg prosecution • 29 currently pending
Career History: 715 total applications • across all art units
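The headline figures above are consistent with simple ratio arithmetic on the raw career counts. A minimal sketch of that arithmetic follows; the additive interview lift and the derived TC average are assumptions about how the dashboard combines the displayed numbers, not a documented methodology:

```python
# Illustrative arithmetic behind the dashboard figures (assumed simple ratios).
granted, resolved = 517, 686                 # examiner's career counts from the page

allow_rate = granted / resolved              # ~0.754, displayed as 75%
interview_lift = 0.076                       # +7.6 points with interview (from the page)
with_interview = allow_rate + interview_lift # ~0.830, displayed as 83%
tc_average = allow_rate - 0.134              # "+13.4% vs TC avg" implies TC avg ~62%

print(f"Career allow rate: {allow_rate:.1%}")     # 75.4%
print(f"With interview:    {with_interview:.1%}") # 83.0%
print(f"Implied TC avg:    {tc_average:.1%}")     # 62.0%
```

This reproduces the rounded 75% and 83% figures shown in the Prosecution Projections panel.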

Statute-Specific Performance

§101: 16.6% (-23.4% vs TC avg)
§103: 34.5% (-5.5% vs TC avg)
§102: 17.1% (-22.9% vs TC avg)
§112: 20.1% (-19.9% vs TC avg)
Tech Center averages are estimates • Based on career data from 686 resolved cases

Office Action

§102, §103, §112
DETAILED ACTION

This is responsive to the application filed 13 March 2024. Claims 1-20 are pending and considered below.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112

The following is a quotation of the first paragraph of 35 U.S.C. 112(a):

(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 11 and 19 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.

Claim 11, in line 2, recites the limitation “generating, using the metadata, an audio signal inaudible to a human”. However, no such generation using the metadata was disclosed in the original specification. Claim 19 recites a similar limitation and is likewise rejected.

The following is a quotation of 35 U.S.C. 112(b):

(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:

The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Claim 1, in line 14, recites the limitation “the metadata”. It is unclear if the limitation refers back to the metadata of line 6 or the one of line 9. The limitation will be interpreted as ‘the metadata detected in media data’. Independent claims 5 and 13 as well as dependent claims 3, 9, 11, 17 and 19 suffer from similar deficiencies and are likewise rejected. All dependent claims are rejected for depending upon a rejected claim without providing a remedy.

Claim Rejections - 35 USC § 102

The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:

A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-2, 5-8, 12-16 and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Chen et al. (US 2019/0370843).

Claim 5: Chen discloses a computer-implemented method comprising: receiving additional data (sponsored content or advertisement) corresponding to at least a first portion of first media data to be output by a first device; generating a modified first portion of the first media data that includes metadata (code/tag) representing a unique identifier corresponding to the additional data (“determine an appropriate sponsored content or advertisement, which can be combined or otherwise associated with a particular stream or session of media content playback”, [0056], see also “a code/tag can be used to retrieve a particular audio advertisement or other type of advertisement either from the media server, or from the advertisement server”, [0063] and “providing the code/tag to the client, which the client can then use to request the corresponding content from the media server”, [0064]); sending, to the first device, a first directive (a user response prompt) to send back metadata detected in media data by the first device (“advertising audio files include an advertising portion and a user response prompt. In some implementations, the advertising portion is a separate audio file from the user response prompt. The user response prompt typically includes an intent that the user should speak if they are interested in the advertisement content”, [0068], see also “analyzing the intent transmission received from the voice interactive device to recognize identifying data or metadata indicating that an advertisement was playing during the voice intent”, [0070] where metadata is included in the intent transmitted by the voice interactive device); causing the first device to present a first output based on the first media data (“A voice interactive device plays the audio content through one or more speakers that can be part of the voice interactive device or that are in communication with the voice interactive device. In one embodiment, and as discussed below at least with reference to FIG. 5, audio content files transmitted to the voice interactive device are coded such that the voice interactive device cannot discern which files are advertisements”, [0067]); receiving, from the first device, an indication that the first device has queued the modified first portion for output, the indication including the metadata detected in media data (“a determination (operation 306) is made whether the voice intent was said during an advertisement. Determining whether the voice intent was said during an advertisement (operation 306) can be performed in various ways. For example, operation 306 can include comparing a time stamp on the intent with audio file queues including audio advertisement files. As another example, operation 306 can include analyzing the intent transmission received from the voice interactive device to recognize identifying data or metadata indicating that an advertisement was playing during the voice intent”, [0070]); and in response to receiving the indication, enabling a language processing component configured to process user commands to use the additional data (“If the voice intent was said during advertising audio playback, then one or more advertisement response-based results are provided to the user (operation 310). Operation 310 can include parsing the intent to interpret the user's signal. During operation 310, the intent can be redirected from an audio content response to an advertising response. Advertising responses can include various results, such as delivering a message to a device associated with the user and sending an email to an account associated with the user.”, [0072]).

Claim 6: Chen discloses the computer-implemented method of claim 5, further comprising: in response to receiving the indication, identifying, using the additional data, an image for display during presentation of the modified first portion (“Results can be coupons or hyperlinks to coupons, such as a 50% discount on tickets to an advertised concert, or a free taco from the restaurant advertised. Coupons can be presented as codes, such as bar codes and/or QR codes”, [0083]); and causing the first device to display the image during a window of time corresponding to presentation of the modified first portion (“After providing the result (operation 408), one or more additional non-advertising media files can be transmitted for playback by the voice interactive device”, [0085]).

Claim 7: Chen discloses the computer-implemented method of claim 6, further comprising: in response to receiving the indication, identifying, using the additional data, a uniform resource locator (URL) identifying an online resource corresponding to the modified first portion; and causing the first device to associate the URL with an element of the image such that selection of the element causes the first device to access the online resource (“Results can be coupons or hyperlinks to coupons, such as a 50% discount on tickets to an advertised concert, or a free taco from the restaurant advertised. Coupons can be presented as codes, such as bar codes and/or QR codes”, [0083]).

Claim 8: Chen discloses the computer-implemented method of claim 5, further comprising: receiving audio data representing an utterance; determining that the audio data was received during presentation of the modified first portion; in response to determining that the audio data was received during presentation of the modified first portion, determining, using the additional data, language processing data corresponding to the modified first portion; and performing an action using the language processing data (“If the voice intent was said during advertising audio playback, then one or more advertisement response-based results are provided to the user (operation 310). Operation 310 can include parsing the intent to interpret the user's signal. During operation 310, the intent can be redirected from an audio content response to an advertising response. Advertising responses can include various results, such as delivering a message to a device associated with the user and sending an email to an account associated with the user.”, [0072]).
Claim 12: Chen discloses the computer-implemented method of claim 5, wherein generating the modified first portion includes appending the unique identifier to the first portion of the media data (“determine an appropriate sponsored content or advertisement, which can be combined or otherwise associated with a particular stream or session of media content playback”, [0056], see also “a code/tag can be used to retrieve a particular audio advertisement or other type of advertisement either from the media server, or from the advertisement server”, [0063] and “providing the code/tag to the client, which the client can then use to request the corresponding content from the media server”, [0064]).

Claim 1: Chen discloses a computer-implemented method comprising: determining first audio data to send to a first device in response to a user request (“a user 192 can interact 194 with the user interface at voice interactive device 16, and issue requests to access media content, for example the playing of a selected music or video item at their device, or at a controlled device, or the streaming of a media channel or video stream to their device, or to a controlled device”, [0052]); determining second audio data, independent of the first audio data, to send to the first device in conjunction with the first audio data; receiving additional data (sponsored content or advertisement) corresponding to the second audio data; generating modified second audio data that includes metadata (code/tag) representing a unique identifier corresponding to the additional data (“determine an appropriate sponsored content or advertisement, which can be combined or otherwise associated with a particular stream or session of media content playback”, [0056], see also “a code/tag can be used to retrieve a particular audio advertisement or other type of advertisement either from the media server, or from the advertisement server”, [0063] and “providing the code/tag to the client, which the client can then use to request the corresponding content from the media server”, [0064]); causing the first device to present a first output based on the first audio data (“A voice interactive device plays the audio content through one or more speakers that can be part of the voice interactive device or that are in communication with the voice interactive device. In one embodiment, and as discussed below at least with reference to FIG. 5, audio content files transmitted to the voice interactive device are coded such that the voice interactive device cannot discern which files are advertisements”, [0067]); sending, to the first device, a first directive (a user response prompt) to send back metadata detected in audio data by the first device (“advertising audio files include an advertising portion and a user response prompt. In some implementations, the advertising portion is a separate audio file from the user response prompt. The user response prompt typically includes an intent that the user should speak if they are interested in the advertisement content”, [0068], see also “analyzing the intent transmission received from the voice interactive device to recognize identifying data or metadata indicating that an advertisement was playing during the voice intent”, [0070] where metadata is included in the intent transmitted by the voice interactive device); causing the first device to present a second output based on the modified second audio data (“A voice interactive device plays the audio content through one or more speakers that can be part of the voice interactive device or that are in communication with the voice interactive device. In one embodiment, and as discussed below at least with reference to FIG. 5, audio content files transmitted to the voice interactive device are coded such that the voice interactive device cannot discern which files are advertisements”, [0067]); receiving, from the first device, an indication that the first device has queued the modified second audio data for output, the indication including the metadata (“a determination (operation 306) is made whether the voice intent was said during an advertisement. Determining whether the voice intent was said during an advertisement (operation 306) can be performed in various ways. For example, operation 306 can include comparing a time stamp on the intent with audio file queues including audio advertisement files. As another example, operation 306 can include analyzing the intent transmission received from the voice interactive device to recognize identifying data or metadata indicating that an advertisement was playing during the voice intent”, [0070]); in response to receiving the metadata, identifying, using the additional data, first language processing data corresponding to the modified second audio data; and enabling a language processing component configured to process user commands received from the first device to use the first language processing data (“If the voice intent was said during advertising audio playback, then one or more advertisement response-based results are provided to the user (operation 310). Operation 310 can include parsing the intent to interpret the user's signal. During operation 310, the intent can be redirected from an audio content response to an advertising response. Advertising responses can include various results, such as delivering a message to a device associated with the user and sending an email to an account associated with the user.”, [0072]).
Claim 2: Chen discloses the computer-implemented method of claim 1, further comprising: in response to receiving the metadata, identifying, using the additional data, an image to display during output of the modified second audio data (“Results can be coupons or hyperlinks to coupons, such as a 50% discount on tickets to an advertised concert, or a free taco from the restaurant advertised. Coupons can be presented as codes, such as bar codes and/or QR codes”, [0083]); and sending, to the first device, a second directive to display the image during a window of time corresponding to output of the modified second audio data (“After providing the result (operation 408), one or more additional non-advertising media files can be transmitted for playback by the voice interactive device”, [0085]).

Claims 13-16 and 20: Chen discloses a system, comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor ([0100]), cause the system to perform the steps of process claims 5-8 and 12 as shown above.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Chen et al. (US 2019/0370843) in view of Wright et al. (US 2013/0198792).
Claim 11: Chen discloses the computer-implemented method of claim 5, but does not explicitly disclose generating, using the metadata, an audio signal inaudible to a human, wherein: generating the modified first portion includes modifying audio data of the first portion to include the audio signal, and the first device decodes the audio signal to determine the unique identifier. In an analogous art similarly generating a modified first portion, Wright discloses generating, using metadata, an audio signal inaudible to a human, wherein: generating the modified first portion includes modifying audio data of the first portion to include the audio signal, and the first device decodes the audio signal to determine the unique identifier (“A code or watermark is information embedded in or included with signals that, when decoded, conveys the information. Codes or watermarks are often embedded in a signal so as to be imperceptible to a human (e.g., an inaudible code or watermark embedded in an audio signal). Codes or watermarks may include metadata (e.g., data identifying the content and/or information associated with the content) that is contained within a data stream of the content”, [0022]). It would have been obvious to one with ordinary skill in the art before the effective filing date of the claimed invention to combine the references to yield the predictable result of using audio watermarks to represent Chen’s metadata because such is a well-known practice long implemented in standards such as MPEG (see Wright, [0022]).

Claim 19: Chen discloses a system, comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor ([0100]), cause the system to perform the steps of process claim 11 as shown above.

Allowable Subject Matter

Claims 3-4, 9-10 and 17-18 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter: The prior art of record, individually or in combination, does not disclose the computer-implemented method of claim 5, further comprising: in response to receiving the indication, determining that the first device will present the first output during a window of time following a first time corresponding to detection, by the first device, of the metadata; receiving, from the first device, audio data representing an utterance; determining that the audio data was received at a second time corresponding to the window of time; in response to determining that the audio data was received at a second time corresponding to the window of time, determining, using the additional data, a named entity corresponding to the modified first portion; and performing an action using the named entity;

or the computer-implemented method of claim 5, further comprising: determining first media content responsive to a user request; determining second media content, independent of the first media content, for presentation with the first media content, the first portion corresponding to the second media content; sending, to the first device, a plurality of identifiers corresponding to portions of media data to be presented sequentially, the plurality of identifiers including at least a first identifier corresponding to the modified first portion and a second identifier corresponding to a second portion of the first media data, the second portion corresponding to the first media content; receiving, from the first device, a first request for media content, the first request including the first identifier; and in response to receiving the first request, sending the modified first portion to the first device, wherein the first device: stores the first portion in a first buffer for an undetermined interval of time, and sends the indication upon storing the modified first portion in a second buffer in preparation for output;

or the computer-implemented method of claim 1, further comprising: in response to receiving the metadata, determining that the first device will output the modified second audio data during a window of time following a first time corresponding to detection, by the first device, of the metadata; receiving, from the first device, third audio data representing an utterance; processing the third audio data to determine an intent; determining that the third audio data was received at a second time corresponding to the window of time; in response to determining that the third audio data was received at a second time corresponding to the window of time, determining, using the additional data, a named entity corresponding to the modified second audio data; and performing an action using the intent and the named entity;

or the computer-implemented method of claim 1, further comprising: identifying an insertion point in the first audio data, the insertion point representing a place where secondary content is to be inserted into the first audio data; generating a plurality of identifiers corresponding to segments of audio data to be output sequentially, the plurality of identifiers including: a first identifier corresponding to a first portion of the first audio data before the insertion point, a second identifier corresponding to the modified second audio data, and a third identifier corresponding to a second portion of the first audio data after the insertion point; sending the plurality of identifiers to the first device; receiving, from the first device, a first request for audio content, the first request including the second identifier; and in response to receiving the first request, sending the modified second audio data to the first device, wherein the first device: stores the modified second audio data in a first buffer for an undetermined interval of time, and sends the indication upon queuing the modified second audio data in a second buffer in preparation for output.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.

Ghadi et al. (US 2016/0316233) discloses a method for inserting advertisements into a media program and tracking a delivery of the advertisements. The method includes the following steps: (i) generating a request for the media program; (ii) receiving the request from a user device for retrieving a plurality of advertisements; (iii) obtaining a start time and an end time of a plurality of advertisement slots associated with the media program; (iv) selecting advertisements for delivering at the plurality of advertisement slots within the media program; (v) splitting the media program into a plurality of sub-programs; (vi) splitting the advertisements into a plurality of ad chunks; (vii) creating a URL playlist by stitching URLs of the plurality of sub-programs with URLs of the plurality of ad chunks at the plurality of advertisement slots; and (viii) obtaining the URL playlist to fetch and play content associated with each URL in the URL playlist sequentially.

McCoy et al. (US 2014/0115625) discloses a system and method for inserting an advertisement in a media stream that may include a content access server. The content access server may receive the media stream from one or more content providers. The media stream may comprise one or more pre-determined positions for inserting the advertisement. The content access server may insert the advertisement in the media stream at one of the one or more pre-determined positions. The advertisement is selected from one or more advertisements in real-time based on a first metadata associated with the media stream, a location of the one or more pre-determined positions in the media stream, and one or more parameters associated with a user.
Jonnadula et al. (US 2016/0119661) discloses techniques for on-demand metadata insertion into single-stream content. In one or more implementations, media content is obtained responsive to a request. The media content can be included in a content stream that also includes alternate content that is spliced into the content stream. Metadata is injected into the content stream at runtime in association with a starting point of the alternate content. The metadata can enable a media player to identify the alternate content and a location of the alternate content within the content stream. The content stream is then transmitted as a single stream to the media player for playback of both the media content and the alternate content.

McLeod et al. (US 2016/0189223) describes techniques for streaming digital media content, such as music, video, or television content. In accordance with an embodiment, the system includes support for delivery of media content with enhanced user-sponsor interaction. User interaction with a media device can be provided by, for example, voice or tactile command, in addition or as an alternative to the device's regular user interface. For example, a user can interact with an advertisement or other sponsor-directed content, by speaking to or shaking their device, to signal a preference for a particular type of content or advertisement. As another example, a spoken or shake action can be used to trigger or to pause an advertisement break within a media stream, so that the user can control advertisement breaks to better suit their particular lifestyle.

Gillespie et al. (US 2019/0206405) discloses a method for resolving entities using multi-modal functionality. Voice activated electronic devices may, in some embodiments, be capable of displaying content using a display screen. Contextual metadata representing the content rendered by the display screen may describe entities having similar attributes as an identified intent from natural language understanding processing. When natural language understanding processing attempts to resolve one or more declared slots for a particular intent, matching slots from the contextual metadata may be determined, and the matching entities may be placed in an intent selected context file to be included with the natural language understanding's output data. The output data may be provided to a corresponding application for causing one or more actions to be performed.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMUEL G NEWAY whose telephone number is (571)270-1058. The examiner can normally be reached Monday-Friday 9:00am-5:00pm EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SAMUEL G NEWAY/
Primary Examiner, Art Unit 2657

Prosecution Timeline

Mar 13, 2024
Application Filed
Feb 13, 2026
Non-Final Rejection — §102, §103, §112 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12602538
METHOD AND SYSTEM FOR EXEMPLAR LEARNING FOR TEMPLATIZING DOCUMENTS ACROSS DATA SOURCES
2y 5m to grant • Granted Apr 14, 2026
Patent 12603177
INTERACTIVE CONVERSATIONAL SYMPTOM CHECKER
2y 5m to grant • Granted Apr 14, 2026
Patent 12603092
AUTOMATED ASSISTANT CONTROL OF NON-ASSISTANT APPLICATIONS VIA IDENTIFICATION OF SYNONYMOUS TERM AND/OR SPEECH PROCESSING BIASING
2y 5m to grant • Granted Apr 14, 2026
Patent 12596734
PARSE ARBITRATOR FOR ARBITRATING BETWEEN CANDIDATE DESCRIPTIVE PARSES GENERATED FROM DESCRIPTIVE QUERIES
2y 5m to grant • Granted Apr 07, 2026
Patent 12596892
MACHINE TRANSLATION SYSTEM FOR ENTERTAINMENT AND MEDIA
2y 5m to grant • Granted Apr 07, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

Expected OA Rounds: 1-2
Grant Probability: 75%
With Interview: 83% (+7.6%)
Median Time to Grant: 3y 0m
PTA Risk: Low
Based on 686 resolved cases by this examiner. Grant probability derived from career allow rate.
