DETAILED ACTION
Continued Examination under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/03/2026 has been entered.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Examiner’s Comment
This Action is in response to the Request for Continued Examination, Amended Claims, and Applicant's Remarks filed on 02/03/2026.
Applicant has amended claims 1, 11, and 20 according to Amendments filed on 02/03/2026. Claims 1-5, 8-15, and 17-20 are pending and currently under consideration for patentability.
Claim Rejections - 35 USC § 112(b)
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.
The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.
Claims 1-5, 8-15, and 17-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention. Independent claims 1, 11, and 20 recite the element "the determined halts" in the claim limitation "identifying, at the advertisement interaction system with the processor, one or more insertion points and one or more advertisement slots based on the determined halts in the podcast and contextual relevance of subject matter of the podcast." There is insufficient antecedent basis for this limitation in the claims, because the claims do not earlier recite any step of determining halts.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-5, 8-15, and 17-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter. The claims are directed to a judicial exception (i.e., a law of nature, natural phenomenon, or abstract idea) without significantly more.
Step 1: Under Step 1 of the subject matter eligibility analysis (see the 2019 Revised Patent Subject Matter Eligibility Guidance), claims 1-5, 8-15, and 17-20 fall within a statutory category, as they are related to a process, machine, manufacture, or composition of matter. Claims 1-5 and 8-10 recite a method, claims 11-15 and 17-19 recite a system, and claim 20 recites a non-transitory computer-readable storage medium. When assessed under Step 2A, Prong I, however, the claims are directed to an abstract idea. The rationale for this finding is explained below:
Step 2A, Prong I: Under Step 2A, Prong I, claims 1, 11, and 20 are directed to an abstract idea because they recite a judicial exception. Claims 1, 11, and 20 recite limitations directed to the abstract idea including: receiving a first set of data associated with the podcast, wherein the first set of data is received from a podcast publisher, wherein the podcast is uploaded by the podcast publisher; collecting a second set of data associated with the one or more advertisements, wherein the second set of data is collected from an advertiser; fetching a third set of data associated with a communication device of the user, wherein the user accesses the podcast […] in real-time; gathering a fourth set of data associated with the user accessing the podcast […], wherein the fourth set of data comprises user image data, user verbal commands, and relationship status data associated with the user; analyzing the first set of data, the second set of data, the third set of data, and the fourth set of data, wherein the analysis is performed for identifying one or more triggers and a plurality of attributes associated with the one or more advertisements, the podcast, and the user and for predicting behavior and journey of the user, wherein the analysis is performed in real time; identifying the one or more triggers for enabling the interaction of the user with the one or more advertisements within the podcast in real-time and the plurality of attributes, wherein the one or more triggers comprises at least one of system generated triggers, user generated triggers, and advertiser generated triggers, and wherein the plurality of attributes comprises one or more keywords associated with the one or more advertisements and the podcast, topic transitions within the podcast, halts in the one or more advertisements and the podcast, relevant context of the one or more advertisements, an optimal time for the one or more triggers, an optimal position for the one or more advertisements, a threshold time of the halt within the one or more advertisements and the podcast, interests of the user, and user attentiveness throughout the podcast; identifying one or more insertion points and one or more advertisement slots based on the halts in the podcast and contextual relevance of subject matter of the podcast; and initializing the interaction between the user and the one or more advertisements in real-time, wherein the interaction between the user and the one or more advertisements is initiated based on the identification of the one or more triggers and the plurality of attributes. These limitations, considered individually and in combination, amount to no more than the judicial exception itself.
Claims 1, 11, and 20 also recite additional elements, addressed under Prong II and Step 2B below, including the advertisement interaction system with a processor, the communication device, computer-readable memory(ies), the "analysis performed by the one or more machine learning algorithms and the natural language processing module," and the "performing, by a natural language processing module, a multi-modal natural language analysis on the one or more advertisements, the podcast and the user verbal commands for dynamic transcription of the one or more advertisements, the podcast and the user verbal commands in transcript data, wherein the transcript data comprises a speech-based transcription and a non-speech-based transcription; and analyzing, using one or more machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model, the first set of data, the second set of data, the third set of data, and the fourth set of data." Initializing an interaction of a user with one or more advertisements within a podcast is considered to be an abstract idea, specifically certain methods of organizing human activity, such as commercial interactions (advertising, marketing, and sales activities). Furthermore, the analysis falls under another abstract idea, specifically mental processes, i.e. concepts performed in the human mind (including an observation, evaluation, judgment, or opinion), because receiving the first set of data; collecting the second set of data; fetching the third set of data; gathering the fourth set of data; analyzing the first, second, third, and fourth sets of data; identifying triggers within the podcast based on the analysis; identifying insertion points and advertisement slots based on the identified triggers and determined halts; and initializing the interaction between the user and the advertisements by inserting the advertisements at the identified insertion points are all concepts that can be performed in the human mind, or with pen and paper, given the necessary information. Therefore, under Step 2A, Prong I, claims 1, 11, and 20 are directed to an abstract idea.
Step 2A, Prong II: Step 2A, Prong II determines whether the claims recite any additional elements that integrate the judicial exception (abstract idea) into a practical application. Claims 1, 11, and 20 recite the additional elements identified above. The additional limitations reciting the advertisement interaction system with a processor, the communication device, the computer-readable memory(ies), the "analysis performed by the one or more machine learning algorithms and the natural language processing module," and the "analyzing, using one or more machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model, the first set of data, the second set of data, the third set of data, and the fourth set of data" are recited in a manner that merely uses the computer (i.e. the advertisement interaction system or communication device and the machine learning algorithms) as a tool to perform the abstract idea. These additional elements in claims 1, 11, and 20 do not integrate the judicial exception into a practical application because, alone and in combination, they amount to no more than adding the words "apply it" (or an equivalent) to the judicial exception, or mere instructions to implement the abstract idea on a computer, or merely using a computer as a tool to perform the abstract idea (see MPEP 2106.05(f)); adding insignificant extra-solution activity to the judicial exception (see MPEP 2106.05(g)); and generally linking the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)). The additional limitations do no more than link the judicial exception to a particular technological environment or field of use, i.e. a system/device, and therefore do not integrate the abstract idea into a practical application. In Affinity Labs of Texas, LLC v. DIRECTV, LLC, the court explained that although such additional elements limit the use of the abstract idea, this type of limitation merely confines the abstract idea to a particular technological environment and fails to add an inventive concept to the claims. Under Step 2A, Prong II, these claims remain directed to an abstract idea.
Step 2B: Claims 1, 11, and 20 recite the same additional elements identified above under Step 2A, Prong II. The additional limitations reciting the advertisement interaction system with a processor, the communication device, the computer-readable memory(ies), the "analysis performed by the one or more machine learning algorithms and the natural language processing module," and the "analyzing, using one or more machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model, the first set of data, the second set of data, the third set of data, and the fourth set of data" do not amount to significantly more than the judicial exception for the reasons provided in the Step 2A, Prong II analysis. Claims 1, 11, and 20 also recite the additional limitation "performing, by a natural language processing module, a multi-modal natural language analysis on the one or more advertisements, the podcast and the user verbal commands for dynamic transcription of the one or more advertisements, the podcast and the user verbal commands in transcript data, wherein the transcript data comprises a speech-based transcription and a non-speech-based transcription." However, merely applying a well-known technique (i.e. multi-modal natural language analysis or natural language processing) to data (i.e. advertisements, podcasts, and user verbal commands) in order to transcribe the data into speech-based and non-speech-based transcriptions is not sufficient to amount to significantly more, because it is essentially adding insignificant extra-solution activity to the judicial exception. Applying natural language processing to transcribe data into speech and non-speech transcriptions represents well-understood, routine, conventional ("WURC") activity because it has been known that "In more detail, conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hypotheses) into a set of domains, intents, and arguments. These modules may be optimized independently. This specification formulates audio to semantic understanding as a sequence-to-sequence problem." (U.S. Publication 2021/0090570 to Aharoni, ¶ [0067]).
Furthermore, applying machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model to analyze data is not sufficient to amount to significantly more, because it is essentially adding insignificant extra-solution activity to the judicial exception. Applying machine learning algorithms trained using supervised and/or unsupervised learning techniques to analyze data represents well-understood, routine, conventional ("WURC") activity because it has been known that "It is noted that the semantic entity relation detection classifier training implementations described herein can perform the classifier training/learning using any semi-supervised or unsupervised machine learning method such as a conventional logistic regression method, or a conventional decision trees method, or a conventional support vector machine method, among other types of machine learning methods. It is also noted that the semantic entity relation detection classifier training implementations can be used to train a variety of classifiers including a conventional support vector machine, or a conventional artificial neural network, or a conventional Bayesian statistical classifier, among other types of classifiers." (U.S. Publication 2017/0068903 to Hakkani-Tur, ¶ [0070]). Therefore, the independent claims do not include additional elements, or a combination of elements, that amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements listed amount to no more than mere instructions to apply the exception using generic computer components. In addition, the applicant's specification describes generic computer-based elements (page 13, lines 10-12; page 20, lines 5-7) for implementing the advertisement interaction system and the communication device; such generic components do not amount to significantly more than the abstract idea itself and are not enough to transform the abstract idea into eligible subject matter. Furthermore, there is no improvement in the functioning of the computer or to any technological field, and there is no transformation of subject matter into a different state or thing. Under Step 2B of the subject matter eligibility analysis, these claims are not patent eligible.
Dependent claims 2-5 and 8-10 and dependent claims 12-15 and 17-19 further recite the method of claim 1 and the system of claim 11, respectively. Dependent claims 2-5, 8-10, 12-15, and 17-19, when analyzed as a whole, are held to be patent ineligible under 35 U.S.C. 101 because the additionally recited limitations fail to establish that the claims are not directed to an abstract idea:
Under Step 2A, Prong I, these additional claims only further narrow the abstract idea set forth in claims 1, 11, and 20. For example, claims 2-5, 8-10, 12-15, and 17-19 further describe the limitations for initializing/enabling an interaction of a user with one or more advertisements within a podcast – which is only further narrowing the scope of the abstract idea recited in the independent claims.
Under Step 2A, Prong II, for dependent claims 2-5, 8-10, 12-15, and 17-19, there are no additional elements introduced. Thus, they do not present integration into a practical application, or amount to significantly more.
Under Step 2B, the dependent claims do not include any additional elements that are sufficient to amount to significantly more than the judicial exception. Additionally, there is no improvement in the functioning of the computer or technological field, and there is no transformation of subject matter into a different state. As discussed above with respect to integration of the abstract idea into a practical application, the additional claims do not provide any additional elements that would amount to significantly more than the judicial exception. Under Step 2B, these claims are not patent eligible.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claim(s) 1-5, 8-15, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Publication 2016/0292736 to Yruski in view of U.S. Publication 2011/0258049 to Ramer and in further view of U.S. Publication 2014/0108309 to Frank.
Claims 1-5 and 8-10, claims 11-15 and 17-19, and claim 20 are method, system, and non-transitory computer-readable storage medium claims, respectively, with substantially indistinguishable features among the groups. For purposes of compact prosecution, the Office has grouped the corresponding method, system, and non-transitory computer-readable storage medium claims in applying the applicable prior art.
With respect to Claim 1:
Yruski teaches:
A computer-implemented method for enabling an interaction of a user with one or more advertisements within a podcast, the computer-implemented method comprising (Yruski: ¶ [0012]):
receiving, at an advertisement interaction system with a processor, a first set of data associated with the podcast, wherein the first set of data is received from a podcast publisher, wherein the podcast is uploaded by the podcast publisher (i.e. receiving content metadata about the podcast from the content manager who uploads the podcast) (Yruski: ¶ [0045] “A content publisher manager 115 inputs metadata and rules concerning content via block 106. The content metadata and rules also may serve as control information for use in determining which users and/or which content to match to ads in an ad campaign. The content metadata may include information about the content, for example.”);
collecting, at the advertisement interaction system with the processor, a second set of data associated with the one or more advertisements, wherein the second set of data is collected from an advertiser (i.e. collecting ad content metadata about the advertisements from the ad campaign manager) (Yruski: ¶ [0045] “In operation, an ad campaign manager 113 inputs metadata and rules concerning an ad campaign via block 102. The ad campaign metadata and rules may serve as control information for use in determining which users and/or which content to match to ads in an ad campaign. The ad metadata may include information about the ads in the campaign, for example.”);
fetching, at the advertisement interaction system with the processor, a third set of data associated with a communication device of the user, wherein the user accesses the podcast using the communication device in real-time (i.e. fetching usage data with respect to user accessing podcast with the user device) (Yruski: ¶ [0045] “The user profiles block includes an aggregation of information concerning usage, preferences, geographic information, demographic information about users. For instance, user information may be included in user profiles for individual users or for groups of users or for user organizations, for example. The user profiles may serve as control information for use in determining which users and/or which content to match to ads in an ad campaign.” Furthermore, as cited in ¶ [0051] “A user profile information storage repository 226 stores user profile-related information. For the content received by or subscribed to by a given user device 206 and for the user profile information associated with the given user device 206, a match server 227 reconciles the stored ad campaign-related information, the stored content-related information and corresponding user profile-related information, in order to identify which ads to send to the user device 206.”);
gathering, at the advertisement interaction system with the processor, a fourth set of data associated with the user accessing the podcast through the communication device […] (i.e. gathering usage data with respect to user accessing podcast with the user device) (Yruski: ¶ [0045] “The user profiles block includes an aggregation of information concerning usage, preferences, geographic information, demographic information about users. For instance, user information may be included in user profiles for individual users or for groups of users or for user organizations, for example. The user profiles may serve as control information for use in determining which users and/or which content to match to ads in an ad campaign.” Furthermore, as cited in ¶ [0051] “A user profile information storage repository 226 stores user profile-related information. For the content received by or subscribed to by a given user device 206 and for the user profile information associated with the given user device 206, a match server 227 reconciles the stored ad campaign-related information, the stored content-related information and corresponding user profile-related information, in order to identify which ads to send to the user device 206.”);
analyzing, at the advertisement interaction system with the processor using one or more [[machine learning]] algorithms […], the first set of data, the second set of data, the third set of data, and the fourth set of data […] (i.e. analyzing podcast data, advertisement data, usage data, and device data using complex algorithms) (Yruski: ¶¶ [0178] [0179] “In this example, for each 100 ads delivered to user devices for the campaign, 40 will be Ad1, 35 will be Ad2 and 25 will be Ad3. As explained below, for yield management more complex algorithms may be used by an ad campaign provider to decide which ad goes out to which user for insertion into which content feed…Generally speaking ad distribution criteria (i.e., determining which ads go to what user device and how often) may consider user attributes from storage repository 404b, as factors such as, (1) user behavior and usage, (2) user preferences and inclinations (e.g., target ads based upon user profiles), (3) consumption predictions (e.g., for 1000 bought ads, how many impressions do we need to download to the users 1000 × consumption factor?) Thus, in general, different users might receive different ones or combinations of Ad1, Ad2 and Ad3.”);
identifying, at the advertisement interaction system with the processor, the one or more triggers for enabling the interaction of the user with the one or more advertisements within the podcast in real-time and the plurality of attributes based on the analysis performed by the one or more [[machine learning]] algorithms and [[the natural language processing module]], wherein the one or more triggers comprising at least one of system generated triggers, user generated triggers and advertiser generated triggers (i.e. identifying rules when to dynamical insert ads into podcast) (Yruski: ¶¶ [0230] [0231] “For example, ads may be inserted dynamically in order to achieve a degree of smart ad rotation. These criteria may include factors such as time of day during which an ad provider wants the ad to play. The insertion information also may include ad rotation rules such as, play Ad1 in the morning; play Ad2 in the afternoon; and play Ad3 in the evening. Client-side implementation of placement rules in accordance with insertion information may require the agent to access context information from the device concerning context in which content or ads are actually played. The context may be time of day during which, or location of play (obtained from a user's zip code, user's IP address or GPS device for example) at which, a device user presses a button to actually play a content file. The agent inserts an ad dynamically that is appropriate to the context in accordance with the rules criteria specified by the insertion information…The consumer can find the previously listened/viewed advertisement/offer in the past 30 days or 60 days, for example. This enables the consumer to not miss out any personalized offers (ex. 30% off dinner or dress) sent their way. The consumer then can take follow up actions such as go to the web to get further information about the "short" advertisement or making an online purchase; Making a phone call to the number indicated on the recalled advertisement; or make a VOIP call through the computer interface.”), and
wherein the plurality of attributes comprises one or more keywords associated with the one or more advertisements and the podcast, topic transitions within the podcast, halts in the one or more advertisements and the podcast, relevant context of the one or more advertisements, an optimal time for the one or more triggers, an optimal position for the one or more advertisements, a threshold time of the halt within the one or more advertisements and the podcast, interests of the user, and user attentiveness throughout the podcast (i.e. analyzing podcast data, advertisement data, usage data, and device data using complex algorithms) (Yruski: ¶¶ [0178] [0179] “In this example, for each 100 ads delivered to user devices for the campaign, 40 will be Ad1, 35 will be Ad2 and 25 will be Ad3. As explained below, for yield management more complex algorithms may be used by an ad campaign provider to decide which ad goes out to which user for insertion into which content feed…Generally speaking ad distribution criteria (i.e., determining which ads go to what user device and how often) may consider user attributes from storage repository 404b, as factors such as, (1) user behavior and usage, (2) user preferences and inclinations (e.g., target ads based upon user profiles), (3) consumption predictions (e.g., for 1000 bought ads, how many impressions do we need to download to the users 1000 × consumption factor?) Thus, in general, different users might receive different ones or combinations of Ad1, Ad2 and Ad3.” Furthermore, ¶¶ [0063]-[0076] describe the attributes of the content to include image/video/audio data about the content, the podcast content, keywords to describe the podcast, information about the podcast publisher, and a short description of the podcast. Furthermore, as cited in ¶¶ [0106] [0107] “At time=t4, the podcast application, such as iTunes initiates an update of content associated with the RSS feed. The request is intercepted by the ad insertion module 706, which includes a listener 812 on localhost (127.0.0.1). The listener is a part of the plug-in code, which listens on the local host IP 127.0.0.1 and intercept calls by the iTunes application (or any other media manager application). At time=t5, the ad insertion module 706 forwards the intercepted request over the network 116 to the content server 214. At time=t6, the content server 806 receives the request sent by the ad insertion module 706…At time=t7, the content server 806 returns the requested content over the network to the ad insertion module 406. At time=t8, the ad insertion module 706 receives the requested content update. At time=t9, ads stored in ad information storage buffer 704 are combined with the content delivered by the content server 214. At time=t10, ad-infused content is streamed to be played.”);
identifying, at the advertisement interaction system with the processor, one or more insertion points and one or more advertisement slots based on the determined halts in the podcast and contextual relevance of subject matter of the podcast (i.e. identifying insertion points for ad slots based on halts or contextual relevance) (Yruski: ¶¶ [0191] [0192] “An ad <slot> item describes ad slot characteristics, which may include location where ad is to be inserted within content and style of the ad. As to location, in the first illustrated example in FIG. 23, one ad is permitted every 15 minutes, (at times 0, 15 minutes, 30 minutes and 45 minutes) and the ad may not exceed 15 seconds. In the second example in FIG. 23, two ads are permitted in the beginning (at time 0) (i.e. there are two slots associated with the same timecode) and two ads are permitted in 30 minutes after the beginning (absolute time). There is no time limit on ad length in this second example…In order to obtain an ad list with associated ad insertion points for particular content, the agent/plug-in sends a query to the ad-server: e.g., http://www.podbridge.com/QueryCampaign.” Furthermore, as cited in ¶ [0183] “For instance, ads may be delivered based upon content, such as a daily talk show. Moreover, for example, the same ads may be spliced into successive episodes of the talk show. Therefore, at least until the cached ads expire or a new episode arrives with metadata indicating a need to insert different ads, the agent can continue to obtain the ads from the cache without the need to go back to the network. If the metadata of a subsequent content episode (e.g. talk show episode) indicates a need for different ads, then the agent must retrieve those new ads before they can be inserted into the content.” Furthermore, as cited in ¶¶ [0214] [0216] “In some embodiments, ads are inserted on-the-fly (i.e., as content is received), and at each insertion point, the main stream fades out before an ad, and fades in after an ad. The position of ad insert points is reflected as the count of mp3 frames from the start of the stream. In the case of content encoded as mp3 frames, during the processing of the main content, the management agent counts the number of mp3 frames streamed until the stitch frame has been reached. From this point the fade-out effect is performed on a number of frames, until complete silence. The ad is streamed after the fade-out effect…From the Byte offset the ad insertion agent finds the start of the next mp3 frame on the stream. The fade-in effect is performed on the mp3 frames from the previous segment of the main content. The next segment is streamed until the next insert point is reached. In this manner no content information is lost in the unedited parts of the main content. The controlled fade-ins and fade-outs between main content and ads permit a user to experience a smoother transition between content and advertisement.”);
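For illustration only, the following is a minimal sketch (not drawn from the claims, the specification, or Yruski) of one way halts exceeding a threshold duration could be located in a podcast waveform and treated as candidate insertion points; the function name, threshold values, and data are hypothetical assumptions.

import numpy as np

def find_insertion_points(samples, rate, silence_thresh=0.01, min_halt_seconds=1.5):
    """Return sample indices where a silent gap of at least min_halt_seconds begins."""
    quiet = np.abs(samples) < silence_thresh
    points, run_start = [], None
    for i, q in enumerate(quiet):
        if q and run_start is None:
            run_start = i
        elif not q and run_start is not None:
            if (i - run_start) / rate >= min_halt_seconds:
                points.append(run_start)
            run_start = None
    if run_start is not None and (len(quiet) - run_start) / rate >= min_halt_seconds:
        points.append(run_start)
    return points

# Example: 5 seconds of "speech", a 2-second halt, then 5 more seconds of "speech".
rate = 8000
speech = np.random.uniform(-0.5, 0.5, 5 * rate)
audio = np.concatenate([speech, np.zeros(2 * rate), speech])
print(find_insertion_points(audio, rate))  # one index near 40000, the start of the halt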
initializing, at the advertisement interaction system with the processor, the interaction between the user and the one or more advertisements by inserting the one or more advertisements at the identified one or more insertion points, wherein the interaction between the user and the one or more advertisements is initiated based on the identification of the one or more triggers, the plurality of attributes, the one or more insertion points and the one or more advertisement slots (Examiner interprets the claim language using BRI to be based on the identification of a trigger, displaying/transmitting an advertisement in real-time.) (i.e. ads are dynamically inserted into podcast) (Yruski: ¶ [0230] “For example, ads may be inserted dynamically in order to achieve a degree of smart ad rotation. These criteria may include factors such as time of day during which an ad provider wants the ad to play. The insertion information also may include ad rotation rules such as, play Ad1 in the morning; play Ad2 in the afternoon; and play Ad3 in the evening. Client-side implementation of placement rules in accordance with insertion information may require the agent to access context information from the device concerning context in which content or ads are actually played. The context may be time of day during which, or location of play (obtained from a user's zip code, user's IP address or GPS device for example) at which, a device user presses a button to actually play a content file. The agent inserts an ad dynamically that is appropriate to the context in accordance with the rules criteria specified by the insertion information.” Furthermore, as cited in ¶ [0214] “In some embodiments, ads are inserted on-the-fly (i.e., as content is received), and at each insertion point, the main stream fades out before an ad, and fades in after an ad. The position of ad insert points is reflected as the count of mp3 frames from the start of the stream. In the case of content encoded as mp3 frames, during the processing of the main content, the management agent counts the number of mp3 frames streamed until the stitch frame has been reached. From this point the fade-out effect is performed on a number of frames, until complete silence. The ad is streamed after the fade-out effect.”).
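For illustration only, the following is a minimal sketch (not drawn from the claims, the specification, or Yruski) of the kind of on-the-fly splice with fade-out and fade-in that Yruski's ¶¶ [0214]-[0216] describe, simplified to operate on raw sample arrays rather than MP3 frames; all names and values are hypothetical assumptions.

import numpy as np

def insert_ad(content, ad, insert_sample, rate, fade_seconds=0.1):
    """Splice ad into content at insert_sample, fading the main stream out
    before the ad and back in after it so no content samples are lost."""
    head = content[:insert_sample].astype(float).copy()
    tail = content[insert_sample:].astype(float).copy()
    n = int(fade_seconds * rate)
    if n and len(head) >= n:
        head[-n:] *= np.linspace(1.0, 0.0, n)   # fade out before the ad
    if n and len(tail) >= n:
        tail[:n] *= np.linspace(0.0, 1.0, n)    # fade in after the ad
    return np.concatenate([head, ad, tail])

# Example: a 60-second "podcast" with a 5-second "ad" spliced in at the 30-second mark.
rate = 8000
podcast = np.random.uniform(-0.1, 0.1, 60 * rate)
ad = np.random.uniform(-0.1, 0.1, 5 * rate)
combined = insert_ad(podcast, ad, insert_sample=30 * rate, rate=rate)
print(len(combined) == len(podcast) + len(ad))  # True: all content samples are preserved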
Yruski does not explicitly disclose performing, by a natural language processing module of the advertisement interaction system, a multi-modal natural language analysis on the one or more advertisements, the podcast and the user verbal commands for dynamic transcription of the one or more advertisements, the podcast and the user verbal commands in transcript data, wherein the transcript data comprises a speech-based transcription and a non-speech-based transcription; analyzing, at the advertisement interaction system with the processor using one or more machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model, […] set of data; and wherein the analysis is performed for identifying one or more triggers and a plurality of attributes associated with the one or more advertisements, the podcast, and the user and for predicting behavior and journey of the user, wherein the analysis is performed in real time.
However, Ramer further discloses:
performing, by a natural language processing module of the advertisement interaction system, a multi-modal natural language analysis on the one or more advertisements, the podcast and the user verbal commands for dynamic transcription of the one or more advertisements, the podcast and the user verbal commands in transcript data, wherein the transcript data comprises a speech-based transcription and a non-speech-based transcription (i.e. natural language analysis applied to content, ads, and user verbal commands, wherein the verbal commands are converted from voice signals to speech transcription) (Ramer: ¶ [0237] “A voice recognition facility 160 may be a software component enabling a machine or device (e.g., a cellular phone) to understand human spoken language and to carry out spoken commands. Typically, a human voice is received by the device and converted to analog audio. The analog audio may in turn be converted into a digital format using, for example, an analog-to-digital converter, which digital data may be interpreted using voice recognition techniques. Generally this is done through the use of a digital database storing a vocabulary of words or syllables, coupled with a means of comparing this stored data with the digital voice signals received by the device. The speech patterns of a unique user may be stored on a hard drive (locally or remotely) or other memory device, and may be loaded into memory, in whole or in part, when the program is run. A comparator may use, for example, correlation or other discrete Fourier transform or statistical techniques to compare the stored patterns against the output of the analog-digital converter.” Furthermore, as cited in ¶ [0255] “Disambiguation may include part-of-speech disambiguation, word sense disambiguation, phrase identification, named entity recognition, or full sentential parsing. Part-of-speech disambiguation refers to the process of assigning a part-of-speech tag (e.g., noun, verb, adjective) to each word in a query.”); and
analyzing, at the advertisement interaction system with the processor using one or more machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model, […] set of data (i.e. analyzing sets of data based on natural language processing and machine learning algorithms trained through supervised and unsupervised modelling) (Ramer: ¶ [0299] “Usage patterns may be analyzed using various predictive algorithms, such as regression techniques (least squares and the like), neural net algorithms, learning engines, random walks, Monte Carlo simulations, and others. For example, a usage pattern may indicate that a user has made many work-related phone calls during a holiday (such as by determining that the user was located at work and making calls all day).” Furthermore, as cited in ¶ [0310] “For example, if a user has consistently declined, or failed to view, music-oriented programming content (whether on a cellular phone, TV, or Internet), then a query for the term "U2" might return information on Soviet-era spy planes, notwithstanding that for other users such a query would return content related to the rock group U2. As in analysis of usage patterns, a wide range of algorithms, including learning algorithms, regression analyses, neural nets, and the like may be used to understand patterns in declined content that assist with handling queries and results.” Furthermore, as cited in ¶ [1965] “This general category includes, but is not limited to, methods discussed below. Neural networks are nonlinear sophisticated modeling techniques that are able to model complex functions and can be applied to problems of prediction, classification or control. Neural network analytic techniques may be employed when the exact nature of the relationship between inputs and output is not known. In such instances, neural network techniques may assist in learning more about the relationship between the inputs and outputs through supervised and unsupervised training. Support vector machines may be used in machine learning to detect and exploit complex patterns in data by clustering, classifying and ranking the data.”); and
wherein the analysis is performed for identifying one or more triggers and a plurality of attributes associated with the one or more advertisements, the podcast, and the user and for predicting behavior and journey of the user, wherein the analysis is performed in real time (i.e. identifying triggers and attributes associated with advertisement, podcast, and user and for predicting behavior and/or journey of user, wherein the analysis is performed in real time) (Ramer: ¶ [1426] “In embodiments, a user behavior or activity may be analyzed in relation to a particular content, content type, content category, and so forth. Referring to FIG. 36A, in an example, an activity factor may be learned based at least in part on logistic regression methodology. Realtime mobile traffic data may be associated with a user profile, including a historical user profile that includes prior activities, behaviors, and the like. Data derived from the updated request profile may be used in a responsiveness model (based on logistic regression or some other statistical modeling methodology).” Furthermore, as cited in ¶ [0191] “The results facility 148 may include general content and services, specific content catalogs, carrier premium content, carrier portal content, device based results, or home computer desktop search results. The general content and services provided in the results facility 148 could be podcasts, websites, general images available online, general videos available online, websites transcoded for MCF, or websites designed for mobile browser facilities.” Furthermore, as cited in ¶ [1090] “This category of user profile may, in turn, be used to predict actions or events such as a future purchase, advertisement conversion, or some other action or event that is associated with the category of user profile. In the current example, it may be known to a wireless provider that the three browse activities of visiting the websites of a florist, a caterer, and a photographer within some proximity of each other is highly associated with a user that is a bride to be. Thus, this type of web browsing activity may categorize this user in the "Bride-to-Be" category. This category may be stored in the mobile subscriber characteristics database 112 that is associated with her phone, and sponsored content, such as wedding-related advertisements may be presented to the display 172 of her mobile communication facility 102 based at least in part that she fits the category of "Bride-to-Be."”).
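For illustration only, the following is a minimal sketch (not drawn from the claims, the specification, or Ramer) of the kind of supervised machine-learning analysis Ramer contemplates, here a logistic-regression classifier over transcript keywords; the training snippets, labels, and category names are invented for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy transcript snippets labeled with a listener-interest category.
snippets = [
    "today we review the latest electric cars and charging networks",
    "our guest explains index funds and retirement savings",
    "we break down this week's playoff games and trades",
    "tips for maintaining your car battery and tires in winter",
]
labels = ["automotive", "finance", "sports", "automotive"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(snippets, labels)

# Predict an interest category for a new transcript segment; the predicted category
# could then be matched against advertisement keywords.
print(model.predict(["a new review of electric cars and battery charging"]))  # likely ['automotive']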
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to add Ramer’s performing, by a natural language processing module of the advertisement interaction system, a multi-modal natural language analysis on the one or more advertisements, the podcast and the user verbal commands for dynamic transcription of the one or more advertisements, the podcast and the user verbal commands in transcript data, wherein the transcript data comprises a speech-based transcription and a non-speech-based transcription; analyzing, at the advertisement interaction system with the processor using one or more machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model, the recited sets of data; and wherein the analysis is performed for identifying one or more triggers and a plurality of attributes associated with the one or more advertisements, the podcast, and the user and for predicting behavior and journey of the user, wherein the analysis is performed in real time to Yruski’s method for enabling an interaction of a user with one or more advertisements within a podcast. One of ordinary skill in the art would have been motivated to do so in order to “enable targeting using advanced behavioral relevance algorithms” and to “determine optimal ad to deliver in real-time based on targeting. The combination of targeting criteria and campaign priority may ensure that the best ad is being returned for any given ad request.” (Ramer: ¶¶ [1237] [1271]).
Yruski and Ramer do not explicitly disclose wherein the fourth set of data comprises user image data, user verbal commands, relationship status data associated with the user.
However, Frank further discloses wherein the fourth set of data comprises user image data, user verbal commands, relationship status data associated with the user (i.e. user image data, user verbal commands and relationship status associated with user) (Frank: ¶ [0195] “In one embodiment, the label generator 464 is configured to utilize a facial expression analyzer to analyze an image of a face of the user 114 captured substantially during the window 467, in order to generate the label 465. By stating that the image was captured substantially during the window 467, it is meant that the image was captured in the window, or very close to it (such as one or two seconds before and/or after the window). A facial expression analyzer is a type of measurement ERP that receives as input images of a face, and predicts an emotional response corresponding to the image.” Furthermore, as cited in ¶¶ [0270] [0271] “Optionally, a connection on a social network may be referred to as a "friend" on the social network. Forming the connection may provide the second user with a privilege allowing the second user to view content posted by the first user and/or information regarding the first user… Often users that connect share an emotional relationship to some degree ( e.g., they are family, friends, and/or acquaintances). Thus, in some cases, the user 114 is likely to express natural and/or stronger emotional responses to content suggested by a friend on the social network 602 and/or to content posted and/or created by a friend on the social network 602; in such cases, the user 114 is likely to be more emotionally involved due to the personal nature of the content, compared to cases in which the user 114 consumes content generated by strangers and/or from an unknown source, to which there is no personal attachment.” Furthermore, as cited in ¶ [0344] “In yet another example, the description 453 may include audio content created by the user ( e.g., recording of verbal commands issued by the user). The interaction analyzer 462 may utilize speech-to-text conversion algorithms and/or semantic analysis in order to identify an action taken by the user.”).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to add Frank’s fourth set of data comprises user image data, user verbal commands, relationship status data associated with the user to Yruski’s method for enabling an interaction of a user with one or more advertisements within a podcast. One of ordinary skill in the art would have been motivated to do so in order to “enable the affective computing applications to improve the user experience, for example by selecting and/or customizing content according to the user's liking.” (Frank: ¶ [0002]).
With respect to Claims 11 and 20:
All limitations as recited have been analyzed and rejected with respect to claim 1. Claim 11 recites “A computer system comprising: one or more processors; and a memory coupled to the one or more processors, the memory for storing instructions which, when executed by the one or more processors, cause the one or more processors to perform a method for enabling an interaction of a user with one or more advertisements within a podcast, the method comprising:” (Yruski: ¶¶ [0053] [0054] [0099] [0100]) followed by the steps performed by method claim 1. Claim 20 recites “A non-transitory computer-readable storage medium encoding computer executable instructions that, when executed by at least one processor, performs a method for enabling an interaction of a user with one or more advertisements within a podcast, the method comprising:” (Yruski: ¶¶ [0053] [0054] [0099] [0100]) followed by the steps performed by method claim 1. Claims 11 and 20 do not recite or define any new limitations beyond claim 1. Therefore, they are rejected under the same rationale.
With respect to Claim 2:
Yruski teaches:
The computer-implemented method as recited in claim 1, wherein the first set of data comprises audio data, video data, image data, subject matter of the podcast, theme of the podcast, keywords associated with the podcast, podcast publisher profile, and topics covered in the podcast (i.e. content metadata about the podcast) (Yruski: ¶ [0045] “A content publisher manager 115 inputs metadata and rules concerning content via block 106. The content metadata and rules also may serve as control information for use in determining which users and/or which content to match to ads in an ad campaign. The content metadata may include information about the content, for example.” Furthermore, ¶¶ [0063]-[0076] describe the content metadata to include image/video/audio data about the content, the podcast content, keywords to describe podcast, information about the podcast publisher, and short description of podcast.).
With respect to Claim 12:
All limitations as recited have been analyzed and rejected with respect to claim 2. Claim 12 does not recite or define any new limitations beyond claim 2. Therefore, it is rejected under the same rationale.
With respect to Claim 3:
Yruski teaches:
The computer-implemented method as recited in claim 1, wherein the second set of data comprises audio data of the one or more advertisements, video data of the one or more advertisements, image data of the one or more advertisements, subject matter of the one or more advertisements, theme of the one or more advertisements, and keywords associated with the one or more advertisements (i.e. ad content metadata about the advertisements, wherein audio/video/image data about the ads and subject matter, theme and keywords associated with the ads) (Yruski: ¶ [0045] “In operation, an ad campaign manager 113 inputs metadata and rules concerning an ad campaign via block 102. The ad campaign metadata and rules may serve as control information for use in determining which users and/or which content to match to ads in an ad campaign. The ad metadata may include information about the ads in the campaign, for example.” Furthermore, as cited in ¶ [0160] “Moreover in some embodiments, an ad campaign provider can search and add particular categories of content to a campaign based upon keywords, for example. For instance, assume an ad provider wants to advertise a windshield product. The ad provider may want to advertise the windshield product on a fixed list of automotive channels. Thus, for example, an ad campaign provider GUI (not shown) may permit an ad campaign provider to define ad hoc categories based upon keywords, content type (e.g., news show, financial reporting, adult) or source of the podcast provider (e.g. company A, company B, website C, Mr. D). For instance, in one GUI embodiment (not shown), an ad campaign provider can type "automotive" or "car" into a search box and get a list of content feeds that have the "automotive" keyword in its metadata. An ad campaign provider can explicitly choose the specific content feeds to advertise on.”).
With respect to Claim 13:
All limitations as recited have been analyzed and rejected with respect to claim 3. Claim 13 does not recite or define any new limitations beyond claim 3. Therefore, it is rejected under the same rationale.
With respect to Claim 4:
Yruski does not explicitly disclose the computer-implemented method as recited in claim 1, wherein the third set of data comprises real-time location of the communication device, a location history of the communication device, sound data from a microphone of the communication device, image data from a camera of the communication device, accelerometer data from an accelerometer of the communication device, gyroscope data from a gyroscope of the communication device, real-time movement data, and sensor data from a sensor of the communication device.
However, Ramer further discloses wherein the third set of data comprises real-time location of the communication device, a location history of the communication device, sound data from a microphone of the communication device, image data from a camera of the communication device, accelerometer data from an accelerometer of the communication device, gyroscope data from a gyroscope of the communication device, real-time movement data, and sensor data from a sensor of the communication device (Ramer: ¶ [1972] “In embodiments, user related behavioral data may contain location data relating to a user, a user's mobile communication facility, or some other type of location data. The location data may be past, current, or future ( e.g., the location of a future vacation inferred from a plane ticket purchase). The user location may be derived from a location metric of a mobile communication facility, such as a GPS signal, a camera, SMS, keyword, triangulation of cellular towers, and the like.” Furthermore, as cited in ¶ [0117] “The voice entry 122 function of the mobile communication facility may be used through the speaker-receiver device of the mobile communication facility 102 or by use of the standard SMS lexicon and syntax, and it may be adaptive to individual users' voice commands and usage patterns that are stored on and accessed from the mobile subscriber characteristics database 112. The voice entry 122 function may permit voice dialing, voice memo, voice recognition, speech recognition, or other functions related to audible input.” Furthermore, as cited in ¶ [2063] “In embodiments, the methods and systems described herein for selecting and presenting relevant sponsored content to a user may be used in conjunction with mobile communication facilities motion detection technologies such as a gyroscope, compass, accelerometer, or some other means of detection movement.”).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to add Ramer’s third set of data comprising real-time location of the communication device, a location history of the communication device, sound data from a microphone of the communication device, image data from a camera of the communication device, accelerometer data from an accelerometer of the communication device, gyroscope data from a gyroscope of the communication device, real-time movement data, and sensor data from a sensor of the communication device to Yruski’s method for enabling an interaction of a user with one or more advertisements within a podcast. One of ordinary skill in the art would have been motivated to do so in order to “enable targeting using advanced behavioral relevance algorithms” and to “determine optimal ad to deliver in real-time based on targeting. The combination of targeting criteria and campaign priority may ensure that the best ad is being returned for any given ad request.” (Ramer: ¶¶ [1237] [1271]).
With respect to Claim 14:
All limitations as recited have been analyzed and rejected with respect to claim 4. Claim 14 does not recite or define any new limitations beyond claim 4. Therefore, it is rejected under the same rationale.
With respect to Claim 5:
Yruski teaches:
The computer-implemented method as recited in claim 1, wherein the fourth set of data is associated with a profile of the user, wherein the fourth set of data comprises name data, age data, e-mail identity data, contact number data, gender data, geographic location data, demographic data, relationship status data, past podcast search keywords data, real-time podcast search keywords data, past podcast reviews data, past podcast interactions data, past advertisement interactions data, user verbal commands, user text, user image data, communication device operated commands, past gestures data, and real-time gestures data (Yruski: ¶ [0059] “The user-related information also may comprise user attributes that express user qualities or characteristics. For example, user attributes may comprise a user's gender, age, listening and or viewing habits, geographic info (e.g. zip code) and whether there are children in the house.” Furthermore, as cited in ¶ [0045] “The user profiles block includes an aggregation of information concerning usage, preferences, geographic information, demographic information about users. For instance, user information may be included in user profiles for individual users or for groups of users or for user organizations, for example. The user profiles may serve as control information for use in determining which users and/or which content to match to ads in an ad campaign.” Furthermore, ¶¶ [0063]-[0076] describe the content metadata that is included in the usage pattern to include image/video/audio data about the content, the podcast content, keywords to describe podcast, information about the podcast publisher, and short description of podcast.).
With respect to Claim 15:
All limitations as recited have been analyzed and rejected with respect to claim 5. Claim 15 does not teach or define any new limitations beyond claim 5. Therefore, it is rejected under the same rationale.
With respect to Claim 8:
Yruski teaches:
The computer-implemented method as recited in claim 1, wherein the system generated triggers facilitate triggering of the interaction between the user and the one or more advertisements, wherein the system generated triggers comprises at least one of a significant keyword within the one or more advertisements, the halt within the one or more advertisements above the threshold time, referring to topics of the interests of the user within the one or more advertisements according to the profile of the user, and the user attentiveness throughout the one or more advertisements (i.e. advertisements are inserted into podcast according to keyword or topics of interest, wherein the ad insertion occurs within an interval of time and user’s usage patterns are taken into consideration) (Yruski: ¶ [0160] “Moreover in some embodiments, an ad campaign provider can search and add particular categories of content to a campaign based upon keywords, for example. For instance, assume an ad provider wants to advertise a windshield product. The ad provider may want to advertise the windshield product on a fixed list of automotive channels. Thus, for example, an ad campaign provider GUI (not shown) may permit an ad campaign provider to define ad hoc categories based upon keywords, content type (e.g., news show, financial reporting, adult) or source of the podcast provider (e.g. company A, company B, website C, Mr. D). For instance, in one GUI embodiment (not shown), an ad campaign provider can type "automotive" or "car" into a search box and get a list of content feeds that have the "automotive" keyword in its metadata. An ad campaign provider can explicitly choose the specific content feeds to advertise on.” Furthermore, as cited in ¶¶ [0106] [0107] “At time=t4, the podcast application, such as iTunes initiates an update of content associated with the RSS feed. The request is intercepted by the ad insertion module 706, which includes a listener 812 on localhost (127.0.0.1). The listener is a part of the plug-in code, which listens on the local host IP 127.0.0.1 and intercepts calls by the iTunes application (or any other media manager application). At time=t5, the ad insertion module 706 forwards the intercepted request over the network 116 to the content server 214. At time=t6, the content server 806 receives the request sent by the ad insertion module 706…At time=t7, the content server 806 returns the requested content over the network to the ad insertion module 406. At time=t8, the ad insertion module 706 receives the requested content update. At time=t9, ads stored in ad information storage buffer 704 are combined with the content delivered by the content server 214. At time=t10, ad-infused content is streamed to be played.”).
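For illustration only, the sketch below loosely mimics the localhost-listener arrangement quoted from Yruski ¶¶ [0106]-[0107]: a listener on 127.0.0.1 intercepts a media manager's content request, forwards it to a content server, and combines buffered ad data with the returned content before playback. The upstream URL, port, and ad buffer are hypothetical placeholders supplied by the editor, not the reference's actual implementation.

```python
# Toy localhost listener: intercepts a content request, fetches the content
# from an upstream server, and prepends buffered ad bytes before responding.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

CONTENT_SERVER = "http://content.example.com"  # hypothetical upstream content server
AD_BUFFER = b"<pre-fetched ad audio bytes>"    # stands in for the ad storage buffer

class AdInsertionListener(BaseHTTPRequestHandler):
    """Listens on localhost and intercepts media-manager content requests."""

    def do_GET(self):
        # Forward the intercepted request to the real content server.
        with urlopen(CONTENT_SERVER + self.path) as upstream:
            content = upstream.read()
        # Combine buffered ad data with the returned content before playback.
        ad_infused = AD_BUFFER + content
        self.send_response(200)
        self.send_header("Content-Type", "audio/mpeg")
        self.send_header("Content-Length", str(len(ad_infused)))
        self.end_headers()
        self.wfile.write(ad_infused)

if __name__ == "__main__":
    # Mimics a plug-in listening on 127.0.0.1 that intercepts calls from a
    # podcast application and returns ad-infused content.
    HTTPServer(("127.0.0.1", 8080), AdInsertionListener).serve_forever()
```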
With respect to Claim 17:
All limitations as recited have been analyzed and rejected with respect to claim 8. Claim 17 does not teach or define any new limitations beyond claim 8. Therefore, it is rejected under the same rationale.
With respect to Claim 9:
Yruski does not explicitly disclose the computer-implemented method as recited in claim 1, wherein the user generated triggers facilitate triggering of the interaction between the user and the one or more advertisements by the user, wherein the user generated triggers comprises at least one of the user verbal commands, the user text, user facial expressions, user gestures, and hardware button commands associated with the communication device.
However, Ramer further discloses wherein the user generated triggers facilitate triggering of the interaction between the user and the one or more advertisements by the user, wherein the user generated triggers comprises at least one of the user verbal commands, the user text, user facial expressions, user gestures, and hardware button commands associated with the communication device (Ramer: ¶ [2063] “In another aspect, a 3D accelerometer may be provided that may be utilized for step recognition in a sport application and for tap gesture recognition in a user interface. The tap gestures may be utilized for controlling applications such as a music player, a sport application, and the like. The accelerometer may also be used in alarm clock sensors to detect movement of a sleeping person. For example, such a sensor may sense rapid eye movement ( e.g., REM) phase of a user's sleep and may wake the user accordingly.” Furthermore, as cited in ¶ [0237] “A voice recognition facility 160 may be a software component enabling a machine or device ( e.g., a cellular phone) to understand human spoken language and to carry out spoken commands. Typically, a human voice is received by the device and converted to analog audio. The analog audio may in turn be converted into a digital format using, for example, an analog-to-digital converter, which digital data may be interpreted using voice recognition techniques. Generally this is done through the use of a digital database storing a vocabulary of words or syllables, coupled with a means of comparing this stored data with the digital voice signals received by the device. The speech patterns of a unique user may be stored on a hard drive (locally or remotely) or other memory device, and may be loaded into memory, in whole or in part, when the program is run. A comparator may use, for example, correlation or other discrete Fourier transform or statistical techniques to compare the stored patterns against the output of the analog-digital converter.”).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to add Ramer’s user generated triggers, which facilitate triggering of the interaction between the user and the one or more advertisements by the user and comprise at least one of the user verbal commands, the user text, user facial expressions, user gestures, and hardware button commands associated with the communication device, to Yruski’s method for enabling an interaction of a user with one or more advertisements within a podcast. One of ordinary skill in the art would have been motivated to do so in order to “enable targeting using advanced behavioral relevance algorithms” and to “determine optimal ad to deliver in real-time based on targeting. The combination of targeting criteria and campaign priority may ensure that the best ad is being returned for any given ad request.” (Ramer: ¶¶ [1237] [1271]).
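For illustration only, the following sketch shows one hypothetical way a stored voice-command template could be compared against incoming digitized audio using a frequency-domain correlation, in the spirit of the comparator described in the quoted Ramer ¶ [0237]. The function names, threshold, and similarity measure are the editor's assumptions, not Ramer's disclosed implementation.

```python
# Sketch of comparing stored voice-command patterns against incoming audio
# using a spectral (Fourier-domain) correlation; thresholds are arbitrary.
import numpy as np

def spectral_similarity(sample: np.ndarray, template: np.ndarray) -> float:
    """Cosine similarity between magnitude spectra of two audio frames."""
    n = min(len(sample), len(template))
    a = np.abs(np.fft.rfft(sample[:n]))
    b = np.abs(np.fft.rfft(template[:n]))
    a /= np.linalg.norm(a) or 1.0
    b /= np.linalg.norm(b) or 1.0
    return float(np.dot(a, b))

def match_voice_command(sample, templates, threshold=0.8):
    """Return the best-matching stored command name, or None if below threshold."""
    scores = {name: spectral_similarity(sample, tpl) for name, tpl in templates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```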
With respect to Claim 18:
All limitations as recited have been analyzed and rejected with respect to claim 9. Claim 18 does not teach or define any new limitations beyond claim 9. Therefore, it is rejected under the same rationale.
With respect to Claim 10:
Yruski teaches:
The computer-implemented method as recited in claim 1, wherein the advertiser generated triggers facilitate triggering of the interaction between the user and the one or more advertisements by the advertiser, wherein the advertiser generated triggers comprises at least one of an advertiser defined time in the one or more advertisements, an advertiser defined keyword in the one or more advertisements, and podcast publisher commands to a podcast publisher interaction trigger button (i.e. advertisements are triggered according to time, keyword, and explicit commands to the podcast publisher) (Yruski: ¶ [0160] “Moreover in some embodiments, an ad campaign provider can search and add particular categories of content to a campaign based upon keywords, for example. For instance, assume an ad provider wants to advertise a windshield product. The ad provider may want to advertise the windshield product on a fixed list of automotive channels. Thus, for example, an ad campaign provider GUI (not shown) may permit an ad campaign provider to define ad hoc categories based upon keywords, content type (e.g., news show, financial reporting, adult) or source of the podcast provider (e.g. company A, company B, website C, Mr. D). For instance, in one GUI embodiment (not shown), an ad campaign provider can type "automotive" or "car" into a search box and get a list of content feeds that have the "automotive" keyword in its metadata. An ad campaign provider can explicitly choose the specific content feeds to advertise on.” Furthermore, as cited in ¶¶ [0105]-[0107] “In operation, at time=t1, a user actuates a link on web page 802 to request an RSS feed associated with content served by content server 806. In response to the request, at time=t2, the RSS feed is delivered over the network 116 to the user device. However, a transform module 810 within the ad insertion module 406 intercepts the RSS feed and changes all content URLs to point to localhost (127.0.0.1). At time=t3, the RSS feed with the transformed URLs is delivered to the user device. The content provider's web page 802 includes a "get podcast" button associated with Java script used to download the plug-in. When a user selects the "get podcast" button, the user is asked for permission to install the plug-in client, if it was not previously installed. If the client already has been installed, the client takes control and adds the feed to the podcast application, such as iTunes.…At time=t4, the podcast application, such as iTunes initiates an update of content associated with the RSS feed. The request is intercepted by the ad insertion module 706, which includes a listener 812 on localhost (127.0.0.1). The listener is a part of the plug-in code, which listens on the local host IP 127.0.0.1 and intercepts calls by the iTunes application (or any other media manager application). At time=t5, the ad insertion module 706 forwards the intercepted request over the network 116 to the content server 214. At time=t6, the content server 806 receives the request sent by the ad insertion module 706…At time=t7, the content server 806 returns the requested content over the network to the ad insertion module 406. At time=t8, the ad insertion module 706 receives the requested content update. At time=t9, ads stored in ad information storage buffer 704 are combined with the content delivered by the content server 214. At time=t10, ad-infused content is streamed to be played.”).
With respect to Claim 19:
All limitations as recited have been analyzed and rejected with respect to claim 10. Claim 19 does not teach or define any new limitations beyond claim 10. Therefore, it is rejected under the same rationale.
Response to Arguments
Applicant’s arguments, see pages 13-26 of the Remarks filed on 02/03/2026, with respect to the 35 U.S.C. § 101 rejection(s) of claim(s) 1-5, 8-15, and 17-20 have been considered but are not persuasive:
The Applicant asserts “Amended claim 1 does not claim advertising rules, marketing strategies, or economic decision-making. Instead, amended independent claim 1 recites a specific process that operates on continuously accessed audio content and produces machine-derived control conditions. Specifically, amended claim 1 requires performing, by a natural language processing module, a multi-modal natural language analysis for dynamic transcription of the podcast, advertisements, and user verbal commands into transcript data comprising speech-based and non-speech-based transcription. This is not a business activity. This is audio signal transformation into structured transcript data, including classification of speech vs. non-speech segments. That transformation is a technical prerequisite for every downstream operation in the claim. The claim then requires analyzing, using one or more machine-learning algorithms, the first, second, third, and fourth data sets to identify halts, topic transitions, contextual relevance, and user attentiveness in real time. These are machine-detectable temporal and semantic features of an audio stream. They are not human judgments and not economic abstractions. A human listener cannot, in real time, dynamically transcribe a podcast, classify speech vs. non-speech, detect halts above threshold durations, correlate them with topic transitions, and do so while the audio is still playing. Finally, the claim requires identifying one or more insertion points based on the determined halts and contextual relevance and initializing interaction by inserting advertisements at those insertion points. Thus, the focus of the claim is how a machine derives insertion points from real-time audio analysis, not advertising per se. The Examiner's "Organizing Human Activity" characterization of claim 1 is legally improper. A claim is not abstract merely because its output influences human activity. What matters is how the output is produced. Here, the interaction is not organized by rules or human judgment. It is triggered by machine-identified temporal and contextual conditions extracted from transcript data. The Examiner's approach improperly collapses "machine-derived audio segmentation and contextual extraction" into "advertising interaction". This is the exact analytical error rejected in McRO, where the court held that claims are not abstract when they recite specific computational steps that generate results, even if those results are later used in creative or commercial contexts.” The Examiner respectfully disagrees. The Examiner would like to note that the claim limitation – “performing, by a natural language processing module, a multi-modal natural language analysis for dynamic transcription of the podcast, advertisements, and user verbal commands into transcript data comprising speech-based and non-speech-based transcription” is analyzed as an additional limitation under Step 2B. Furthermore, merely applying a well-known technique (i.e. multi-modal natural language analysis or natural language processing) to data (i.e. advertisements, podcasts, and user verbal commands) in order to transcribe the data into speech and non-speech data is a limitation that is not indicative of integration into a practical application because it is essentially adding insignificant extra-solution activity to the judicial exception.
Applying natural language processing to transcribe data into speech-based and non-speech-based transcriptions represents a well-understood, routine, conventional (“WURC”) activity because it has been known that “In more detail, conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hypotheses) into a set of domains, intents, and arguments. These modules may be optimized independently. This specification formulates audio to semantic understanding as a sequence-to-sequence problem” (¶ [0067] of U.S. Publication 2021/0090570 to Aharoni). The Examiner would also like to note that “analyzing, using one or more machine-learning algorithms, the first, second, third, and fourth data sets to identify halts, topic transitions, contextual relevance, and user attentiveness in real time” and “identifying one or more insertion points based on the determined halts and contextual relevance and initializing interaction by inserting advertisements at those insertion points” are recited at such a high level that a person with pen and paper, given the necessary information, could perform them.
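For illustration only, the toy pipeline below mirrors the two-stage arrangement the Aharoni quotation characterizes as conventional: an automatic speech recognition step that converts audio to a transcript, followed by a natural language understanding step that maps the transcript to a domain, intent, and arguments. Both functions are hypothetical stand-ins supplied by the editor rather than code from any cited reference.

```python
# Toy two-stage spoken language understanding pipeline: ASR followed by NLU.
from dataclasses import dataclass

@dataclass
class Understanding:
    domain: str
    intent: str
    arguments: dict

def recognize_speech(audio_frames: bytes) -> str:
    """Placeholder ASR step: a real system would decode audio into text here."""
    return "play the latest episode"

def understand(transcript: str) -> Understanding:
    """Placeholder NLU step: keyword rules standing in for a trained model."""
    if "play" in transcript:
        return Understanding(domain="media", intent="play_podcast",
                             arguments={"query": transcript})
    return Understanding(domain="unknown", intent="none", arguments={})

def spoken_language_understanding(audio_frames: bytes) -> Understanding:
    # The two modules are optimized independently in the conventional setup.
    return understand(recognize_speech(audio_frames))
```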
The Applicant also asserts “Amended claim 1 does not merely recite the receipt and analysis of data followed by an abstract business outcome. Instead, the claim recites a specific computer-implemented processing pipeline that transforms real-time podcast audio into structured transcript data and then uses that transformed data to determine insertion points within a continuously accessed audio stream. The alleged abstract idea-interaction with advertisements-cannot be performed independently of this technical processing. Rather, the interaction is explicitly conditioned on machine-determined halts, topic transitions, and contextual relevance identified from transcript data generated in real time. As such, the claim applies any alleged abstract idea through a concrete technological process that governs how the computer processes audio content. In particular, amended claim 1 requires performing, by a natural language processing module of the advertisement interaction system, a multi-modal natural language analysis for dynamic transcription of the podcast, advertisements, and user verbal commands into transcript data comprising both speech-based and non-speech-based transcription. This transcription step is not recited as a post-solution reporting or visualization activity. Instead, the transcript data is expressly required for the subsequent machine-learning analysis that identifies halts, topic transitions, and contextual relevance. Without this transcription, the remaining steps of the claim cannot be executed. Accordingly, the transcription is an integral technical operation that enables the claimed functionality and cannot be dismissed as insignificant extra-solution activity. The Examiner asserts that applying natural language processing and machine learning to data is well-understood, routine, and conventional. Even if such techniques exist in isolation, the Prong II inquiry does not ask whether individual tools are known, but whether the claim as a whole integrates the alleged abstract idea into a practical application. Here, the claim does not merely apply natural language processing for understanding or classification. Rather, the claim uses transcript data generated from audio to identify temporal halts and contextual boundaries within a podcast, and then uses those machine-identified features to determine insertion points where interaction can occur without disrupting the audio flow. This constitutes a specific application of transcript analysis to control interaction timing within a real-time audio stream, which is a technological implementation, not an abstract business rule. Furthermore, amended claim 1 recites that the machine-learning analysis is performed in real time while the user accesses the podcast, and that insertion points are identified based on halts in the podcast and contextual relevance of the subject matter. These limitations meaningfully restrict how the claimed system operates. The system is not free to insert advertisements arbitrarily or based on static metadata; it must continuously analyze live audio-derived transcript data and detect technical conditions within the audio itself. This real- time constraint ties the alleged abstract idea to a specific manner of computer operation and reflects a practical application. The Examiner further characterizes the claim as merely linking an abstract idea to a technological environment, namely podcasts. This characterization is incorrect. 
The podcast is not a field-of-use label but a source of technical signal characteristics that the computer must analyze. The claim requires detecting halts and topic transitions within the podcast audio itself. These are audio-processing constraints that directly affect how the computer operates and what computations it must perform. Because the environment dictates the technical processing steps, the claim does more than merely confine an abstract idea to a particular field of use. Additionally, the Examiner concludes that the claim does not improve computer functioning. This conclusion overlooks the fact that amended claim 1 changes how the computer processes audio streams. Rather than treating a podcast as an undifferentiated media file, the claimed method causes the computer to dynamically transcribe the audio, distinguish speech from non-speech, detect halts and topic transitions, and compute interaction points based on those detected features. This constitutes an improvement in the computer's handling of streaming audio content by enabling machine-actionable segmentation and timing control that did not previously exist. Such improvements to data processing and content handling fall squarely within the types of improvements recognized as patent-eligible under the USPTO guidance. The Examiner's analysis also improperly fragments the claim by evaluating individual elements in isolation. When the claim is considered as an ordered combination, it recites a tightly integrated sequence of operations in which transcript generation, machine-learning analysis, insertion point identification, and interaction initialization are causally linked. The interaction is not simply "enabled" by a computer; it is technically constrained and triggered by machine-derived conditions extracted from live audio. This interdependence demonstrates that the claim applies any alleged abstract idea in a specific, practical manner rather than merely appending generic computer implementation language.” The Examiner respectfully disagrees. 
Claims 1, 11, and 20 recite additional limitations including at an advertisement interaction system with a processor, with/through a communication device, Computer-readable memory(s); “analysis performed by the one or more machine learning algorithms and the natural language processing module”; and “performing, by a natural language processing module, a multi-modal natural language analysis on the one or more advertisements, the podcast and the user verbal commands for dynamic transcription of the one or more advertisements, the podcast and the user verbal commands in transcript data, wherein the transcript data comprises a speech-based transcription and a non-speech-based transcription; and analyzing, using one or more machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model, the first set of data, the second set of data, the third set of data, and the fourth set of data.” The additional limitations reciting – “at an advertisement interaction system with a processor, with/through a communication device, Computer-readable memory(s); “analysis performed by the one or more machine learning algorithms and the natural language processing module”; and “analyzing, using one or more machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model, the first set of data, the second set of data, the third set of data, and the fourth set of data”” are recited in a manner that merely uses the computer (i.e. advertisement interaction system or communication device and machine learning algorithm) as the tool to perform the abstract idea. These additional elements in claims 1, 11, and 20 are not found to integrate the judicial exception into a practical application because alone, and in combination, these additional elements are seen as adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea - see MPEP 2106.05(f), adding insignificant extra-solution activity to the judicial exception - see MPEP 2106.05(g), and generally linking the use of the judicial exception to a particular technological environment or field of use – see MPEP 2106.05(h). The additional limitations do no more than link the judicial exception to a particular technological environment or field of use, i.e. system/device, and therefore do not integrate the abstract idea into a practical application. The courts decided that although the additional elements did limit the use of the abstract idea, the court explained that this type of limitation merely confines the use of the abstract idea to a particular technological environment and this fails to add an inventive concept to the claims (See Affinity Labs of Texas v. DirecTV, LLC,).
The Applicant finally asserts “Even if amended claim 1 is assumed to be directed to an abstract idea, the claim recites an inventive concept because it defines a non-conventional and non-generic combination of computer-implemented operations that go beyond merely organizing human activity. In particular, amended claim 1 requires real-time acquisition of multiple heterogeneous data sets originating from distinct technical sources, including podcast publisher data, advertiser data, communication-device data, and user interaction data. Claim 1 further requires coordinated processing of those data sets within an advertisement interaction system while a podcast is being accessed in real time. Amended claim 1 further recites performing multi-modal natural language analysis together with machine-learning-based analysis to identify system-generated, user-generated, and advertiser-generated triggers and a plurality of attributes, including halts, topic transitions, contextual relevance, optimal timing, and user attentiveness. These triggers and attributes are not merely informational outputs but are expressly used to control system behavior by determining when and where interaction may occur within an ongoing podcast. This rule-based, real-time control of interaction based on continuously updated analytical results reflects a specific technical implementation rather than a generic instruction to apply data analysis on a computer. Additionally, the claim requires identifying insertion points and advertisement slots based on the combined analysis of the podcast content and user-related data and initializing interaction by inserting advertisements at those insertion points during real-time playback. This is not a conventional or generic computer function, but a particular operational arrangement in which analytical results directly govern execution timing within a streaming audio environment. The Examiner provides no factual support that such a real-time, trigger-driven interaction control mechanism was routine or conventional at the time of filing. When considered as an ordered combination, the elements of amended claim 1 define a specific technical solution that integrates multiple data sources, real-time analysis, trigger identification, and interaction control into a unified system. This arrangement amounts to significantly more than the alleged abstract idea and therefore satisfies Step 2B of the subject matter eligibility analysis.” The Examiner respectfully disagrees. The Examiner would like to note that “identifying insertion points and advertisement slots based on the combined analysis of the podcast content and user-related data and initializing interaction by inserting advertisements at those insertion points during real-time playback” is recited at such a high level that a person with pen and paper, given the necessary information, could perform it, and that “inserting advertisements at those insertion points during real-time playback” is recited nowhere in the claims. The Examiner recommends amending the claims to recite how the advertisements are inserted at the identified insertion points, with the “how” rooted in computer technology rather than in high-level abstraction.
Furthermore, the additional limitations reciting - “at an advertisement interaction system with a processor, with/through a communication device, Computer-readable memory(s); “analysis performed by the one or more machine learning algorithms and the natural language processing module”; and “analyzing, using one or more machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model, the first set of data, the second set of data, the third set of data, and the fourth set of data”” do not integrate the judicial exception (abstract idea) into a practical application because of the analysis provided in Step 2A, Prong II. Claims 1, 11, and 20 also recite additional limitations – “performing, by a natural language processing module, a multi-modal natural language analysis on the one or more advertisements, the podcast and the user verbal commands for dynamic transcription of the one or more advertisements, the podcast and the user verbal commands in transcript data, wherein the transcript data comprises a speech-based transcription and a non-speech-based transcription.” However, merely applying a well-known technique (i.e. multi-modal natural language analysis or natural language processing) to data (i.e. advertisements, podcasts, and user verbal commands) in order to transcribe the data into speech and non-speech data is a limitation that is not indicative of integration into a practical application because it is essentially adding insignificant extra-solution activity to the judicial exception. Applying natural language processing to transcribe data into speech-based and non-speech-based transcriptions represents a well-understood, routine, conventional (“WURC”) activity because it has been known that “In more detail, conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hypotheses) into a set of domains, intents, and arguments. These modules may be optimized independently. This specification formulates audio to semantic understanding as a sequence-to-sequence problem” (¶ [0067] of U.S. Publication 2021/0090570 to Aharoni). The Examiner would also like to note that applying machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model to analyze data is a limitation that is not indicative of integration into a practical application because it is essentially adding insignificant extra-solution activity to the judicial exception. Applying machine learning algorithms trained using supervised and/or unsupervised learning techniques to analyze data represents a well-understood, routine, conventional (“WURC”) activity because it has been known that “It is noted that the semantic entity relation detection classifier training implementations described herein can perform the classifier training/learning using any semi-supervised or unsupervised machine learning method such as a conventional logistic regression method, or a conventional decision trees method, or a conventional support vector machine method, among other types of machine learning methods.
It is also noted that the semantic entity relation detection classifier training implementations can be used to train a variety of classifiers including a conventional support vector machine, or a conventional artificial neural network, or a conventional Bayesian statistical classifier, among other types of classifiers” (¶ [0070] of U.S. Publication 2017/0068903 to Hakkani-Tur). The independent claims do not include additional elements or a combination of elements that result in the claims amounting to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements listed amount to no more than mere instructions to apply an exception using a generic computer component. In addition, the applicant’s specification describes generic computer-based elements (Page 13, Lines 10-12 and Page 20, Lines 5-7) for implementing the advertisement interaction system and communication device, which do not amount to significantly more than the abstract idea itself, which is not enough to transform an abstract idea into eligible subject matter. There is no improvement in the functioning of the computer or technological field, and there is no transformation of subject matter into a different state. Therefore, the rejection(s) of claim(s) 1-5, 8-15, and 17-20 under 35 U.S.C. § 101 is maintained above with an updated analysis.
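For illustration only, the snippet below shows the kind of conventional supervised training the Hakkani-Tur quotation refers to, here a logistic regression classifier fit to synthetic attribute vectors. The data and labels are fabricated placeholders; the example is the editor's sketch, not an implementation from any cited reference or from the claims.

```python
# Conventional supervised learning sketch: logistic regression on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))              # hypothetical attribute vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # synthetic labels for training

clf = LogisticRegression().fit(X, y)       # supervised training step
print(clf.predict(X[:5]))                  # classify new attribute vectors
```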
Applicant’s arguments, see pages 26-29 of the Remarks filed on 02/03/2026, with respect to the 35 U.S.C. § 103 rejection(s) of claim(s) 1-5, 8-15, and 17-20 over Yruski in view of Ramer and in further view of Frank have been considered but are not persuasive:
The Applicant asserts “Yruski is fundamentally directed to metadata-driven advertisement insertion based on campaign rules, schedules, and predefined slot information supplied by content managers and advertisers. The cited portions of Yruski describe ad placement decisions being made based on externally provided metadata, such as time-of-day rules, rotation rules, byte offsets, or preconfigured slot definitions. The "complex algorithms" discussed in Yruski operate on campaign yield management and distribution criteria, not on real-time semantic analysis of podcast content itself. Yruski nowhere teaches or suggests analyzing the podcast audio stream to determine halts, topic transitions, or contextual relevance of subject matter as a basis for identifying insertion points. Instead, ad insertion locations are predetermined or rule-driven and exist independently of any linguistic or semantic understanding of the podcast content. Amended claim 1, however, requires that one or more insertion points and one or more advertisement slots are identified based on determined halts in the podcast and contextual relevance of subject matter of the podcast, where those halts and contextual relevance are outputs of a preceding analysis step. Yruski's "slots" are not identified based on halts or contextual relevance, but are defined a priori by campaign configuration or RSS metadata. The Examiner's interpretation equates any predefined ad slot with a halt-based, context-derived insertion point, which is not supported by Yruski's disclosure.” The Examiner respectfully disagrees. The Examiner would like to note that the claims contain no recitation of determining halts; instead, "halts" is used to describe one of the plurality of attributes. Furthermore, the Examiner would like to refer the Applicant to ¶¶ [0214] [0216] of the Yruski reference; “In some embodiments, ads are inserted on-the-fly (i.e., as content is received), and at each insertion point, the main stream fades out before an ad, and fades in after an ad. The position of ad insert points is reflected as the count of mp3 frames from the start of the stream. In the case of content encoded as mp3 frames, during the processing of the main content, the management agent counts the number of mp3 frames streamed until the stitch frame has been reached. From this point the fade-out effect is performed on a number of frames, until complete silence. The ad is streamed after the fade-out effect…From the Byte offset the ad insertion agent finds the start of the next mp3 frame on the stream. The fade-in effect is performed on the mp3 frames from the previous segment of the main content. The next segment is streamed until the next insert point is reached. In this manner no content information is lost in the unedited parts of the main content. The controlled fade-ins and fade-outs between main content and ads permit a user to experience a smoother transition between content and advertisement.” It is clear from the disclosure above that the Yruski reference teaches a plurality of attributes including halts such as points in the main stream where the content fades out/in.
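For illustration only, the sketch below restates the frame-counted stitch described in the quoted Yruski paragraphs: frames of the main stream are counted up to a stitch point, faded out to silence, the ad is streamed, and the main stream fades back in. The frame representation, fade length, and gain ramps are hypothetical simplifications chosen by the editor, not Yruski's actual mp3 handling.

```python
def stitch_with_fades(main_frames, ad_frames, stitch_index, fade_len=50):
    """Count frames to the stitch point, fade out, insert the ad, fade back in.

    Frames are lists of float samples; the gains below are simple linear ramps.
    """
    out = []
    for i, frame in enumerate(main_frames):
        if i < stitch_index - fade_len:
            out.append(frame)                      # unedited main content
        elif i < stitch_index:
            g = (stitch_index - i) / fade_len      # fade-out toward silence
            out.append([s * g for s in frame])
        elif i == stitch_index:
            out.extend(ad_frames)                  # ad streamed after the fade-out
            out.append([0.0 for _ in frame])       # silent stitch frame
        elif i <= stitch_index + fade_len:
            g = (i - stitch_index) / fade_len      # fade-in after the ad
            out.append([s * g for s in frame])
        else:
            out.append(frame)
    return out
```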
The Applicant also asserts “Importantly, amended claim 1 does not merely require collecting heterogeneous user data. It requires that the first, second, third, and fourth sets of data are analyzed together, based on enabled multi-modal natural language analysis and machine learning, for identifying triggers and attributes associated with the advertisements, the podcast, and the user, and that these outputs are then used to identify insertion points and advertisement slots. Neither Frank alone nor the combination of Frank with Yruski and Ramer teaches or suggests this ordered, interdependent analytical sequence. The Examiner's combination further suffers from an impermissible "mix-and-match" approach. Individual features are extracted from unrelated technical contexts, campaign metadata (Yruski), query NLP and behavioral prediction (Ramer), and affective computing (Frank), and then assembled to mirror the claim language. However, §103 requires more than the mere possibility of combination; it requires a teaching, suggestion, or motivation to combine the references in the manner claimed. The Office Action does not identify any disclosure in the references that would lead a skilled artisan to restructure Yruski's metadata- based system into one that first performs multi-modal NLP and ML analysis and only then determines triggers, insertion points, and advertisement slots based on podcast halts and contextual relevance.” The Examiner respectfully disagrees. Yruski teaches initializing, at the advertisement interaction system with the processor, the interaction between the user and the one or more advertisements by inserting the one or more advertisements at the identified one or more insertion points, wherein the interaction between the user and the one or more advertisements is initiated based on the identification of the one or more triggers and the plurality of attributes, the one or more insertion points and the one or more advertisement slots (Examiner interprets the claim language using BRI to be based on the identification of a trigger, displaying/transmitting an advertisement in real-time.) (i.e. ads are dynamically inserted into podcast) (Yruski: ¶ [0230] “For example, ads may be inserted dynamically in order to achieve a degree of smart ad rotation. These criteria may include factors such as time of day during which an ad provider wants the ad to play. The insertion information also may include ad rotation rules such as, play Ad1 in the morning; play Ad2 in the afternoon; and play Ad3 in the evening. Client-side implementation of placement rules in accordance with insertion information may require the agent to access context information from the device concerning context in which content or ads are actually played. The context may be time of day during which, or location of play (obtained from a user's zip code, user's IP address or GPS device for example) at which, a device user presses a button to actually play a content file. The agent inserts an ad dynamically that is appropriate to the context in accordance with the rules criteria specified by the insertion information.” Furthermore, as cited in ¶ [0214] “In some embodiments, ads are inserted on-the-fly (i.e., as content is received), and at each insertion point, the main stream fades out before an ad, and fades in after an ad. The position of ad insert points is reflected as the count of mp3 frames from the start of the stream. 
In the case of content encoded as mp3 frames, during the processing of the main content, the management agent counts the number of mp3 frames streamed until the stitch frame has been reached. From this point the fade-out effect is performed on a number of frames , until complete silence. The ad is streamed after the fade-out effect.”). Yruski does not explicitly disclose analyzing, at the advertisement interaction system with the processor using one or more machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model, […], wherein the analysis is performed for identifying one or more triggers and a plurality of attributes associated with the one or more advertisements, the podcast, and the user and for predicting behavior and journey of the user, wherein the analysis is performed in real time. However, Ramer further discloses analyzing, at the advertisement interaction system with the processor using one or more machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model, […] set of data (i.e. analyzing sets of data based on natural language processing and machine learning algorithms trained through supervised and unsupervised modelling) (Ramer: ¶ [0299] “Usage patterns may be analyzed using various predictive algorithms, such as regression techniques (least squares and the like), neural net algorithms, learning engines, random walks, Monte Carlo simulations, and others. For example, a usage pattern may indicate that a user has made many work-related phone calls during a holiday (such as by determining that the user was located at work and making calls all day).” Furthermore, as cited in ¶ [0310] “For example, if a user has consistently declined, or failed to view, music-oriented programming content (whether on a cellular phone, TV, or Internet), then a query for the term "U2" might return information on Soviet-era spy planes, notwithstanding that for other users such a query would return content related to the rock group U2. As in analysis of usage patterns, a wide range of algorithms, including learning algorithms, regression analyses, neural nets, and the like may be used to understand patterns in declined content that assist with handling queries and results.” Furthermore, as cited in ¶ [1965] “This general category includes, but is not limited to, methods discussed below. Neural networks are nonlinear sophisticated modeling techniques that are able to model complex functions and can be applied to problems of prediction, classification or control. Neural network analytic techniques may be employed when the exact nature of the relationship between inputs and output is not known. In such instances, neural network techniques may assist in learning more about the relationship between the inputs and outputs through supervised and unsupervised training. Support vector machines may be used in machine learning to detect and exploit complex patterns in data by clustering, classifying and ranking the data.”); and wherein the analysis is performed for identifying one or more triggers and a plurality of attributes associated with the one or more advertisements, the podcast, and the user and for predicting behavior and journey of the user, wherein the analysis is performed in real time (i.e. 
identifying triggers and attributes associated with advertisement, podcast, and user and for predicting behavior and/or journey of user, wherein the analysis is performed in real time) (Ramer: ¶ [1426] “In embodiments, a user behavior or activity may be analyzed in relation to a particular content, content type, content category, and so forth. Referring to FIG. 36A, in an example, an activity factor may be learned based at least in part on logistic regression methodology. Realtime mobile traffic data may be associated with a user profile, including a historical user profile that includes prior activities, behaviors, and the like. Data derived from the updated request profile may be used in a responsiveness model (based on logistic regression or some other statistical modeling methodology).” Furthermore, as cited in ¶ [0191] “The results facility 148 may include general content and services, specific content catalogs, carrier premium content, carrier portal content, device based results, or home computer desktop search results. The general content and services provided in the results facility 148 could be podcasts, websites, general images available online, general videos available online, websites transcoded for MCF, or websites designed for mobile browser facilities.” Furthermore, as cited in ¶ [1090] “This category of user profile may, in tum, be used to predict actions or events such as a future purchase, advertisement conversion, or some other action or event that is associated with the category of user profile. In the current example, it may be known to a wireless provider that the three browse activities of visiting the websites of a florist, a caterer, and a photographer within some proximity of each other is highly associated with a user that is a bride to be. Thus, this type of web browsing activity may categorize this user in the "Bride-to-Be" category. This category may be stored in the mobile subscriber characteristics database 112 that is associated with her phone, and sponsored content, such as wedding-related advertisements may be presented to the display 172 of her mobile communication facility 102 based at least in part that she fits the category of "Bride-to-Be."”). Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to add Ramer’s enabling, at the advertisement interaction system with the processor, a multi- modal natural language analysis on the one or more advertisements, the podcast and the user verbal commands using a natural language processing module for dynamic transcription of the one or more advertisements, the podcast and the user verbal commands in transcript data, wherein the transcript data comprises a speech-based transcription and a non-speech-based transcription; and analyzing, at the advertisement interaction system with the processor based on the enabled multi-modal natural language analysis and one or more machine learning algorithms trained using at least one of a supervised machine learning model or an unsupervised machine learning model, […], wherein the analysis is performed for identifying one or more triggers and a plurality of attributes associated with the one or more advertisements, the podcast, and the user and for predicting behavior and journey of the user, wherein the analysis is performed in real time to Yruski’s method for enabling an interaction of a user with one or more advertisements within a podcast. 
One of ordinary skill in the art would have been motivated to do so in order to “enable targeting using advanced behavioral relevance algorithms” and to “determine optimal ad to deliver in real-time based on targeting. The combination of targeting criteria and campaign priority may ensure that the best ad is being returned for any given ad request.” (Ramer: ¶¶ [1237] [1271]). The Examiner would also like to note that one cannot show non-obviousness by attacking references individually where the rejections are based on combinations of references. See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986). Therefore, the rejection(s) of claim(s) 1-5, 8-15, and 17-20 under 35 U.S.C. § 103 is provided above with updated citations.
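For illustration only, the following sketch shows one hypothetical way halts exceeding a threshold duration could be detected in an audio stream and treated as candidate insertion points, in the sense the claim language and the arguments above describe. The energy threshold, frame size, and minimum halt length are arbitrary values assumed by the editor and are not drawn from the claims or the cited references.

```python
# Sketch of silence-gap ("halt") detection above a threshold duration.
import numpy as np

def find_halts(samples: np.ndarray, rate: int, frame_ms=20,
               energy_thresh=1e-4, min_halt_s=1.0):
    """Return (start_s, end_s) spans where frame energy stays below threshold."""
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    quiet = [np.mean(samples[i * frame_len:(i + 1) * frame_len] ** 2) < energy_thresh
             for i in range(n_frames)]
    halts, start = [], None
    for i, q in enumerate(quiet + [False]):        # trailing False flushes the last run
        if q and start is None:
            start = i
        elif not q and start is not None:
            dur = (i - start) * frame_ms / 1000
            if dur >= min_halt_s:                  # keep only halts above threshold
                halts.append((start * frame_ms / 1000, i * frame_ms / 1000))
            start = None
    return halts
```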
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The following references are cited to further show the state of the art:
U.S. Publication 2007/0078712 to Ott for disclosing automatically delivering media files with advertisements over a network. A method and system is disclosed for automatically adding an advertisement to the beginning or the end of a media file, such as a podcast episode, when the media file is requested by a consumer. In another aspect, the media file may be automatically searched for an advertisement marker, such as a specific tone or data element in the media file, that acts as a submission point for the automatic insertion of an advertisement into the media file. Aspects of the present invention allow for automatic insertion of advertisements after the creation of the media file, potentially without any interaction between the creator and the advertiser. The systems can be implemented at a central server, at the media file source, at a consumer's media player or distributed throughout various computing devices.
U.S. Publication 2017/0046429 to Barrand for disclosing enabling sharing of audio feeds. One method includes receiving, from a user over a network, a request to add an audio feed to a collection managed by the user; storing, in a database, a URL of the audio feed in relation to the collection; receiving, from the user over the network, a request to share the collection; and generating an RSS URL of the collection by searching the database for URLs of audio feeds stored in relation to the collection.
U.S. Publication 2009/0204243 to Marwaha for disclosing creating a customized text-to-speech podcast by receiving a text file, parsing and tagging the text file, creating multiple audio files by text-to-speech technology, and creating a podcast by combining the audio files. The podcast can be an audio podcast or a video podcast. Video podcasts associate related video content with the audio content.
U.S. Patent 9,563,826 to Lau for disclosing that an advertisement is matched to subject matter in a portion of rich media content, such as a digital video, Flash™ animation, etc. For example, during the playing of rich media content, it may be determined by audio recognition techniques that the content's subject matter matches or correlates with an advertisement. Rendering preferences associated with the advertisement are then determined. The rendering preferences may be used to determine how the advertisement should be rendered (i.e., displayed in association with the content). The advertisement is then served to a device. The advertisement is served such that it can be rendered relative to a time that the portion of media is being displayed on the device.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Azam Ansari, whose telephone number is (571) 272-7047. The examiner can normally be reached from Monday to Friday between 8 AM and 4:30 PM.
If any attempt to reach the examiner by telephone is unsuccessful, the examiner's supervisor, Waseem Ashraf, can be reached at (571) 270-3948.
Another resource that is available to applicants is the Patent Application Information Retrieval (PAIR) system. Information regarding the status of an application can be obtained from the PAIR system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pairdirect.uspto.gov. Should you have questions on access to the Private PAIR system, please feel free to contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Applicants are invited to contact the Office to schedule either an in-person or a telephonic interview to discuss and resolve the issues set forth in this Office Action. Although an interview is not required, the Office believes that an interview can be of use to resolve any issues related to a patent application in an efficient and prompt manner.
/AZAM A ANSARI/
Primary Examiner, Art Unit 3621
March 6, 2026