Last updated: May 29, 2026

Application No. 18/805,857

AUDIO SEGMENT RECOMMENDATION

Non-Final OA §103

Filed

Aug 15, 2024

Priority

Nov 24, 2020 — continuation of 11/609,738 +1 more

Examiner

MCCORD, PAUL C

Art Unit

2692

Tech Center

2600 — Communications

Assignee

Spotify AB

OA Round

1 (Non-Final)

Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 69% grant rate with a +26.1% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.

Based on 575 resolved cases, 2023–2026

Examiner Intelligence

MCCORD, PAUL C

Grants 69% — above average

Career Allowance Rate

398 granted / 575 resolved

+7.2% vs TC avg

Strong +26% interview lift

Interview Lift: +26.1% (resolved cases with interview vs. without)

Typical timeline

3y 5m

Avg Prosecution

28 currently pending

Career history

613

Total Applications

across all art units

Statute-Specific Performance

§101

0.6%

-39.4% vs TC avg

§103

92.4%

+52.4% vs TC avg

§102

3.6%

-36.4% vs TC avg

§112

1.1%

-38.9% vs TC avg

Based on career data from 575 resolved cases. Tech Center averages are estimates.

Office Action

§103

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
 
Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2-20 are rejected under 35 U.S.C. 103 as being unpatentable over Kakoyainnis (20200013380, hereinafter Ka) further in view of Chawla (11451598, hereinafter Chaw).

Regarding claim 2
Ka teaches:
A method (Ka: ¶ 67; Fig 5: such as practiced upon a computer, computer system by instantiation and/or execution of coded instructions from memory), comprising: receiving one or more media content items (Ka: Fig 5-7: computer system 102 receives an audio track, processes same, provides an audio track to a user device such as that of figure 6 which additionally receives, processes, and provides the audio track);
using machine learning to identify one or more audio segments of interest in each of the one or more media content items, based at least in part on an analysis of content included in a corresponding media content item, wherein each of the identified audio segments is associated with one or more automatically determined tags (Ka: Abstract; ¶ 21, 59, 64, 87, 133, etc.; Figs 5-7: system operates to segment an audio track corresponding to a particular one of the one or more media content items, such as by employing machine learning computer algorithms to enrich the corresponding media content item with a textual element such as a keyword, tag, descriptive copy and/or title automatically derived from the item or segment(s) thereof);
generating a video clip (Ka: Abstract, ¶ 59-61, 83, 139 etc.: system identifies or otherwise recommends segments for pairing with video, sequences of images etc. for delivery to a user such as for playback upon a media player), including:
using machine learning to automatically select recommended audio segments from the identified audio segments based at least in part on one or more prior user interactions of a user and the automatically determined tags of the identified audio segments (Ka: Abstract, ¶ 59-61, 64, 83, 139 etc.: such as by utilizing an algorithm for constructing a video clip based on a user interest or listening history and additionally based on pairing of segments of audio and video such as based on tags, keywords, topics, etc.), and
identifying, for inclusion in the video clip, video segments of the one or more media content items that correspond to the recommended audio segments (Ka: Abstract, ¶ 59-61, 83, 139 etc.; Figs 5-7: system identifies segments for pairing with video, sequences of images, etc. for delivery to a user such as for playback upon a media player); and
providing the video clip for playback on a device associated with a user (id.).

Ka does not explicitly teach using a machine learning recommender to winnow particular segments used for generating a video clip for a specific user, thereby selecting audio segments from the identified audio segments to pair with video, etc. for the specific user, and
providing the video clip for playback on a device associated with the specific user.

In a related field of endeavor Chaw teaches a system for generating a preview segment of a recommended media comprising receiving one or more media content items (Chaw: Abstract; Col 1:46-1:67: system identifies one or more media items, segments thereof, to recommend to a user and gauges user interest in same);
using machine learning to identify one or more audio segments of interest in each of the one or more media content items based at least in part on an analysis of content included in a corresponding media content item, wherein each of the identified audio segments is associated with one or more automatically determined tags (Chaw: 2:20-2:27, 7:17-7:27, 8:53-8:64, etc.: system identifies segments, media items, etc. of potential interest for presentation, recommendation, etc. to a user; said segments, etc. based on tag data such as genre, musical qualities and selected using a machine learning system);
generating a video clip for a specific user (Chaw: 1:46-1:67; 2:20-2:27, 8:43-8:49, 8:60-9:7: system determines segments to present to a particular user based on user-specific factors including user media consumption, user metadata preferences, etc., and similarity of same to other users' consumption, preferences, etc.), including:
using machine learning to automatically select, for the specific user, recommended audio segments from the identified audio segments based at least in part on one or more prior user interactions of the specific user and the automatically determined tags of the identified audio segments (Chaw: 1:46-1:67; 2:20-2:27, 8:43-8:49, 8:60-9:7, 9:63-9:67: system determines segments to present to a particular user based on user-specific factors including user media consumption, user metadata preferences, etc., and similarity of same to other users' consumption, preferences, etc., such as by a machine learning system), and
identifying, for inclusion in the video clip, video segments of the one or more media content items that correspond to the recommended audio segments (id. and 13:19-13:33: system provides selected, recommended, etc. segments to a user in the form of a digest, playlist, etc.); and
providing the video clip for playback on a device associated with the specific user (id. and Fig 4: such as for playback on a user device).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to utilize the Chaw-taught user-specific recommendation to provide the Ka topically generated videos to a specific user(s) for at least the purpose of targeting or recommending a video more likely to be ingested by a user; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 3
Ka in view of Chaw teaches or suggests:
The method of claim 2, wherein the one or more prior user interactions of the specific user includes one or more other interactions by the specific user with respective audio segments in the media content items (Chaw: Col 2:16-2:27 user interactions in the form of “watch” data incorporated into the machine learning recommender). The claim is considered obvious over Ka as modified by Chaw as addressed in the base claim as it would have been obvious to apply the further teaching of Ka and/or Chaw to the modified device of Ka and Chaw; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 4
Ka in view of Chaw teaches or suggests:
The method of claim 2, wherein: selecting, for the specific user, the recommended audio segments from the identified audio segments based at least in part on the one or more prior user interactions of the specific user includes identifying types of media content that the specific user is not interested in based on interactions in which the specific user skips or ignores particular media content (Chaw: Col 1:57-1:67, 2:16-2:27 user interactions in the form of “watch” data incorporated into the machine learning recommender; said data including a user choosing to skip the preview or ignore the preview in favor of deferment). The claim is considered obvious over Ka as modified by Chaw as addressed in the base claim as it would have been obvious to apply the further teaching of Ka and/or Chaw to the modified device of Ka and Chaw; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 5
Ka in view of Chaw teaches or suggests:
The method of claim 4, wherein the one or more prior user interactions of the specific user includes a swiping gesture causing a next or previous audio segment to be played back before completion of playback of the recommended audio segment (Chaw: Col 5:1-5:9; Figs 3-7: such as by the performance of user input upon a graphical user interface such as that of the figures). Examiner takes official notice that the recited user interface features were well known in the media player art before the effective filing date of the instant invention and would have comprised an obvious inclusion for at least the purpose of providing a user aspects of trick play to thereby deliver an expected media playback experience.

Regarding claim 6
Ka in view of Chaw teaches or suggests:
The method of claim 5, wherein: the one or more prior user interactions of the specific user includes a second swiping gesture causing additional content about the next or previous audio segment to be displayed. Examiner takes official notice that the recited user interface features were well known in the media player art before the effective filing date of the instant invention and would have comprised an obvious inclusion for at least the purpose of providing a user aspects of trick play to thereby deliver an expected media playback experience; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 7
Ka in view of Chaw teaches or suggests:
The method of claim 3, further comprising: updating a machine learning model based on an interaction by the specific user, the machine learning model configured to identify respective audio segments to recommend (Ka: ¶ 8: media content regularly updated); (Chaw: Col 2:1-2:15: media digests regularly iterated). It would have been obvious to one of ordinary skill in the art before the effective filing date of the instant application to regularly update the machine learning model based on the availability of new content and to include the taught user interaction features to winnow such new content; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 8
Ka in view of Chaw teaches or suggests:
The method of claim 2, wherein: the one or more prior user interactions of the specific user includes one or more interactions indicating types of content that the specific user is interested in, the one or more interactions selected from the group consisting of: the specific user listening to a particular audio segment to completion (Chaw: Col 2:20-2:32: such as by determining the time which a user spends watching a particular media item, a full length version of a media item, etc.); the specific user sharing the particular audio segment with other users of different electronic devices (Chaw: Col 3:20-3:27, 7:5-7:15; 7:49-7:53; Fig 4: such as by tracking user sharing of media items, historical sharing of items, number of shares of an item, etc.); and the specific user subscribing to the content corresponding to the particular audio segment (Ka: ¶ 68: such as by tracking a user subscription to a media item such as a podcast). The claim is considered obvious over Ka as modified by Chaw as addressed in the base claim as it would have been obvious to apply the further teaching of Ka and/or Chaw to the modified device of Ka and Chaw; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 9
Ka in view of Chaw teaches or suggests:
The method of claim 2, further comprising: receiving an indication of a user action associated with advancing an active audio segment; selecting for the specific user a second recommended audio segment from at least the identified audio segments; and automatically providing the second recommended audio segment (Ka: ¶ 149; Fig 13A: such as by tracking user operation to manipulate a media); (Chaw: Col 12:3-12:15; Fig 4: such as by tracking user operations for skipping forward to a next preview within a digest, selecting a preview of interest within a digest and providing the user with the indicated preview without further intervention). The claim is considered obvious over Ka as modified by Chaw as addressed in the base claim as it would have been obvious to apply the further teaching of Ka and/or Chaw to the modified device of Ka and Chaw; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 10
Ka in view of Chaw teaches or suggests:
The method of claim 2, further comprising: receiving an indication of a user action associated with an active audio segment; and providing a full corresponding media content item associated with the active audio segment (Chaw: Col 12:3-12:15; Fig 4: such as by tracking user operations for requesting a full item while enjoying a preview). The claim is considered obvious over Ka as modified by Chaw as addressed in the base claim as it would have been obvious to apply the further teaching of Ka and/or Chaw to the modified device of Ka and Chaw; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 11
Ka in view of Chaw teaches or suggests:
The method of claim 2, wherein the analysis of content included in the corresponding media content item includes identifying topics associated with identified word content in the corresponding media content item (Ka: ¶ 88, etc.: system operates to reify topics by keyword, term, etc.). The claim is considered obvious over Ka as modified by Chaw as addressed in the base claim as it would have been obvious to apply the further teaching of Ka and/or Chaw to the modified device of Ka and Chaw; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 12
Ka in view of Chaw teaches or suggests:
The method of claim 11, wherein the automatically determined tags of the identified audio segments are based on the identified topics (Ka: ¶ 22, 83, 88, 105, 106: tags are conflated with topic and a keyword may be considered a tag which reifies a topic). The claim is considered obvious over Ka as modified by Chaw as addressed in the base claim as it would have been obvious to apply the further teaching of Ka and/or Chaw to the modified device of Ka and Chaw; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 13
Ka in view of Chaw teaches or suggests:
The method of claim 2, wherein the analysis of content included in the corresponding media content item includes automatically transcribing each of the media content items (Ka: ¶ 7, 19, etc.: such as by conversion of talk-based audio to text, tag associated therewith). The claim is considered obvious over Ka as modified by Chaw as addressed in the base claim as it would have been obvious to apply the further teaching of Ka and/or Chaw to the modified device of Ka and Chaw; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 14
Ka in view of Chaw teaches or suggests:
The method of claim 13, wherein transcribing each of the media content items includes automatically identifying one or more speakers of content in each of the media content items (Ka: ¶ 73, etc.: speech processed with a neural network). Examiner takes official notice that speaker recognition was well known in the art before the effective filing date of the instant invention and would have comprised an obvious inclusion for at least the purpose of diarizing a text-based media comprising plural speakers; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 15
Ka in view of Chaw teaches or suggests:
The method of claim 2, wherein the analysis of content included in the corresponding media content item includes automatically identifying advertisements in the corresponding media content item (Ka: ¶ 1, 2: such as by targeting advertising based on determined topics, metadata, etc. of particular media). The claim is considered obvious over Ka as modified by Chaw as addressed in the base claim as it would have been obvious to apply the further teaching of Ka and/or Chaw to the modified device of Ka and Chaw; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 16
Ka in view of Chaw teaches or suggests:
The method of claim 2, wherein the analysis of content included in the corresponding media content item includes automatically identifying music in the corresponding media content item (Chaw: Col 3:42-3:45, 6:63-6:67, etc.; Figs 3-7: such as by identifying relevant music based on metadata thereof, such as for a music based playlist, podcast, etc.). The claim is considered obvious over Ka as modified by Chaw as addressed in the base claim as it would have been obvious to apply the further teaching of Ka and/or Chaw to the modified device of Ka and Chaw; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 17
Ka in view of Chaw teaches or suggests:
The method of claim 2, wherein the analysis of content included in the corresponding media content item includes automatically identifying questions in the corresponding media content item. Examiner takes official notice that parsing a text for questions and answers by automatic identification of same was well known in the art before the effective filing date of the instant application and would have comprised an obvious inclusion for at least the purpose of identifying a discourse structure, determining questions relevant to resolve images or videos, classifying speech acts with respect to additional labels for determining images or videos, identifying questions relevant to particular topics, segments ascribed thereto, etc.; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claim 18
Ka in view of Chaw teaches or suggests:
The method of claim 2, wherein the video clip includes one or more visual indicators of a particular audio segment (Ka: Abstract; ¶ 11, 149, etc.; Fig 2, 13C: system pairs visual assets to identified content, such as by inclusion of a textual element, asset tag or other meta-, auxiliary, etc. data in the resulting container such as for display to a user or consumer of the media); (Chaw: Figs 4-7: system overlays name of a media item, at least a performer of the media), the one or more visual indicators selected from the group consisting of:
a name of the corresponding media content item (Ka: ¶ 21, 80, etc.: textual elements comprise title associated with an audio element, segment, etc.); (Chaw: Figs 4-7: such as the title of the media); speakers in the corresponding media content item (please see claim 14 supra: diarizing speakers in a media considered well-known); subtitles and speaker information in the particular audio segment; tags corresponding to the particular audio segment (Ka: ¶ 21, 80, etc.: textual elements comprise metatags, keywords associated with the audio elements, segment, etc.); (Chaw: Figs 4-7: metadata or metatags such as title, artist, etc. displayed in concert with the assembled media); references to the corresponding media content item; and references to content applications for playing the corresponding media content item. While Ka in view of Chaw does not explicitly discuss the inclusion of every item in the group, the remaining items are considered obvious as a matter of design choice on the part of an implementor of the system and/or of a creator of media therein; one of ordinary skill in the art would have expected only predictable results therefrom.

Regarding claims 19 and 20: the claims are considered to recite substantially similar subject matter to that of claim 2 and are similarly rejected.
 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PAUL C MCCORD whose telephone number is (571) 270-3701. The examiner can normally be reached 7:30-6:30 M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, CAROLYN EDWARDS can be reached at (571) 270-7136. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PAUL C MCCORD/               Primary Examiner, Art Unit 2692

Prosecution Timeline

Aug 15, 2024

Application Filed

Feb 23, 2026

Non-Final Rejection mailed — §103

May 21, 2026

Applicant Interview (Telephonic)

May 21, 2026

Examiner Interview Summary

Precedent Cases

Applications granted by this same examiner with similar technology

18/072,479

Patent 12639525

PROMPTING LANGUAGE MODELS WITH WORKFLOW PLANS

3y 5m to grant Granted May 26, 2026

18/182,149

Patent 12632482

TRAINING A LEARNING-TO-RANK MODEL USING A LINEAR DIFFERENCE VECTOR

3y 2m to grant Granted May 19, 2026

19/028,592

Patent 12634652

MEDIA PLAYBACK BASED ON SENSOR DATA

1y 4m to grant Granted May 19, 2026

17/317,702

Patent 12626723

SYSTEM AND METHOD OF DETERMINING AUDITORY CONTEXT INFORMATION

5y 0m to grant Granted May 12, 2026

18/392,171

Patent 12625791

Adjusting a Playback Device

2y 4m to grant Granted May 12, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

1-2

Expected OA Rounds

69%

Grant Probability

95%

With Interview (+26.1%)

3y 5m (~1y 7m remaining)

Median Time to Grant

Low

PTA Risk

Based on 575 resolved cases by this examiner. Grant probability derived from career allowance rate.
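
The projection figures above can be reproduced with simple arithmetic. The following is a minimal sketch, assuming the interview lift is applied as additive percentage points on top of the career allowance rate (the page does not state the exact method); the function names are hypothetical and used only for illustration.

```python
def allowance_rate(granted: int, resolved: int) -> float:
    """Career allowance rate as a percentage of resolved cases."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * granted / resolved


def with_interview(base_pct: float, lift_points: float) -> float:
    """Assumption: lift is added in percentage points, capped at 100."""
    return min(100.0, base_pct + lift_points)


# Figures taken from the examiner statistics on this page.
base = allowance_rate(granted=398, resolved=575)
boosted = with_interview(base, lift_points=26.1)

print(f"Grant probability: {base:.1f}%")
print(f"With interview:    {boosted:.1f}%")
```

Under that assumption, the results round to the 69% and 95% shown in the projections. Because an interview has already been held in this application, the higher figure may not apply to the remaining prosecution.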