Last updated: May 29, 2026

Application No. 18/791,022

Systems and Methods for Selecting a Set of Media Items Using a Diffusion Model

Non-Final OA §103

Filed

Jul 31, 2024

Priority

May 02, 2024 — provisional 63/641,750

Examiner

PEREZ-ARROYO, RAQUEL

Art Unit

2169

Tech Center

2100 — Computer Architecture & Software

Assignee

Spotify AB

OA Round

3 (Non-Final)

Interview Optional

An interview has already been conducted in this application's prosecution history. This examiner has a 58% grant rate with a +31.7% interview lift. Because an interview has already been tried, a written response with narrowed claims, based on precedent claim evolution patterns, is recommended.

Based on 298 resolved cases, 2023–2026

Examiner Intelligence

PEREZ-ARROYO, RAQUEL

Grants 58% of resolved cases

Career Allowance Rate

173 granted / 298 resolved

+3.1% vs TC avg

Strong +32% interview lift

+31.7%

Interview Lift

resolved cases with interview

Typical timeline

3y 4m

Avg Prosecution

20 currently pending

Career history

327

Total Applications

across all art units

Statute-Specific Performance

§101

8.7%

-31.3% vs TC avg

§103

86.1%

+46.1% vs TC avg

§102

2.5%

-37.5% vs TC avg

§112

1.4%

-38.6% vs TC avg

Based on career data from 298 resolved cases
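As a quick consistency check on the statute-specific figures, the short sketch below backs out the Tech Center average implied by each rate and its stated delta. It assumes the deltas are expressed in percentage points and are simply subtracted from the examiner's rate; the page does not state how they are computed, and the function name is illustrative only.

```python
# Figures copied from the statute-specific panel: (examiner rate %, delta vs TC avg).
# Assumption: the delta is in percentage points, so TC average = rate - delta.
STATS = {
    "§101": (8.7, -31.3),
    "§103": (86.1, 46.1),
    "§102": (2.5, -37.5),
    "§112": (1.4, -38.6),
}


def implied_tc_average(rate, delta):
    """Return the Tech Center average implied by a rate and its stated delta."""
    return round(rate - delta, 1)


baselines = {statute: implied_tc_average(*pair) for statute, pair in STATS.items()}
for statute, baseline in baselines.items():
    print(f"{statute}: implied TC average {baseline}%")
```

Under that assumption every statute implies the same 40.0% baseline, which suggests the Tech Center average shown is a single estimate rather than a per-statute figure.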

Office Action

§103

DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on February 23, 2026 has been entered.
 
Response to Amendment
	This Office Action has been issued in response to Applicant’s Communication of amended application S/N 18/791,022 filed on January 26, 2026. Claims 1 to 18 are currently pending with the application.
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 to 8, 11, 12, and 14 to 18 are rejected under 35 U.S.C. 103 as being unpatentable over Huang et al. (U.S. Publication No. 2024/0282294) hereinafter Huang, in view of Labbé (U.S. Publication No. 2023/0113072), in view of Kumari et al. (U.S. Publication No. 2024/0185588) hereinafter Kumari, in view of CHOWDHURY et al. (U.S. Publication No. 2025/0284748) hereinafter Chowdhury, and further in view of Brown et al. (U.S. Publication No. 2020/0174733) hereinafter Brown.
	As to claim 1:
	Huang discloses:
A method performed at a computing system having one or more processors and memory, the method comprising: 
receiving a request to identify a set of media items for playback to a user [Paragraph 0033 teaches a user can provide a query for a particular type of music, with the user input component; Paragraph 0068 teaches obtaining a query indicating a desired type of audio content]; 
providing information about the request to a diffusion model (DM) component [Paragraph 0033 teaches a representation of the query is provided to a machine-learned diffusion model; Paragraph 0063 teaches the input can be fed to the machine learned generation model, which is a diffusion model]. 
Huang does not appear to expressly disclose a diffusion model (DM) that is trained to generate a set of vectors; generating, using the DM, through a process of diffusion, a set of vectors corresponding to the information about the request; selecting one or more vectors, from the set of vectors corresponding to the information about the request; providing the one or more vectors to a nearest neighbor (NN) component different from the DM; mapping, using the NN component, the one or more vectors to one or more respective uniform resource identifiers (URIs) corresponding to one or more respective media items; and playing back, using at least one of the URIs, at least one of the one or more respective media items.
Labbé discloses:
selecting one or more vectors, from the set of vectors corresponding to the information about the request [Paragraph 0192 teaches the set of DQN outputs could represent a MIR vector, to be fed into the affective inference process];
providing the one or more vectors to a nearest neighbor (NN) component different from the DM [Paragraph 0192 teaches the vector could be fed into the affective inference process; Paragraph 0273 teaches ranking process using a k-nearest neighbors vector similarity calculation to determine matches]; 
mapping, using the NN component, the one or more vectors to one or more respective media items [Paragraph 0186 teaches using a multi-model system to select particular subsets of audio; Paragraph 0192 teaches output vector can be matched with audio segments that best fit the features]; and 
playing back at least one of the one or more respective media items [Paragraph 0125 teaches sending the music playlist to the listener device to allow the listener to review the playlist; Paragraph 0189 teaches playback controls, and audio being played back to the user].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Huang, by selecting one or more vectors, from the set of vectors corresponding to the information about the request; providing the one or more vectors to a different component different from the DM; selecting, using the different component, a set of media items based on the set of one or more vectors; and presenting information about the set of media items to the user, as taught by Labbé [Paragraph 0186, 0192], because both applications are directed to selecting a set of media items based on the vectors; by using a multi-model system, subsets of audio may be selected that would be best for the user, therefore, improving the user’s experience (See Labbé Para [0186]).
Neither Huang nor Labbé appear to expressly disclose a diffusion model (DM) that is trained to generate a set of vectors; generating, using the DM, through a process of diffusion, a set of vectors corresponding to the information about the request; mapping the one or more vectors to one or more respective uniform resource identifiers (URIs) corresponding to one or more respective media items; and playing back, using at least one of the URIs, at least one of the one or more respective media items.
Kumari discloses:
a diffusion model (DM) that is trained to generate a set of vectors; generating, using the DM, through a process of diffusion, a set of vectors corresponding to the information about the request [Paragraph 0043 teaches diffusion model receives an input text, and generates text features from the input text by encoding the text, and vectors of image features are generated by the diffusion model, therefore, a diffusion model that is trained to generate a set of vectors, and using the diffusion model to generate a set of vectors corresponding to the information].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Huang, by incorporating a diffusion model (DM) that is trained to generate a set of vectors; generating, using the DM, through a process of diffusion, a set of vectors corresponding to the information about the request, as taught by Kumari [Paragraph 0043], because both applications are directed to management of media items; generating the set of vectors using a diffusion process as opposed to a different algorithm is a simple substitution of one known element for another to obtain predictable results.
Neither Huang, nor Labbé, nor Kumari appear to expressly disclose mapping the one or more vectors to one or more respective uniform resource identifiers (URIs) corresponding to one or more respective media items; and playing back, using at least one of the URIs, at least one of the one or more respective media items.
Chowdhury discloses:
mapping the one or more vectors to one or more respective identifiers corresponding to one or more respective media items [Paragraph 0020 teaches identifying similar item embeddings using vector techniques such as k-nearest neighbor, where item listing identifiers associated with the identified item embeddings are obtained as results; Paragraph 0074 teaches the identified items in the output (hence, mapped identifiers) are determined based on an item listing identifier associated with each item embedding returned from the search].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Huang, by mapping the one or more vectors to one or more respective identifiers corresponding to one or more respective media items, as taught by Chowdhury [Paragraph 0020, 0074], because the applications are directed to management of media items; mapping the vectors to items using identifiers is a simple substitution of one known element for another to obtain predictable results, since nearest neighbor components perform search based on embeddings.
Neither Huang, nor Labbé, nor Kumari, nor Chowdhury appear to expressly disclose uniform resource identifiers (URIs); and playing back, using at least one of the URIs, at least one of the media items.
Brown discloses:
uniform resource identifiers (URIs) [Paragraph 0067 teaches each audio item in the playback queue may comprise a uniform resource identifier (URI), a uniform resource locator (URL) or some other identifier]; and 
playing back, using at least one of the URIs, at least one of the media items [Paragraph 0067 teaches each audio item in the playback queue may comprise a uniform resource identifier (URI), a uniform resource locator (URL) or some other identifier that may be used by a playback device for playback; Paragraph 0073 teaches playback devices retrieve for playback audio content (e.g. according to a corresponding URI)].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Huang, by incorporating uniform resource identifiers (URIs); and playing back, using at least one of the URIs, at least one of the media items, as taught by Brown [Paragraph 0067, 0073], because the applications are directed to management of media items; using URIs and playing the items using the URI, as opposed to a different type of item identifier is a simple substitution of one known element for another to obtain predictable results.
	Same rationale applies to claims 17 and 18, since they recite similar limitations, and are therefore, similarly rejected.
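For orientation, the data flow recited in claim 1 (request information, then a diffusion model producing vectors, then a separate nearest-neighbor component mapping vectors to URIs) can be sketched as follows. This is a minimal toy under stated assumptions, not the applicant's implementation and not the method of any cited reference: `toy_diffusion` is a hypothetical stand-in that pulls random noise toward a conditioning vector instead of using a learned denoiser, and the catalog, URIs, and embeddings are invented. Playback is omitted.

```python
import math
import random

# Hypothetical catalog mapping made-up track URIs to made-up 2-D embeddings.
CATALOG = {
    "example:track:a": [1.0, 0.0],
    "example:track:b": [0.0, 1.0],
    "example:track:c": [0.7, 0.7],
}


def toy_diffusion(condition, num_vectors=3, steps=20, seed=0):
    """Toy stand-in for a DM: start from Gaussian noise and iteratively move
    toward the conditioning vector. A real diffusion model would instead apply
    a learned denoising network at each step."""
    rng = random.Random(seed)
    vectors = [[rng.gauss(0.0, 1.0) for _ in condition] for _ in range(num_vectors)]
    for _ in range(steps):
        vectors = [[x + 0.3 * (c - x) for x, c in zip(v, condition)] for v in vectors]
    return vectors


def nearest_uri(vector, catalog, exclude=()):
    """NN component: return the URI whose embedding is closest to the vector,
    skipping any excluded URIs (compare the exclusion recited in claim 3)."""
    candidates = {uri: emb for uri, emb in catalog.items() if uri not in exclude}
    return min(candidates, key=lambda uri: math.dist(vector, candidates[uri]))


def select_media(condition, catalog, exclude=()):
    """Generate vectors, select a subset, and map each one to a URI."""
    vectors = toy_diffusion(condition)
    selected = vectors[:2]  # "selecting one or more vectors" from the set
    return [nearest_uri(v, catalog, exclude) for v in selected]


print(select_media([1.0, 0.0], CATALOG))
```

Keeping the generator and the nearest-neighbor lookup as separate functions mirrors the claim language requiring an NN component "different from the DM".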

	As to claim 3:
	Huang as modified by Labbé discloses:
wherein the NN component is configured to exclude one or more media items from the selection of the set of media items [Paragraph 0273 teaches determines which stems to reject based on the rankings generated by the existing stem ranking process].
As to claim 4:
	Huang discloses:
providing the request to a language model component [Paragraph 0049 teaches the request can be text or natural language data, which can be processed by a machine learning model and generate a latent text embedding]; and 
receiving the information about the request from the language model component [Paragraph 0049 teaches the machine learning model can generate a latent text embedding output, a textual segmentation output].

As to claim 5:
	Huang discloses:
wherein the language model component is configured to incorporate information about the user into the information about the request [Paragraph 0045 teaches model can use user-specific data received from the user computing device, thereby personalizing the model].

As to claim 6:
	Huang discloses:
the information about the request is provided to the DM as conditioning information [Paragraph 0015 teaches machine-learned diffusion model for generation of high-quality audio data conditioned on textual prompts; Paragraph 0029 teaches the input to a diffusion model can be a conditioning signal c].

As to claim 7:
	Huang as modified by Labbé discloses:
the DM is conditioned based on information about media items previously played back by the user [Paragraph 0150 teaches contextual information including values stored to represent a user’s profile (e.g., personality, age, gender, etc.), taste profile (e.g., music preferences); Paragraph 0162 teaches user profile data can also be leveraged as potential inputs to the neural networks, including taste profile].

As to claim 8:
Huang as modified by Chowdhury discloses:
the request includes identification of at least one media item [Paragraph 0003 teaches receiving a user query or a seed item for recommendation].

As to claim 12:
	Huang discloses:
the request to identify the set of media items comprises information about a desired media type, a desired music genre, a desired music artist, or a desired type of media artist [Paragraph 0033 teaches a user can provide a query for a particular type of music, with the user input component; Paragraph 0068 teaches obtaining a query indicating a desired type of audio content].

As to claim 14:
	Huang as modified by Labbé discloses:
sequencing the one or more respective media items, wherein presenting the information about the one or more respective media items comprises presenting the sequenced set of media items [Paragraph 0160 teaches audio segments are included in sequence in the audio stream].
As to claim 15:
	Huang as modified by Labbé discloses:
the one or more respective media items are sequenced based on information about the user, chronology, textual entailment, sentiment, or metadata information of the set of media items [Paragraph 0160 teaches audio segments are included in sequence in the audio stream, to enable the listener to achieve a desired affective state].

As to claim 16:
	Huang as modified by Labbé discloses:
filtering or sorting the one or more respective media items, and presenting information about the filtered or sorted one or more respective media items [Paragraph 0160 teaches audio segments are included in sequence in the audio stream; Paragraph 0179 teaches user may be presented with a screen showing metadata corresponding to the first audio segment in a music control display].

Claims 9, 10, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Huang et al. (U.S. Publication No. 2024/0282294) hereinafter Huang, in view of Labbé (U.S. Publication No. 2023/0113072), in view of Kumari et al. (U.S. Publication No. 2024/0185588) hereinafter Kumari, in view of CHOWDHURY et al. (U.S. Publication No. 2025/0284748) hereinafter Chowdhury, in view of Brown et al. (U.S. Publication No. 2020/0174733) hereinafter Brown, and further in view of Manning et al. (U.S. Publication No. 2024/0111800) hereinafter Manning.
As to claim 9:
	Huang as modified by Brown discloses:
	playing back, using at least one of the URIs, at least one of the one or more respective media items [Paragraph 0067 teaches each audio item in the playback queue may comprise a uniform resource identifier (URI), a uniform resource locator (URL) or some other identifier]. 
Neither Huang nor Brown appear to expressly disclose after playing back, receiving a second request to revise the one or more respective media items; providing information about the second request to the DM; receiving, from the DM, a second set of vectors corresponding to the information about the second request; and presenting information about a second set of media items to the user, the second set of media items selected using the second set of vectors.
	Manning discloses:
after playing back, receiving a second request to revise the one or more respective media items [Paragraph 0049 teaches as a user consumes items from a pool, the pool may be modified in a number of ways; Paragraph 0050 teaches monitoring a user’s behavior and interaction with items in a pool, including receiving explicit feedback from the user]; 
providing information about the second request to the DM [Paragraph 0050 teaches information obtained from such monitoring can be used by the system to modify and reorder the items available in a media item pool that corresponds to the user; Paragraph 0051 teaches determine that additional items with similar qualities should be prioritized and added to the pool as the session continues]; 
receiving, from the DM, a second set of vectors corresponding to the information about the second request [Paragraph 0061 teaches the pool is modified dynamically based on user feedback related to media items in the pool, where if a user plays a song through completely without skipping or providing other negative feedback, similar songs may be added to the pool, where similarity may be based upon a vector distance between the played-through song and the added song]; and 
presenting information about a second set of media items to the user, the second set of media items selected using the second set of vectors [Paragraph 0049 teaches as a user consumes items from a pool, the pool may be modified in a number of ways; Paragraph 0061 teaches pool may be re-sorted after addition of the new media item, or playback may continue without re-sorting the pool].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Huang, by after playing back, receiving a second request to revise the one or more respective media items; providing information about the second request to the DM; receiving, from the DM, a second set of vectors corresponding to the information about the second request; and presenting information about a second set of media items to the user, the second set of media items selected using the second set of vectors, as taught by Manning [Paragraph 0014, 0049, 0061], because the applications are directed to selecting a set of media items for the user; adapting the model based on user feedback or selections improves the user’s experience by providing a more efficient and accurate selection of media files (See Manning Para [0014, 0070]).
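The pool-revision behavior attributed to Manning above (adding similar items after a play-through, with similarity based on vector distance, and treating a skip as negative feedback) can be illustrated with the following minimal sketch. It is an assumption-laden illustration rather than Manning's implementation: the function name, the feedback labels, the distance threshold, and the choice to drop a skipped item from the pool are all hypothetical.

```python
import math


def revise_pool(pool, catalog, played_uri, feedback, max_distance=0.5):
    """Return a revised pool after user feedback on one played item.

    Assumptions (not from the cited reference): feedback is "completed" or
    "skip", a skip removes the item, and "similar" means Euclidean distance
    between embeddings is at most max_distance.
    """
    pool = list(pool)
    if feedback == "skip":
        return [uri for uri in pool if uri != played_uri]
    anchor = catalog[played_uri]
    for uri, embedding in catalog.items():
        if uri not in pool and math.dist(anchor, embedding) <= max_distance:
            pool.append(uri)  # add items near the played-through item
    return pool


# Made-up catalog of URIs and embeddings for illustration.
catalog = {
    "example:track:a": [0.0, 0.0],
    "example:track:b": [0.1, 0.1],
    "example:track:c": [2.0, 2.0],
}
print(revise_pool(["example:track:a"], catalog, "example:track:a", "completed"))
```

Note that nothing in this sketch involves a diffusion model. It only shows the vector-distance pool update as the Office Action characterizes it.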

As to claim 10:
	Huang as modified by Manning discloses:
wherein the second request includes identification of one or more media items from the one or more respective media items to include in the second set of media items, and wherein the identification of the one or more media items is provided to the DM as at least a portion of conditioning information [Paragraph 0049 teaches every time the user completely consumes and/or “likes” an item, similar items may be added to the pool].

As to claim 13:
	Huang discloses all the limitations as set forth in the rejections of claim 1 above, but does not appear to expressly disclose the request to identify the set of media items comprises information about what to exclude from the set of media items.
Manning discloses:
the request to identify the set of media items comprises information about what to exclude from the set of media items [Paragraph 0061 teaches feedback can include negative feedback such as a skip or dislike].
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Huang, by incorporating information about what to exclude from the set of media items, as taught by Manning [Paragraph 0061], because the applications are directed to selecting a set of media items for the user; adapting the model based on user feedback or selections improves the user’s experience by providing a more efficient and accurate selection of media files (See Manning Para [0014, 0070]).

Response to Arguments
	The following is in response to arguments filed on January 26, 2026. Arguments have been carefully and respectfully considered, but are moot in view of new grounds of rejections, as necessitated by the amendments.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RAQUEL PEREZ-ARROYO whose telephone number is (571)272-8969. The examiner can normally be reached Monday - Friday, 8:00am - 5:30pm, Alt Friday, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sherief Badawi can be reached at 571-272-9782. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/RAQUEL PEREZ-ARROYO/Primary Examiner, Art Unit 2169


Prosecution Timeline


Jul 08, 2025

Examiner Interview Summary

Jul 28, 2025

Response Filed

Nov 24, 2025

Final Rejection mailed — §103

Jan 09, 2026

Interview Requested

Jan 26, 2026

Response after Non-Final Action

Feb 23, 2026

Request for Continued Examination

Mar 06, 2026

Response after Non-Final Action

Mar 11, 2026

Non-Final Rejection mailed — §103 (current)

Precedent Cases

Applications granted by this same examiner with similar technology

17/863,511

Patent 12632513

COMPUTER-READABLE RECORDING MEDIUM STORING LEARNING SUPPORT PROGRAM, LEARNING SUPPORT METHOD, AND LEARNING SUPPORT DEVICE

3y 10m to grant Granted May 19, 2026

19/108,804

Patent 12613912

ELECTRONIC DEVICE FOR AT LEAST ONE OF VIDEO MOMENT RETRIEVAL AND HIGHLIGHT DETECTION AND OPERATION METHOD THEREOF

1y 1m to grant Granted Apr 28, 2026

18/201,243

Patent 12608392

Unifying Runtime Catalog and Metastore for a Cloud Storage System

2y 11m to grant Granted Apr 21, 2026

17/498,124

Patent 12566786

NATURAL LANGUAGE PROCESSING WORKFLOW FOR RESPONDING TO CLIENT QUERIES

4y 4m to grant Granted Mar 03, 2026

18/478,228

Patent 12566726

ENABLING EXCLUSION OF ASSETS IN IMAGE BACKUPS

2y 5m to grant Granted Mar 03, 2026

Study what changed to get past this examiner. Based on 5 most recent grants.


Prosecution Projections

3-4

Expected OA Rounds

58%

Grant Probability

90%

With Interview (+31.7%)

3y 4m (~1y 6m remaining)

Median Time to Grant

High

PTA Risk

Based on 298 resolved cases by this examiner. Grant probability derived from career allowance rate.
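The projection figures can be reproduced from the career counts given earlier on the page. The sketch below assumes the "With Interview" figure is the career allowance rate plus the interview lift in percentage points, capped at 100%. That additive reading matches the displayed numbers, but the page does not confirm it is the method used.

```python
# Career counts and lift as reported on this page.
granted, resolved = 173, 298
interview_lift_points = 31.7

# Grant probability is stated to derive from the career allowance rate.
base_rate = 100 * granted / resolved

# Assumption: the lift is added as percentage points and capped at 100%.
with_interview = min(base_rate + interview_lift_points, 100.0)

print(f"Grant probability: {round(base_rate)}%")
print(f"With interview: {round(with_interview)}%")
```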