Prosecution Insights
Last updated: April 19, 2026
Application No. 18/791,022

Systems and Methods for Selecting a Set of Media Items Using a Diffusion Model

Non-Final OA §103
Filed: Jul 31, 2024
Examiner: PEREZ-ARROYO, RAQUEL
Art Unit: 2169
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Spotify AB
OA Round: 3 (Non-Final)
Grant Probability: 58% (Moderate)
OA Rounds: 3-4
To Grant: 3y 5m
With Interview: 90%

Examiner Intelligence

Grants 58% of resolved cases
Career Allow Rate: 58% (171 granted / 296 resolved; +2.8% vs TC avg)
Strong +32% interview lift
Interview Lift: +32.3% (resolved cases with interview)
Typical timeline
Avg Prosecution: 3y 5m (28 currently pending)
Career history
Total Applications: 324 (across all art units)
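The headline figures in this panel can be cross-checked with simple arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and it assumes the interview lift is added to the career allow rate as percentage points, which is consistent with the 90% "With Interview" figure shown on this page but is not stated by the source.

```python
# Figures copied from the panel above; names are illustrative.
granted = 171
resolved = 296
total_applications = 324
interview_lift_pts = 32.3  # assumed to be additive percentage points

# Career allow rate as a percentage of resolved cases.
allow_rate = 100 * granted / resolved

# Assumed relationship: allow rate plus interview lift.
with_interview = allow_rate + interview_lift_pts

# Applications not yet resolved.
pending = total_applications - resolved

print(f"Career allow rate: {allow_rate:.1f}%")
print(f"With interview (assumed additive): {with_interview:.1f}%")
print(f"Currently pending: {pending}")
```

Under these assumptions the rounded results match the 58%, 90%, and 28 pending figures reported above.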

Statute-Specific Performance

§101: 21.9% (-18.1% vs TC avg)
§103: 47.6% (+7.6% vs TC avg)
§102: 8.7% (-31.3% vs TC avg)
§112: 15.0% (-25.0% vs TC avg)
Black line = Tech Center average estimate • Based on career data from 296 resolved cases
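The Tech Center baseline behind each "vs TC avg" delta can be recovered from the numbers above. This is a minimal sketch, assuming each delta is the examiner's rate minus the Tech Center average in percentage points; the page does not state how the deltas are computed, and the names below are illustrative.

```python
# Rates and deltas (percentage points) copied from the panel above.
# Assumption: delta = examiner rate - Tech Center average.
stats = {
    "§101": (21.9, -18.1),
    "§103": (47.6, 7.6),
    "§102": (8.7, -31.3),
    "§112": (15.0, -25.0),
}

# Subtract the delta from the rate to recover the implied baseline.
implied_tc_avg = {
    statute: round(rate - delta, 1) for statute, (rate, delta) in stats.items()
}

for statute, avg in implied_tc_avg.items():
    print(f"{statute}: implied TC average {avg:.1f}%")
```

Under that assumption all four statutes imply the same 40.0% baseline, which would be consistent with the note that the Tech Center average is an estimate.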

Office Action

§103
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on February 23, 2026 has been entered.

Response to Amendment

This Office Action has been issued in response to Applicant’s Communication of amended application S/N 18/791,022 filed on January 26, 2026. Claims 1 to 18 are currently pending with the application.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 to 8, 11, 12, and 14 to 18 are rejected under 35 U.S.C. 103 as being unpatentable over Huang et al. (U.S. Publication No. 2024/0282294) hereinafter Huang, in view of Labbé (U.S. Publication No. 2023/0113072), in view of Kumari et al. (U.S. Publication No. 2024/0185588) hereinafter Kumari, in view of CHOWDHURY et al. (U.S. Publication No. 2025/0284748) hereinafter Chowdhury, and further in view of Brown et al. (U.S. Publication No. 2020/0174733) hereinafter Brown.

As to claim 1:
Huang discloses: A method performed at a computing system having one or more processors and memory, the method comprising: receiving a request to identify a set of media items for playback to a user [Paragraph 0033 teaches a user can provide a query for a particular type of music, with the user input component; Paragraph 0068 teaches obtaining a query indicating a desired type of audio content]; providing information about the request to a diffusion model (DM) component [Paragraph 0033 teaches a representation of the query is provided to a machine-learned diffusion model; Paragraph 0063 teaches the input can be fed to the machine learned generation model, which is a diffusion model].

Huang does not appear to expressly disclose a diffusion model (DM) that is trained to generate a set of vectors; generating, using the DM, through a process of diffusion, a set of vectors corresponding to the information about the request; selecting one or more vectors, from the set of vectors corresponding to the information about the request; providing the one or more vectors to a nearest neighbor (NN) component different from the DM; mapping, using the NN component, the one or more vectors to one or more respective uniform resource identifiers (URIs) corresponding to one or more respective media items; and playing back, using at least one of the URIs, at least one of the one or more respective media items.

Labbé discloses: selecting one or more vectors, from the set of vectors corresponding to the information about the request [Paragraph 0192 teaches the set of DQN outputs could represent a MIR vector, to be fed into the affective inference process]; providing the one or more vectors to a nearest neighbor (NN) component different from the DM [Paragraph 0192 teaches the vector could be fed into the affective inference process; Paragraph 0273 teaches ranking process using a k-nearest neighbors vector similarity calculation to determine matches]; mapping, using the NN component, the one or more vectors to one or more respective media items [Paragraph 0186 teaches using a multi-model system to select particular subsets of audio; Paragraph 0192 teaches output vector can be matched with audio segments that best fit the features]; and playing back at least one of the one or more respective media items [Paragraph 0125 teaches sending the music playlist to the listener device to allow the listener to review the playlist; Paragraph 0189 teaches playback controls, and audio being played back to the user].

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Huang, by selecting one or more vectors, from the set of vectors corresponding to the information about the request; providing the one or more vectors to a different component different from the DM; selecting, using the different component, a set of media items based on the set of one or more vectors; and presenting information about the set of media items to the user, as taught by Labbé [Paragraph 0186, 0192], because both applications are directed to selecting a set of media items based on the vectors; by using a multi-model system, subsets of audio may be selected that would be best for the user, therefore, improving the user’s experience (See Labbé Para [0186]).

Neither Huang nor Labbé appear to expressly disclose a diffusion model (DM) that is trained to generate a set of vectors; generating, using the DM, through a process of diffusion, a set of vectors corresponding to the information about the request; mapping the one or more vectors to one or more respective uniform resource identifiers (URIs) corresponding to one or more respective media items; and playing back, using at least one of the URIs, at least one of the one or more respective media items.

Kumari discloses: a diffusion model (DM) that is trained to generate a set of vectors; generating, using the DM, through a process of diffusion, a set of vectors corresponding to the information about the request [Paragraph 0043 teaches diffusion model receives an input text, and generates text features from the input text by encoding the text, and vectors of image features are generated by the diffusion model, therefore, a diffusion model that is trained to generate a set of vectors, and using the diffusion model to generate a set of vectors corresponding to the information].

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Huang, by incorporating a diffusion model (DM) that is trained to generate a set of vectors; generating, using the DM, through a process of diffusion, a set of vectors corresponding to the information about the request, as taught by Kumari [Paragraph 0043], because both applications are directed to management of media items; generating the set of vectors using a diffusion process as opposed to a different algorithm is a simple substitution of one known element for another to obtain predictable results.

Neither Huang, nor Labbé, nor Kumari appear to expressly disclose mapping the one or more vectors to one or more respective uniform resource identifiers (URIs) corresponding to one or more respective media items; and playing back, using at least one of the URIs, at least one of the one or more respective media items.

Chowdhury discloses: mapping the one or more vectors to one or more respective identifiers corresponding to one or more respective media items [Paragraph 0020 teaches identifying similar item embeddings using vector techniques such as k-nearest neighbor, where item listing identifiers associated with the identified item embeddings are obtained as results; Paragraph 0074 teaches the identified items in the output (hence, mapped identifiers) are determined based on an item listing identifier associated with each item embedding returned from the search].

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Huang, by mapping the one or more vectors to one or more respective identifiers corresponding to one or more respective media items, as taught by Chowdhury [Paragraph 0020, 0074], because the applications are directed to management of media items; mapping the vectors to items using identifiers is a simple substitution of one known element for another to obtain predictable results, since nearest neighbor components perform search based on embeddings.

Neither Huang, nor Labbé, nor Kumari, nor Chowdhury appear to expressly disclose uniform resource identifiers (URIs); and playing back, using at least one of the URIs, at least one of the media items.

Brown discloses: uniform resource identifiers (URIs) [Paragraph 0067 teaches each audio item in the playback queue may comprise a uniform resource identifier (URI), a uniform resource locator (URL) or some other identifier]; and playing back, using at least one of the URIs, at least one of the media items [Paragraph 0067 teaches each audio item in the playback queue may comprise a uniform resource identifier (URI), a uniform resource locator (URL) or some other identifier that may be used by a playback device for playback; Paragraph 0073 teaches playback devices retrieve for playback audio content (e.g. according to a corresponding URI)].

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Huang, by incorporating uniform resource identifiers (URIs); and playing back, using at least one of the URIs, at least one of the media items, as taught by Brown [Paragraph 0067, 0073], because the applications are directed to management of media items; using URIs and playing the items using the URI, as opposed to a different type of item identifier is a simple substitution of one known element for another to obtain predictable results.

Same rationale applies to claims 17 and 18, since they recite similar limitations, and are therefore, similarly rejected.

As to claim 3:
Huang as modified by Labbé discloses: wherein the NN component is configured to exclude one or more media items from the selection of the set of media items [Paragraph 0273 teaches determines which stems to reject based on the rankings generated by the existing stem ranking process].
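
For readers mapping the claim 1 limitations onto a concrete data flow, the following is a minimal sketch of the nearest-neighbor step only: taking vectors and mapping each to the URI of the closest catalog embedding. It is not the applicant's implementation or any cited reference's method. The catalog, the `example:track:N` URIs, the function names, and the two-dimensional vectors are all hypothetical, and the diffusion model is replaced by fixed placeholder vectors.

```python
import math

# Hypothetical catalog mapping URIs to embeddings (made-up values).
CATALOG = {
    "example:track:1": (0.0, 0.0),
    "example:track:2": (1.0, 0.0),
    "example:track:3": (0.0, 1.0),
}


def nearest_uri(vector, catalog, exclude=()):
    """Return the URI whose embedding has the smallest Euclidean distance."""
    candidates = {uri: emb for uri, emb in catalog.items() if uri not in exclude}
    return min(candidates, key=lambda uri: math.dist(vector, candidates[uri]))


def map_vectors_to_uris(vectors, catalog, exclude=()):
    """Map each vector to the URI of its nearest catalog embedding."""
    return [nearest_uri(v, catalog, exclude) for v in vectors]


# Placeholder for vectors a diffusion model would generate (assumption).
generated = [(0.9, 0.1), (0.1, 0.8)]
uris = map_vectors_to_uris(generated, CATALOG)
print(uris)
```

The optional `exclude` argument is one way to picture the claim 3 limitation, in which the NN component leaves certain media items out of the selection.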

As to claim 4:
Huang discloses: providing the request to a language model component [Paragraph 0049 teaches the request can be text or natural language data, which can be processed by a machine learning model and generate a latent text embedding]; and receiving the information about the request from the language model component [Paragraph 0049 teaches the machine learning model can generate a latent text embedding output, a textual segmentation output].

As to claim 5:
Huang discloses: wherein the language model component is configured to incorporate information about the user into the information about the request [Paragraph 0045 teaches model can use user-specific data received from the user computing device, thereby personalizing the model].

As to claim 6:
Huang discloses: the information about the request is provided to the DM as conditioning information [Paragraph 0015 teaches machine-learned diffusion model for generation of high-quality audio data conditioned on textual prompts; Paragraph 0029 teaches the input to a diffusion model can be a conditioning signal c].

As to claim 7:
Huang as modified by Labbé discloses: the DM is conditioned based on information about media items previously played back by the user [Paragraph 0150 teaches contextual information including values stored to represent a user’s profile (e.g., personality, age, gender, etc.), taste profile (e.g., music preferences); Paragraph 0162 teaches user profile data can also be leveraged as potential inputs to the neural networks, including taste profile].

As to claim 8:
Huang as modified by Chowdhury discloses: the request includes identification of at least one media item [Paragraph 0003 teaches receiving a user query or a seed item for recommendation].

As to claim 12:
Huang discloses: the request to identify the set of media items comprises information about a desired media type, a desired music genre, a desired music artist, or a desired type of media artist [Paragraph 0033 teaches a user can provide a query for a particular type of music, with the user input component; Paragraph 0068 teaches obtaining a query indicating a desired type of audio content].

As to claim 14:
Huang as modified by Labbé discloses: sequencing the one or more respective media items, wherein presenting the information about the one or more respective media items comprises presenting the sequenced set of media items [Paragraph 0160 teaches audio segments are included in sequence in the audio stream].

As to claim 15:
Huang as modified by Labbé discloses: the one or more respective media items are sequenced based on information about the user, chronology, textual entailment, sentiment, or metadata information of the set of media items [Paragraph 0160 teaches audio segments are included in sequence in the audio stream, to enable the listener to achieve a desired affective state].

As to claim 16:
Huang as modified by Labbé discloses: filtering or sorting the one or more respective media items, and presenting information about the filtered or sorted one or more respective media items [Paragraph 0160 teaches audio segments are included in sequence in the audio stream; Paragraph 0179 teaches user may be presented with a screen showing metadata corresponding to the first audio segment in a music control display].

Claims 9, 10, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Huang et al. (U.S. Publication No. 2024/0282294) hereinafter Huang, in view of Labbé (U.S. Publication No. 2023/0113072), in view of Kumari et al. (U.S. Publication No. 2024/0185588) hereinafter Kumari, in view of CHOWDHURY et al. (U.S. Publication No. 2025/0284748) hereinafter Chowdhury, in view of Brown et al. (U.S. Publication No. 2020/0174733) hereinafter Brown, and further in view of Manning et al. (U.S. Publication No. 2024/0111800) hereinafter Manning.

As to claim 9:
Huang as modified by Brown discloses: playing back, using at least one of the URIs, at least one of the one or more respective media items [Paragraph 0067 teaches each audio item in the playback queue may comprise a uniform resource identifier (URI), a uniform resource locator (URL) or some other identifier].

Neither Huang nor Brown appear to expressly disclose after playing back, receiving a second request to revise the one or more respective media items; providing information about the second request to the DM; receiving, from the DM, a second set of vectors corresponding to the information about the second request; and presenting information about a second set of media items to the user, the second set of media items selected using the second set of vectors.

Manning discloses: after playing back, receiving a second request to revise the one or more respective media items [Paragraph 0049 teaches as a user consumes items from a pool, the pool may be modified in a number of ways; Paragraph 0050 teaches monitoring a user’s behavior and interaction with items in a pool, including receiving explicit feedback from the user]; providing information about the second request to the DM [Paragraph 0050 teaches information obtained from such monitoring can be used by the system to modify and reorder the items available in a media item pool that corresponds to the user; Paragraph 0051 teaches determine that additional items with similar qualities should be prioritized and added to the pool as the session continues]; receiving, from the DM, a second set of vectors corresponding to the information about the second request [Paragraph 0061 teaches the pool is modified dynamically based on user feedback related to media items in the pool, where if a user plays a song through completely without skipping or providing other negative feedback, similar songs may be added to the pool, where similarity may be based upon a vector distance between the played-through song and the added song]; and presenting information about a second set of media items to the user, the second set of media items selected using the second set of vectors [Paragraph 0049 teaches as a user consumes items from a pool, the pool may be modified in a number of ways; Paragraph 0061 teaches pool may be re-sorted after addition of the new media item, or playback may continue without re-sorting the pool].

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Huang, by after playing back, receiving a second request to revise the one or more respective media items; providing information about the second request to the DM; receiving, from the DM, a second set of vectors corresponding to the information about the second request; and presenting information about a second set of media items to the user, the second set of media items selected using the second set of vectors, as taught by Manning [Paragraph 0014, 0049, 0061], because the applications are directed to selecting a set of media items for the user; adapting the model based on user feedback or selections improves the user’s experience by providing a more efficient and accurate selection of media files (See Manning Para [0014, 0070]).

As to claim 10:
Huang as modified by Manning discloses: wherein the second request includes identification of one or more media items from the one or more respective media items to include in the second set of media items, and wherein the identification of the one or more media items is provided to the DM as at least a portion of conditioning information [Paragraph 0049 teaches every time the user completely consumes and/or “likes” an item, similar items may be added to the pool].

As to claim 13:
Huang discloses all the limitations as set forth in the rejections of claim 1 above, but does not appear to expressly disclose the request to identify the set of media items comprises information about what to exclude from the set of media items.

Manning discloses: the request to identify the set of media items comprises information about what to exclude from the set of media items [Paragraph 0061 teaches feedback can include negative feedback such as a skip or dislike].

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention, to combine the teachings of the cited references and modify the invention as taught by Huang, by incorporating information about what to exclude from the set of media items, as taught by Manning [Paragraph 0061], because the applications are directed to selecting a set of media items for the user; adapting the model based on user feedback or selections improves the user’s experience by providing a more efficient and accurate selection of media files (See Manning Para [0014, 0070]).

Response to Arguments

The following is in response to arguments filed on January 26, 2026. Arguments have been carefully and respectfully considered, but are moot in view of new grounds of rejections, as necessitated by the amendments.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to RAQUEL PEREZ-ARROYO whose telephone number is (571)272-8969. The examiner can normally be reached Monday - Friday, 8:00am - 5:30pm, Alt Friday, EST.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sherief Badawi can be reached at 571-272-9782. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/RAQUEL PEREZ-ARROYO/
Primary Examiner, Art Unit 2169

Prosecution Timeline

Jul 31, 2024: Application Filed
Mar 22, 2025: Non-Final Rejection (§103)
Jun 18, 2025: Interview Requested
Jul 08, 2025: Examiner Interview Summary
Jul 08, 2025: Applicant Interview (Telephonic)
Jul 28, 2025: Response Filed
Nov 19, 2025: Final Rejection (§103)
Jan 09, 2026: Interview Requested
Jan 26, 2026: Response after Non-Final Action
Feb 23, 2026: Request for Continued Examination
Mar 06, 2026: Response after Non-Final Action
Mar 07, 2026: Non-Final Rejection (§103, current)

Precedent Cases

Applications granted by this same examiner with similar technology

Patent 12566786
NATURAL LANGUAGE PROCESSING WORKFLOW FOR RESPONDING TO CLIENT QUERIES
2y 5m to grant Granted Mar 03, 2026
Patent 12566726
ENABLING EXCLUSION OF ASSETS IN IMAGE BACKUPS
2y 5m to grant Granted Mar 03, 2026
Patent 12555109
DETERMINISTIC CONCURRENCY CONTROL FOR PRIVATE BLOCKCHAINS
2y 5m to grant Granted Feb 17, 2026
Patent 12547602
LOG ENTRY REPRESENTATION OF DATABASE CATALOG
2y 5m to grant Granted Feb 10, 2026
Patent 12517948
INFORMATION PROCESSING METHOD AND DEVICE FOR SORTING MUSIC IN A PLAYLIST
2y 5m to grant Granted Jan 06, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.

Prosecution Projections

Expected OA Rounds: 3-4
Grant Probability: 58%
With Interview: 90% (+32.3%)
Median Time to Grant: 3y 5m
PTA Risk: High
Based on 296 resolved cases by this examiner. Grant probability derived from career allow rate.
