DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The filing of a terminal disclaimer by itself is not a complete reply to a nonstatutory double patenting (NSDP) rejection. A complete reply requires that the terminal disclaimer be accompanied by a reply requesting reconsideration of the prior Office action. Even where the NSDP rejection is provisional, the reply must be complete. See MPEP § 804, subsection I.B.1. For a reply to a non-final Office action, see 37 CFR 1.111(a). For a reply to a final Office action, see 37 CFR 1.113(c). A request for reconsideration, while not provided for in 37 CFR 1.113(c), may be filed after final for consideration. See MPEP §§ 706.07(e) and 714.13.
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The actual filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/apply/applying-online/eterminal-disclaimer.
Claims 16-35 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-18 of U.S. Patent No. 11,714,594. Although the claims at issue are not identical, they are not patentably distinct from each other because the instant claims are broader versions of the patented claims and are therefore anticipated by them, as shown in the sample comparison below.
Instant application (claim 16):
16. (New) A system for determining a cuepoint in media content, the system comprising: one or more processors; and one or more computer-readable storage devices storing data instructions that, when executed by the one or more processors, cause the system to:
receive at least a portion of audio content of a media content item; identify a plurality of beats in the received audio content;
extract one or more acoustic feature groups for each beat in a set of beats from the plurality of beats, the set of beats including a predetermined number of beats from a beginning of the received audio content; provide the extracted acoustic feature groups as input to a trained model; receive as output from the trained model one or more candidate cuepoint placements, each candidate cuepoint placement including a probability that a beat is a valid candidate for cuepoint placement; select a candidate cuepoint placement based on the probabilities; and determine to place the cuepoint at the selected candidate cuepoint placement, the cuepoint defining a fade in transition point for the media content item.
Patent 11,714,594 (claim 11):
11. A system for placing a cuepoint in a media content item, the system comprising: a convolutional neural network (CNN); and a server communicatively coupled to the CNN, the server comprising at least one processing device and a memory coupled to the at least one processing device and storing instructions, that when executed by the at least one processing device, cause the at least one processing device to:
receive at least a portion of audio content of the media content item; normalize the received audio content into a plurality of beats; partition the plurality of beats into temporal sections; select one or more temporal sections comprising a subset of the plurality of beats, the one or more temporal sections being selected based on whether a cuepoint placement corresponds to place of: (i) a start cuepoint that serves as a fade in transition point for the media content item; or (ii) an end cuepoint that serves as a fade out transition point for the media content item; for each respective temporal section of the one or more selected temporal sections:
extract one or more acoustic feature groups for each beat of one or more beats within the respective temporal section; and provide the extracted acoustic feature groups for the one or more beats within the respective temporal section as input to the CNN configured to predict whether the respective temporal section is indicative of a candidate cuepoint placement; receive as output from the CNN, for each respective temporal section of the one or more selected temporal sections, a probability that a beat immediately following the respective temporal section is the candidate cuepoint placement; compare the received probability across the one or more of the temporal sections; and determine to place the cuepoint placement in the media content item, from among one or more candidate cuepoint placements received as output from the CNN, at the beat immediately following the temporal section having the highest probability based on the comparison.
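For illustration of the overlapping subject matter only, the following minimal Python sketch captures the core logic common to both claim sets: one acoustic feature group is extracted per beat, a trained model scores each beat, and the highest-probability beat is selected as the fade-in cuepoint. All names (extract_features, trained_model, n_intro_beats) are hypothetical and are not drawn from either disclosure.

```python
# Minimal sketch, not from either disclosure: per-beat fade-in cuepoint selection.
import numpy as np

def place_fade_in_cuepoint(beats, extract_features, trained_model, n_intro_beats=32):
    """Score a predetermined number of beats from the beginning of the audio
    content and place the cuepoint at the highest-probability candidate."""
    intro = beats[:n_intro_beats]                              # predetermined set of beats
    features = np.stack([extract_features(b) for b in intro])  # one acoustic feature group per beat
    probs = trained_model(features)                            # per-beat candidate probabilities
    return int(np.argmax(probs))                               # beat index of the selected placement
```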
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 16-35 are rejected under 35 U.S.C. 103 as being unpatentable over McCallum (US 2020/0074982) in view of Conejo et al. (US 2020/0105303).
Claim 16
McCallum teaches a system for determining a cuepoint in media content, the system comprising:
one or more processors; and one or more computer-readable storage devices storing data instructions that, when executed by the one or more processors, cause the system to:
receive at least a portion of audio content of a media content item ([0025] FIG. 2 illustrates an example similarity analysis system 200 including the example incoming digital audio 106);
identify a plurality of beats in the received audio content ([0018] To detect beats in incoming digital audio 106, the example training system 100 includes an example beat detector 108.);
extract one or more acoustic feature groups for each beat in a set of beats from the plurality of beats ([0028] The example deep feature generator 122 forms a set of deep features 124 for each of the segments 202 formed by the segment extractor 204. Each set of the deep features 124 is placed in a column of a feature matrix 206 by an aggregator 208. [0029], Each of the segments 202 is passed into the example neural network 104 to form a set of deep features 124 for the beat associated with the segment 202. The example aggregator 208 forms the feature matrix 206 by placing the set of deep features 124 into a column for the beat associated with the segment 202. Thus, the feature matrix 206 has a column for each beat, and the data in each column represents the set of deep features 124 associated with the beat.),
provide the extracted acoustic feature groups as input to a trained model ([0028] The example deep feature generator 122 forms a set of deep features 124 for each of the segments 202 formed by the segment extractor 204. Each set of the deep features 124 is placed in a column of a feature matrix 206 by an aggregator 208. Examiner notes the neural network 104, aggregator 208, and similarity processor 212, in combination, are viewed as a trained model; see also [0031]);
receive as output from the trained model one or more candidate cuepoint placements (See similarity processor 212 of Fig. 2, which generates a similarity matrix to identify similar and/or dissimilar audio segments; see also details of the similarity processor in Fig. 5; [0040], The self-similarity matrix former 502 computes a distance (e.g., cosine distance, a Euclidean distance, etc.) between columns of the feature matrices 206 (e.g., sets of deep features 124) associated with two segments (e.g., each including four beats) to form a self-similarity matrix 504. The self-similarity matrix former 502 computes a distance for all pairs of segments, and stores the distances in the self-similarity matrix 504. Examiner notes, for example, portions of audio with corresponding similarity values could indicate whether the beats of those portions could be candidates for cuepoint placement based on a user's preference).
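As a non-limiting aid to understanding the cited passages, the following sketch reproduces the self-similarity computation described in McCallum [0028] and [0040]: the deep features for each beat occupy one column of a feature matrix, and a cosine distance is computed between all pairs of columns. Variable names are illustrative only.

```python
import numpy as np

def self_similarity_matrix(feature_matrix):
    """feature_matrix: shape (n_features, n_beats), one column of deep features per beat."""
    # Normalize columns so the dot product of two columns equals their cosine similarity.
    norms = np.linalg.norm(feature_matrix, axis=0, keepdims=True) + 1e-12
    unit_cols = feature_matrix / norms
    cosine_similarity = unit_cols.T @ unit_cols  # pairwise similarity across all beats
    return 1.0 - cosine_similarity               # cosine distance for all pairs, per [0040]
```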
McCallum does not explicitly detail the set of beats including a predetermined number of beats from a beginning of the received audio content; each candidate cuepoint placement including a probability that a beat is a valid candidate for cuepoint placement; select a candidate cuepoint placement based on the probabilities; and determine to place the cuepoint at the selected candidate cuepoint placement, the cuepoint defining a fade in transition point for the media content item.
The analogous art Conejo discloses ([0062]): In one embodiment, the system estimates jumps derived from similarity estimation. In this embodiment, if two audio segments, A and B for instance, are perceived as similar by a listener the system 300 can jump from any part of the segment A to its corresponding part in segment B. When the jump happens, the listener will not be able to account for it since the audio content is similar. The system 300 estimates the similarity between audio segments by using deep learning techniques.
Conejo teaches the set of beats including a predetermined number of beats from a beginning of the received audio content ([0047] In one embodiment, based on the above hypotheses, following are the three models for entry point recommendations. [0056] Learning based: [0057] Train a neural network to predict good candidates for entry points [0058] Leverage the transition feature network that has learnt a music embedding space from transition module presented in the “Finding jumps” section. [0059] Within the embedding space, train a subnetwork to detect regions in a track that best transition from silence to music and sound like music intros. [0060] The training data comprise of a collection of short examples of music beginnings (intros). [0061] With the entry point determined for the rearranged audio track, the system 300 finds the jumps using the transition matrix. In one embodiment, the system 300 finds the jumps by using an algorithmic sub-system responsible for estimating if two audio segments of the original track sound pleasant when played one after another.);
each candidate cuepoint placement including a probability that a beat is a valid candidate for cuepoint placement ([0063], In this embodiment, the system 300 estimates the likeliness of a transition between two audio segments leveraging using a deep learning approach.);
select a candidate cuepoint placement based on the probabilities ([0075] If two segment form an admissible jump, then its joining probability comes from the priority score defined during by the fusion (see paragraph Fusion of estimated jumps). If the two segments do not form an admissible jump, then its joining probability is set to 0.); and
determine to place the cuepoint at the selected candidate cuepoint placement, the cuepoint defining a fade in transition point for the media content item ([0068] In this embodiment, the system 300 has determined a set of possible jumps points within the transition matrixes. In one embodiment, the jump points are a pair of discontinuous points in the song that form a musically pleasant sounding segment if an audio segment ending with one point of the jumps sound similar to when played before an audio segment starting with the other jump point. [0069], This jump point refinement model is able to look at the pre and post jump audio signals and select optimal offset, cross-fade, and volume adjustment parameters to make a seamless progression over the jump.).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the selection of cuepoint placements as taught by Conejo into the audio segmentation method of McCallum, because doing so would have resulted in an audio track that has an aesthetically pleasing construction (e.g., intro, body, outro) and follows the intended artistic progression ([0033] of Conejo).
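As a non-limiting illustration of the combined teaching, the sketch below mirrors the selection logic attributed to Conejo [0075]: a joining probability is nonzero only for admissible jumps, where it equals the fusion priority score, and the candidate with the highest probability is selected. The data structures (admissible_jumps, priority_score) are hypothetical stand-ins.

```python
def select_cuepoint_placement(candidate_jumps, admissible_jumps, priority_score):
    """candidate_jumps: iterable of (segment_a, segment_b) jump pairs."""
    def joining_probability(pair):
        # Admissible jumps take the fusion priority score; all others are set to 0 ([0075]).
        return priority_score[pair] if pair in admissible_jumps else 0.0
    return max(candidate_jumps, key=joining_probability)  # highest-probability placement
```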
Claim 17
The combination teaches the system of claim 16, wherein the probability that a beat is a valid candidate for cuepoint placement is based on the one or more acoustic feature groups for one or more beats immediately preceding the beat ([0046] of Conejo, (i) Music following long pauses or silences can make good starting frames; [0047] In one embodiment, based on the above hypotheses, following are the three models for entry point recommendations. Contiguous silence based: [0048] Compute perceptually weighted loudness. [0049] Detect contiguous silent regions (pauses and breaks) using loudness. [0050] Entry points are the beats following the detections.).
Claim 18
The combination teaches the system of claim 16, wherein the trained model is a convolutional neural network ([0014] of McCallum, FIG. 12 an example convolutional neural network architecture.).
Claim 19
The combination teaches the system of claim 16, wherein the beats in the set of beats are partitioned into one or more temporal sections ([0027] of McCallum, For instance, the segment extractor 204 generates a first segment 202 consisting of beats one to four inclusive, a second segment 202 of beats two to five inclusive, a third segment 202 of beats three to six inclusive, etc.).
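For clarity only, the overlapping four-beat segments described in McCallum [0027] amount to a sliding window over the detected beats; the window length of four mirrors the cited example, and the function name is illustrative.

```python
def temporal_sections(beats, window=4):
    """Overlapping segments per McCallum [0027]: beats 1-4, 2-5, 3-6, and so on."""
    return [beats[i:i + window] for i in range(len(beats) - window + 1)]

# Example: temporal_sections([1, 2, 3, 4, 5, 6]) -> [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
```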
Claim 20
The combination teaches the system of claim 19, wherein the probability that a beat is a valid candidate for cuepoint placement is based on the one or more acoustic feature groups for each beat in a temporal section that immediately precedes the beat ([0036] of Conejo, in one embodiment, because the transition matrix is can be defined based on individual beats, the rearranged audio track can have transitions (or jumps) with audio segments. For example, and in one embodiment, there can be a jump from within one chorus into the middle of the second chorus. [0050] of Conejo, Entry points are the beats following the detections.).
Claim 21
The combination teaches the system of claim 16, wherein the selected candidate cuepoint placement has a highest probability from among the received one or more candidate cuepoint placements ([0031] of Conejo, Points that are similar have scores that indicate the similarity and are more likely candidates for transitions or jumps for rearranged audio track. [0075] of Conejo, The probability of joining two segments is induced by the admissible set of jumps. If two segment form an admissible jump, then its joining probability comes from the priority score defined during by the fusion (see paragraph Fusion of estimated jumps). If the two segments do not form an admissible jump, then its joining probability is set to 0. See also [0089]; [0046] of McCallum, In some examples, if there are multiple peaks within a short time window (e.g., 8 or 16 beats), then only the peak with the highest novelty value is selected.).
Claim 22
This claim recites substantially the same limitations covered in claim 16 above and is rejected for the same reasons, except that it is directed to beats from an ending of the audio content and determination of an end cuepoint defining a fade out transition point instead of the fade-in transition point of claim 16 ([0061] of Conejo, In addition, and in one embodiment, system 300 determines exit points for the input audio track. In this embodiment, the exit point can be at the end of the input audio track,). The claim limitations are as follows:
The combination teaches the system of claim 16, wherein the one or more computer-readable storage devices further store data instructions that, when executed by the one or more processors, cause the system to:
extract one or more acoustic feature groups for each beat in a second set of beats from the plurality of beats, the second set of beats including a predetermined number of beats from an ending of the received audio content; provide the extracted acoustic feature groups for each beat in the second set of beats as input to the trained model; receive as output from the trained model one or more candidate end cuepoint placements, each candidate end cuepoint placement including a probability that a beat is a valid candidate for end cuepoint placement; select a candidate end cuepoint placement based on the probabilities of the one or more candidate end cuepoint placements; and determine to place an end cuepoint at the selected end candidate cuepoint placement, the end cuepoint defining a fade out transition point for the media content item.
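For the end-cuepoint variant, the earlier fade-in sketch changes only in where the predetermined set of beats is taken from: the ending of the received audio content rather than the beginning. As before, all names are hypothetical illustrations, not the claimed implementation.

```python
import numpy as np

def place_fade_out_cuepoint(beats, extract_features, trained_model, n_outro_beats=32):
    """Score a predetermined number of beats from the ending of the audio
    content and place the end cuepoint at the highest-probability candidate."""
    offset = max(0, len(beats) - n_outro_beats)                # start of the ending set of beats
    outro = beats[offset:]
    features = np.stack([extract_features(b) for b in outro])  # one acoustic feature group per beat
    probs = trained_model(features)                            # per-beat candidate probabilities
    return offset + int(np.argmax(probs))                      # index within the full beat list
```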
Claim 23
This claim recites substantially the same limitations as those provided in claim 16, except, similar to claim 22 above, the claim is directed to beats from an ending of the audio content and determination of an end cuepoint defining a fade out transition point instead of the fade-in transition point of claim 16. Therefore, it is rejected for the same reasons as claims 16 and 22 ([0061] of Conejo, In addition, and in one embodiment, system 300 determines exit points for the input audio track. In this embodiment, the exit point can be at the end of the input audio track,).
Claim 24
This claim recites substantially the same limitations as those provided in claim 17 above, and therefore it is rejected for the same reasons.
Claim 25
The combination teaches the system of claim 23, wherein to identify a plurality of beats in the received audio content includes to: normalize the received audio content into the plurality of beats ([0018] of McCallum, The example beat detector 108 of FIG. 1 generates an example stream of beat markers 110 representing the detected beats (e.g., a stream, list, etc. of timestamps for the detected beats).).
Claim 26
This claim recites substantially the same limitations as those provided in claim 19 above, and therefore it is rejected for the same reasons.
Claim 27
This claim recites substantially the same limitations as those provided in claim 20 above, and therefore it is rejected for the same reasons.
Claim 28
The combination teaches the system of claim 23, wherein the one or more acoustic feature groups include at least one of downbeat confidence, position in bar, loudness, timbre, pitch, and vocal activity ([0022] of McCallum, For example, the deep feature generator 122 may generate deep features 124 that are representative of pitch, melodies, chords, rhythms, timbre modulation, instruments, production methods and/or effects (e.g., filtering, compression, panning), vocalists, dynamics etc.).
Claim 29
This claim recites substantially the same limitations as those provided in claim 16 above, and therefore it is rejected for the same reasons.
Claim 30
This claim recites substantially the same limitations as those provided in claim 16 above, and therefore it is rejected for the same reasons, except that it is further directed to partitioning the plurality of beats into one or more temporal sections ([0027] of McCallum, For instance, the segment extractor 204 generates a first segment 202 consisting of beats one to four inclusive, a second segment 202 of beats two to five inclusive, a third segment 202 of beats three to six inclusive, etc. [0036] of Conejo, In one embodiment, because the transition matrix is can be defined based on individual beats, the rearranged audio track can have transitions (or jumps) with audio segments. For example, and in one embodiment, there can be a jump from within one chorus into the middle of the second chorus. Thus, in this example, the transition are not limited to be between similar sounding audio segments (e.g., intro, verse, chorus, solo, outro, etc.), but can be between beats in different parts of the audio track. This gives additional flexibility to determine a wider variety of rearranged audio tracks than available using human curated audio segments.).
Claim 31
This claim recites substantially the same limitations as those provided in claim 18 above, and therefore it is rejected for the same reasons.
Claim 32
The combination teaches the system of claim 30, wherein the beat associated with the respective temporal section is a beat immediately following the respective temporal section (Similar to claim 17 above: [0046] of Conejo, (i) Music following long pauses or silences can make good starting frames; [0047] In one embodiment, based on the above hypotheses, following are the three models for entry point recommendations. Contiguous silence based: [0048] Compute perceptually weighted loudness. [0049] Detect contiguous silent regions (pauses and breaks) using loudness. [0050] Entry points are the beats following the detections.).
Claim 33
This claim recites substantially the same limitations as those provided in claim 28 above, and therefore it is rejected for the same reasons.
Claim 34
The combination teaches the system of claim 30, wherein to determine to place the cuepoint based on the selected candidate cuepoint placement includes to: determine to place the cuepoint at the beat associated with the respective temporal section for which the selected candidate cuepoint is received ([0036] of Conejo, Thus, in this example, the transition are not limited to be between similar sounding audio segments (e.g., intro, verse, chorus, solo, outro, etc.), but can be between beats in different parts of the audio track.).
Claim 35
This claim recites substantially the same limitations covered in claim 30 above and is rejected for the same reasons, except that it is directed to beats from an ending of the audio content and determination of an end cuepoint defining a fade out transition point instead of the fade-in transition point of claim 30 ([0061] of Conejo, In addition, and in one embodiment, system 300 determines exit points for the input audio track. In this embodiment, the exit point can be at the end of the input audio track,).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS H MAUNG whose telephone number is (571)270-5690. The examiner can normally be reached Monday-Friday, 9am-6pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Carolyn R. Edwards, can be reached at 571-270-7136. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/THOMAS H MAUNG/Primary Examiner, Art Unit 2692
/CAROLYN R EDWARDS/Supervisory Patent Examiner, Art Unit 2692