Last updated: April 19, 2026
Application No. 18/790,730
AUTOMATED COVER SONG IDENTIFICATION

Non-Final OA §103§DP
Filed: Jul 31, 2024
Examiner: LU, KUEN S
Art Unit: 2165
Tech Center: 2100 (Computer Architecture & Software)
Assignee: Gracenote Inc.
OA Round: 3 (Non-Final)
Interview Optional

+15.2% interview lift. This examiner has a relatively high allow rate; a written response may suffice.
Based on 914 resolved cases, 2023–2026
Examiner Intelligence

LU, KUEN S
Career Allow Rate: 85% (above average); 781 granted / 914 resolved; +30.4% vs TC avg
Interview Lift: +15.2% (resolved cases with interview)
Avg Prosecution: 3y 3m (typical timeline); 16 currently pending
Total Applications: 930 across all art units (career history)
Statute-Specific Performance

§101: 12.1% (-27.9% vs TC avg)
§103: 46.2% (+6.2% vs TC avg)
§102: 18.5% (-21.5% vs TC avg)
§112: 8.8% (-31.2% vs TC avg)
Based on career data from 914 resolved cases
Office Action

§103 §DP
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This action is in response to the Request for Continued Examination (RCE) filed on 11/17/2025.
Claims 1-21 are pending and stand rejected; claims 1, 8 and 15 are independent.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 11/17/2025 has been entered. 
Response to Arguments
The Applicant's arguments filed 11/17/2025 have been fully and respectfully considered. The Examiner's responses are set out in the discussion below.
With respect to the rejections of claims 1-21 under 35 U.S.C. § 101, the Applicant argued that the claims are eligible because they satisfy prong two of Step 2A of the revised subject matter eligibility test and because the application improves on prior audio identification systems that lack these hardware and software components. The Examiner agrees that the arguments have merit and are persuasive, and respectfully withdraws the rejections.
As to the nonstatutory double patenting rejections of claims 1-20, the Applicant requested that the rejections be held in abeyance pending determination of the allowable scope of the claims. The Examiner respectfully submits that the rejections are reiterated in the instant action, now including claim 21, to keep the record complete and to expedite prosecution. Once the allowable scope of the claims is determined, a follow-up decision on the rejections will be made accordingly.
Concerning the rejections of claims 1-21 under 35 U.S.C. § 103, in light of the amendments to independent claims 1, 8 and 15, a new reference, Serra, is now incorporated into the instant action. Serra replaces the THAPLIYA reference.
Double Patenting Rejection
The non-statutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). 

See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).

A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).

The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-21 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-20 of U.S. Patent No. 12105753. Although the claims at issue are not identical, they are not patentably distinct from each other because they are substantially similar in scope and they use the same limitations. In particular, U.S. Patent No. 12105753 discloses more details in logic assets with the application scenario. Therefore, it would have been obvious to one of ordinary skill in the art to realize that claims 1-21 of the instant application are fully disclosed by U.S. Patent No. 12105753.

The following table shows the claims in Instant Application that are rejected by corresponding claim(s) in U.S. Patent No. 12105753.
U.S. Patent No. 12105753 (left column, listed first) | Instant Application (right column, listed second)

8. At least one non-transitory computer readable storage medium comprising instructions that, when executed, cause one or more processors to at least:

execute a constant Q transform on time slices of first audio data to output constant Q transformed time slices;

binarize the constant Q transformed time slices to output binarized and constant Q transformed time slices;
execute a two-dimensional Fourier transform on time windows within the binarized and constant Q transformed time slices to output two-dimensional Fourier transforms of the time windows;

generate a reference data structure based on a sequential order of the two-dimensional Fourier transforms;
store the reference data structure in a database; and

identify a query data structure associated with query audio data as a cover rendition of the audio data based on a comparison of the query data structure and the reference data structure using the similarity matrix, 

wherein the similarity matrix indicates degrees to which reference portions of the reference data structure are associated with query portions of the query data structure, and

wherein the at least one degree satisfies a corresponding threshold.
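The similarity-matrix comparison recited above can be sketched as follows. This is a minimal illustration only: it assumes each data structure is an array with one row per portion and uses cosine similarity as the degree of association, which the claim language does not specify. The function names and the threshold values are hypothetical.

```python
import numpy as np


def similarity_matrix(reference: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Cosine similarity between every reference portion and every query portion.

    Both inputs are (n_portions, n_features). Cosine similarity is an
    assumption; the claim only requires "degrees" of association.
    """
    ref = reference / (np.linalg.norm(reference, axis=1, keepdims=True) + 1e-12)
    qry = query / (np.linalg.norm(query, axis=1, keepdims=True) + 1e-12)
    return ref @ qry.T  # shape: (n_reference_portions, n_query_portions)


def any_degree_satisfies(sim: np.ndarray, threshold: float) -> bool:
    """True when at least one degree in the matrix meets the threshold."""
    return bool((sim >= threshold).any())


reference = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([[2.0, 0.0], [0.0, 3.0]])
sim = similarity_matrix(reference, query)
print(sim.shape, any_degree_satisfies(sim, 0.9))
```

Each row of `sim` corresponds to a reference portion and each column to a query portion, so a single entry above the threshold is enough to satisfy the "at least one degree" wording in this sketch.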

9. The at least one non-transitory computer readable storage medium of claim 8, wherein the instructions cause the one or more processors to:
determine a distance measure between the query data structure and the reference data structure based on the similarity matrix; and
store an association in the database between the reference audio and the query audio based on the distance measure, the association to identify the query audio as the cover rendition. 

10. The at least one non-transitory computer readable storage medium of claim 9, wherein the instructions are to cause the one or more processors to:
convolve the similarity matrix with a checkerboard kernel to generate a first convolved similarity matrix, the first convolved similarity matrix to include one or more positive elements and one or more negative elements; and
replace the one or more negative elements with zeros to generate a second convolved similarity matrix; and
 wherein: the determination of the distance measure between the query data structure and the reference data structure is based on the second convolved similarity matrix.
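The two convolution steps of claim 10 can be sketched as below. The kernel layout (positive blocks on the diagonal, negative blocks off the diagonal) and the "valid" convolution mode are assumptions made for illustration; the claim names a checkerboard kernel without fixing its size or boundary handling. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def checkerboard_kernel(half: int) -> np.ndarray:
    """2*half x 2*half kernel: +1 diagonal blocks, -1 off-diagonal blocks (assumed layout)."""
    return np.kron(np.array([[1.0, -1.0], [-1.0, 1.0]]), np.ones((half, half)))


def convolve_valid(matrix: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Plain 2-D convolution restricted to positions where the kernel fits entirely."""
    windows = sliding_window_view(matrix, kernel.shape)
    flipped = kernel[::-1, ::-1]  # convolution flips the kernel on both axes
    return np.einsum("ijkl,kl->ij", windows, flipped)


def rectified_convolution(sim: np.ndarray, half: int = 1):
    """Return (first, second) convolved matrices; negatives are zeroed in the second."""
    first = convolve_valid(sim, checkerboard_kernel(half))
    second = np.where(first < 0, 0.0, first)
    return first, second


sim = np.eye(4)  # toy similarity matrix with a perfect diagonal match
first, second = rectified_convolution(sim, half=1)
print(first)
print(second)
```

With the identity matrix as a toy input, the first convolved matrix has positive values on the diagonal and negative values next to it; zeroing the negatives leaves only the diagonal, which is the part a distance measure would then be computed from.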

11. The at least one non-transitory computer readable storage medium of claim 8, wherein the instructions are to cause the one or more processors to:
group the binarized and constant Q transformed time slices into the time windows prior to the execution of the two-dimensional Fourier transform on the time windows, the time windows to include overlapping time windows of uniform duration; and

apply a blur algorithm to the two-dimensional Fourier transforms of the time windows prior to the sequential ordering of the two-dimensional Fourier transforms in the reference data structure.
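A rough sketch of the windowing, transform, and blur steps of claim 11 follows. It assumes the binarized input is laid out as (frequency bins, time slices), keeps only the magnitude of each two-dimensional Fourier transform, and uses a 3x3 box blur as a stand-in for the unspecified blur algorithm. The window length, hop size, and function names are hypothetical.

```python
import numpy as np


def windowed_fft2(binarized: np.ndarray, window: int, hop: int) -> np.ndarray:
    """Group time slices into overlapping, equal-length windows and 2-D FFT each one.

    binarized: (n_bins, n_slices). Windows overlap whenever hop < window.
    Returns the FFT magnitudes in time order, shape (n_windows, n_bins, window).
    """
    starts = range(0, binarized.shape[1] - window + 1, hop)
    return np.stack([np.abs(np.fft.fft2(binarized[:, s:s + window])) for s in starts])


def box_blur(image: np.ndarray) -> np.ndarray:
    """3x3 mean blur with edge padding (an assumed, simple blur)."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    return sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


rng = np.random.default_rng(0)
binarized = (rng.random((12, 20)) > 0.5).astype(float)
transforms = windowed_fft2(binarized, window=8, hop=4)
reference = np.stack([box_blur(t) for t in transforms])  # sequential order preserved
print(transforms.shape, reference.shape)
```

The stacked output keeps the windows in their original time order, which corresponds to the sequential ordering the claim relies on when building the reference data structure.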

12. The at least one non-transitory computer readable storage medium of claim 8, wherein the instructions are to cause the one or more processors to, for respective ones of the constant Q transformed time slices, determine a median value of a range of constant Q transformed time slices that encompasses the respective ones of the constant Q transformed time slices and binarizing the constant Q transformed time slices based on the median value.
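The median-based binarization of claim 12 might look like the sketch below. It assumes the median is taken per frequency bin over a symmetric range of neighboring time slices and that a value is set to 1 when it exceeds that median; the claim does not state the range width or the comparison rule, so both are illustrative choices, as is the function name.

```python
import numpy as np


def binarize_by_local_median(cqt_mag: np.ndarray, half_range: int) -> np.ndarray:
    """Binarize each time slice against the median of a surrounding range of slices.

    cqt_mag: (n_bins, n_slices) constant Q magnitudes.
    half_range: number of slices on each side included in the range (assumed).
    """
    n_bins, n_slices = cqt_mag.shape
    out = np.zeros((n_bins, n_slices), dtype=np.uint8)
    for t in range(n_slices):
        lo, hi = max(0, t - half_range), min(n_slices, t + half_range + 1)
        median = np.median(cqt_mag[:, lo:hi], axis=1)  # one median per frequency bin
        out[:, t] = cqt_mag[:, t] > median
    return out


cqt_mag = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                    [5.0, 4.0, 3.0, 2.0, 1.0]])
binary = binarize_by_local_median(cqt_mag, half_range=1)
print(binary)
```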

13. The at least one non-transitory computer readable storage medium of claim 8, wherein the instructions are to cause the one or more processors to:
obtain metadata associated with the query audio from a content source;
access the database using the metadata to identify a plurality of reference data structures including the reference data structure;
determine a rank of the reference data structure with respect to the plurality of reference data structures; and
after a determination that the rank of the reference data structure satisfies a threshold, identify the reference data structure for comparison with the query data structure.
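The rank-and-threshold selection of claim 13 can be illustrated with the short sketch below. The claim does not say what the rank is based on, so the example assumes a hypothetical metadata match score returned by the database lookup; the class, field, and function names are likewise invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    reference_id: str
    metadata_score: float  # hypothetical match score from the metadata lookup


def select_for_comparison(candidates, max_rank):
    """Rank candidates by score (best first) and keep those whose rank meets the threshold."""
    ranked = sorted(candidates, key=lambda c: c.metadata_score, reverse=True)
    return [c.reference_id for rank, c in enumerate(ranked, start=1) if rank <= max_rank]


candidates = [
    Candidate("ref-a", 0.42),
    Candidate("ref-b", 0.91),
    Candidate("ref-c", 0.77),
]
selected = select_for_comparison(candidates, max_rank=2)
print(selected)
```

Only the references returned here would go on to the more expensive data-structure comparison, which is the practical point of ranking before comparing.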

14. The at least one non-transitory computer readable storage medium of claim 13, wherein the content source is at least one of (i) a stream of a live broadcast, (ii) a music sharing site, (iii) a video sharing site, or (iv) a social networking feed, a post, update, or a tweet of a social network.

1. An apparatus comprising:
at least one memory;
machine-readable instructions; and
one or more processors to execute the machine-readable instructions to at least:
execute a constant Q transform on time slices of first audio data to output constant Q transformed time slices;
binarize the constant Q transformed time slices to output binarized and constant Q transformed time slices;
execute a two-dimensional Fourier transform on time windows within the binarized and constant Q transformed time slices to output two-dimensional Fourier transforms of the time windows;
generate a reference data structure based on a sequential order of the two-dimensional Fourier transforms;
store the reference data structure in a database; and
identify a query data structure associated with query audio data as a cover rendition of the audio data based on a comparison of the query data structure and the reference data structure using a similarity matrix, 

wherein the similarity matrix indicates at least one degree to which reference portions of the reference data structure are associated with query portions of the query data structure, and
wherein the at least one degree satisfies a corresponding threshold.

2. The apparatus of claim 1, wherein the one or more processors are to:
determine a distance measure between the query data structure and the reference data structure based on the similarity matrix; and
store an association in the database between the reference audio and the query audio based on the distance measure, the association to identify the query audio as the cover rendition. 

3. The apparatus of claim 2, wherein the one or more processors are to:
convolve the similarity matrix with a checkerboard kernel to generate a first convolved similarity matrix, the first convolved similarity matrix to include one or more positive elements and one or more negative elements; and
replace the one or more negative elements with zeros to generate a second convolved similarity matrix; and wherein: the determination of the distance measure between the query data structure and the reference data structure is based on the second convolved similarity matrix.

4. The apparatus of claim 1, wherein the one or more processors are to:
group the binarized and constant Q transformed time slices into the time windows prior to the execution of the two-dimensional Fourier transform on the time windows, the time windows to include overlapping time windows of uniform duration; and
apply a blur algorithm to the two-dimensional Fourier transforms of the time windows prior to the sequential ordering of the two-dimensional Fourier transforms in the reference data structure.

5. The apparatus of claim 1, wherein the one or more processors are to, for respective ones of the constant Q transformed time slices, 
determine a median value of a range of constant Q transformed time slices that encompasses the respective ones of the constant Q transformed time slices and binarizing the constant Q transformed time slices based on the median value.

6. The apparatus of claim 1, wherein the one or more processors are to:
obtain metadata associated with the query audio from a content source;
access the database using the metadata to identify a plurality of reference data structures including the reference data structure;
determine a rank of the reference data structure with respect to the plurality of reference data structures; and
after a determination that the rank of the reference data structure satisfies a threshold, identify the reference data structure for comparison with the query data structure.

7. The apparatus of claim 6, wherein the content source is at least one of (i) a stream of a live broadcast, (ii) a music sharing site, (iii) a video sharing site, or (iv) a social networking feed, a post, update, or a tweet of a social network.

15. A method comprising:
executing a constant Q transform on time slices of first audio data to output constant Q transformed time slices;
binarizing the constant Q transformed time slices to output binarized and constant Q transformed time slices;
executing a two-dimensional Fourier transform on time windows within the binarized and constant Q transformed time slices to output two-dimensional Fourier transforms of the time windows;
generating a reference data structure based on a sequential order of the two-dimensional Fourier transforms;
storing the reference data structure in a database; and

identifying a query data structure associated with query audio data as a cover rendition of the audio data based on a comparison of the query data structure and the reference data structure using the similarity matrix, 

wherein the similarity matrix indicates degrees to which reference portions of the reference data structure are associated with query portions of the query data structure, and
wherein the at least one degree satisfies a corresponding threshold.

16. The method of claim 15, further including:
generating a similarity matrix that indicates degrees to which reference portions of the reference data structure are associated with query portions of the query data structure;
determining a distance measure between the query data structure and the reference data structure based on the similarity matrix; and
storing an association in the database between the reference audio and the query audio based on the distance measure, the association to identify the query audio as the cover rendition. 

21. The method of claim 20, wherein the content source is at least one of (i) a stream of a live broadcast, (ii) a music sharing site, (iii) a video sharing site, or (iv) a social networking feed, a post, update, or a tweet of a social network.

17. The method of claim 16, further including:
convolving the similarity matrix with a checkerboard kernel to generate a first convolved similarity matrix, the first convolved similarity matrix to include one or more positive elements and one or more negative elements; and
replacing the one or more negative elements with zeros to generate a second convolved similarity matrix; and wherein:
the determining of the distance measure between the query data structure and the reference data structure is based on the second convolved similarity matrix.

18. The method of claim 15, further including:
arranging the binarized and constant Q transformed time slices into the time windows prior to the execution of the two-dimensional Fourier transform on the time windows, the time windows to include overlapping time windows of uniform duration; and
executing a blur algorithm to the two-dimensional Fourier transforms of the time windows prior to the sequential ordering of the two-dimensional Fourier transforms in the reference data structure.

19. The method of claim 15, further including, for respective ones of the constant Q transformed time slices, determining a median value of a range of constant Q transformed time slices that encompasses the respective ones of the constant Q transformed time slices and binarizing the constant Q transformed time slices based on the median value.

20. The method of claim 15, further including:
obtaining metadata associated with the query audio from a content source;
querying the database using the metadata to identify a plurality of reference data structures including the reference data structure;
determining a rank of the reference data structure with respect to the plurality of reference data structures; and
after a determination that the rank of the reference data structure satisfies a threshold, identifying the reference data structure for comparison with the query data structure. 

1. A tangible non-transitory, computer-readable storage medium comprising instructions, that, when executed by one or more processors, causes the one or more processors to perform a set of operations comprising: 
retrieving rights metadata associated with query audio from a content source; 
identifying the query audio based on a search query, wherein the search query comprises the rights metadata; 
generating a query data structure associated with the query audio by executing a two-dimensional Fourier transform on a representation of the query audio; and 
identifying the query audio as a cover rendition of a reference audio based on comparing the query data structure and a reference data structure generated from one or more tempo-adjusted versions of the reference audio.

2. The tangible non-transitory, computer-readable storage medium of claim 1, wherein the rights metadata comprises one or more of: (i) an artist, (ii) a title, (iii) a publisher; (iv) license information, (v) right holder information, and (v) royalty information, associated with the query audio.

3. The tangible non-transitory, computer-readable storage medium of claim 1, wherein the content source comprises one or more of: (i) a stream of a live broadcast, (ii) a music sharing site, (iii) a video sharing site, (iv) a social networking feed of a social network, (v) a post of a social network, (vi) an update of a social network, and (vii) a tweet of a social network.

4. The tangible non-transitory, computer-readable storage medium of claim 1, wherein generating a query data structure associated with query audio comprises: 
generating the representation of the query audio by:
executing a constant Q transform on query time slices of the query audio; and

binarizing the constant Q transformed query time slices; and 

generating the query data structure based on a sequential order of the two-dimensional Fourier transforms.

21. The tangible non-transitory, computer-readable storage medium of claim 1, 
wherein the representation comprises a constant Q transformed representation of the query audio, and 
wherein the two-dimensional Fourier transform is applied to a time window of the representation.

5. The tangible non-transitory, computer-readable storage medium of claim 1, 
wherein comparing the query data structure and a reference data structure comprises generating a similarity matrix, 
wherein the similarity matrix indicates at least one degree to which reference portions of the reference data structure are associated with query portions of the query data structure, and 
wherein the at least one degree satisfies a corresponding threshold.

6. The tangible non-transitory, computer-readable storage medium of claim 1, wherein the search query further comprises content source metadata.

7. The tangible non-transitory, computer-readable storage medium of claim 1, 
wherein the set of operations further comprises selecting a subset of a reference audio content based on the rights metadata, 
wherein the subset of reference audio content comprises the reference audio.

8. A computing device comprising: 
one or more processors; and 
a tangible non-transitory, computer-readable storage medium comprising instructions, that, when executed by the one or more processors, causes the one or more processors to perform a set of operations comprising: 
retrieving rights metadata associated with query audio from a content source; 
identifying the query audio based on a search query, wherein the search query comprises the rights metadata; 
generating a query data structure associated with the query audio by executing a two-dimensional Fourier transform on a representation of the query audio; and 
identifying the query audio as a cover rendition of a reference audio based on comparing the query data structure and a reference data structure generated from one or more tempo-adjusted versions of the reference audio.

9. The computing device of claim 8, wherein the rights metadata comprises one or more of: (i) an artist, (ii) a title, (iii) a publisher; (iv) license information, (v) right holder information, and (v) royalty information, associated with the query audio.

10. The computing device of claim 8, wherein the content source comprises one or more of: (i) a stream of a live broadcast, (ii) a music sharing site, (iii) a video sharing site, (iv) a social networking feed of a social network, (v) a post of a social network, (vi) an update of a social network, and (vii) a tweet of a social network.

11. The computing device of claim 8, wherein generating a query data structure associated with query audio comprises: 
generating the representation of the query audio by:

executing a constant Q transform on query time slices of the query audio; and

binarizing the constant Q transformed query time slices; and 

generating the query data structure based on a sequential order of the two-dimensional Fourier transforms.

12. The computing device of claim 8, wherein comparing the query data structure and a reference data structure comprises generating a similarity matrix, 

wherein the similarity matrix indicates at least one degree to which reference portions of the reference data structure are associated with query portions of the query data structure, and 
wherein the at least one degree satisfies a corresponding threshold.

13. The computing device of claim 8, wherein the search query further comprises content source metadata.

14. The computing device of claim 8, wherein the set of operations further comprises selecting a subset of a reference audio content based on the rights metadata, 
wherein the subset of reference audio content comprises the reference audio.

15. A computer-implemented method comprising: 
retrieving rights metadata associated with query audio from a content source; 
identifying the query audio based on a search query, wherein the search query comprises the rights metadata; 
generating a query data structure associated with query audio by executing a two-dimensional Fourier transform on a representation of the query audio; and 
identifying the query audio as a cover rendition of a reference audio based on comparing the query data structure and a reference data structure generated from one or more tempo-adjusted versions of the reference audio.

16. The computer-implemented method of claim 15, wherein the rights metadata comprises one or more of: (i) an artist, (ii) a title, (iii) a publisher; (iv) license information, (v) right holder information, and (v) royalty information, associated with the query audio.

17. The computer-implemented method of claim 15, wherein the content source comprises one or more of: (i) a stream of a live broadcast, (ii) a music sharing site, (iii) a video sharing site, (iv) a social networking feed of a social network, (v) a post of a social network, (vi) an update of a social network, and (vii) a tweet of a social network.

18. The computer-implemented method of claim 15, wherein generating a query data structure associated with query audio comprises: 
executing a constant Q transform on query time slices of the query audio; and
binarizing the constant Q transformed query time slices; and

generating the query data structure based on a sequential order of the two-dimensional Fourier transforms.

19. The computer-implemented method of claim 15, 

wherein comparing the query data structure and a reference data structure comprises generating a similarity matrix, 
wherein the similarity matrix indicates at least one degree to which reference portions of the reference data structure are associated with query portions of the query data structure, and 
wherein the at least one degree satisfies a corresponding threshold.

20. The computer-implemented method of claim 15, wherein the computer-implemented method further comprises selecting a subset of a reference audio content based on the rights metadata, 
wherein the subset of reference audio content comprises the reference audio.

“Omission of element and its function in combination is obvious expedient if the remaining elements perform same functions as before.” See In re Karlson (CCPA) 136 USPQ 184, decided Jan 16, 1963, Appl. No. 6857, U.S. Court of Customs and Patent Appeals.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 USC § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-3, 6-10, 13-17 and 20 are rejected under 35 USC § 103 as being unpatentable over
Galuten et al.: “ELECTRONIC MUSIC/MEDIA DISTRIBUTION SYSTEM” (United States Patent US 7209892 B1, DATE PUBLISHED 2007-04-24; and DATE FILED 2022-06-28, hereafter “Galuten”), in view of 
Serra et al.: “Audio cover song identification and similarity: background, approaches, evaluation, and beyond”, (Year: 2010, hereafter “Serra”), and further in view of
Mohajer et al.: “SYSTEM AND METHOD FOR TARGETING CONTENT BASED ON IDENTIFIED AUDIO AND MULTIMEDIA” (United States Patent US 9035163 B1, DATE PUBLISHED 2015-05-19; and DATE FILED 2012-05-10, hereafter “Mohajer”).

As per claim 1, Galuten teaches a tangible non-transitory, computer-readable storage medium comprising instructions, that, when executed by one or more processors, causes the one or more processors to perform a set of operations (See col. 37, lines 9-11, a system storing computer-readable instructions thereon for execution by a processor for distributing electronic information) comprising: 
retrieving rights metadata associated with query audio from a content source (See col. 6, lines 24-26 and col. 21, lines 6-7, the Delivery Service 118 downloads the actual content and associated rights from its database to the Consumer Player, and player functions include rendering the audio content. Here, the rights are themselves metadata of the content, downloading teaches retrieving, and downloading reads on a query to select content to download);
identifying the query audio based on a search query, wherein the search query comprises the rights metadata (See col. 21, lines 6-7 and 12-14, player functions include rendering the audio content, and consumers have the ability to acquire the rights and to consume the content according to those rights); 
generating a query data structure associated with the query audio (See col. 18, lines 15-27, the consumer browses a full-featured retailer's site, finds the query page, and defines and submits a query. The query page is a front end interface to the Query Engine and its actual presentation is at the retailer's discretion. The retailer's Query Engine formats a query and submits it to the Content Catalog, which first verifies that the request is valid and then searches its database of content descriptions based on the search attributes and returns a list of references to the retailer's Query Engine. Here, the query made to the query page, which is formatted to search the content database, teaches a data structure generated to perform the search).
Galuten does not explicitly teach the generating by executing a two-dimensional Fourier transform on a representation of the query audio.
Serra teaches executing a two-dimensional Fourier transform on a representation of the audio including the query audio as the audio signal obtained (See Page 14 (Note: Page count includes two pages of Google Patents and Scholar header), Autocorrelation is a well-known operator for converting signals into a delay or shift-invariant representation [58]. Therefore, the power spectrum (or power spectral density), which is formally defined as the Fourier transform of the autocorrelation, is also shift-invariant. Other 2D transforms could be also used, specially shift-invariant operators derived from higher-order spectra [33].). 
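The shift-invariance property that the cited Serra passage relies on can be checked numerically. The sketch below is an illustration under stated assumptions, not a reproduction of Serra's method: it uses a random array in place of a real audio representation and a circular shift along both axes, and shows that the magnitude of the two-dimensional Fourier transform is unchanged by that shift.

```python
import numpy as np

rng = np.random.default_rng(1)
patch = rng.random((12, 16))  # stand-in for a (pitch bins x time frames) representation

# Circularly shift by 3 bins along the pitch axis and 5 frames along the time axis.
shifted = np.roll(patch, shift=(3, 5), axis=(0, 1))

mag_original = np.abs(np.fft.fft2(patch))
mag_shifted = np.abs(np.fft.fft2(shifted))

# A circular shift only changes the phase of each Fourier coefficient,
# so the magnitudes agree even though the inputs differ.
invariant = np.allclose(mag_original, mag_shifted)
print(invariant)
```

In a pitch-by-time representation, a shift along the pitch axis corresponds roughly to a key transposition, which is why a magnitude-only two-dimensional transform is attractive for comparing renditions performed in different keys.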
It would have been obvious to one having ordinary skill in the art at the time the Applicant’s application was filed to combine Serra’s teaching with Galuten. Galuten is dedicated to facilitating the distribution of media to consumers over a network, while achieving commercial business objectives and protecting the intellectual property rights associated with the media being distributed, and Serra is dedicated to identifying cover songs in a given music collection. The combined teaching of the Galuten and Serra references would have enabled Galuten to overcome many difficulties in identifying cover songs that differ from the original song in timbre, tempo, structure, key, arrangement, or language of the vocals.
Although Galuten in view of Serra teaches extensively on audio rendition and reference audio, Galuten in view of Serra does not explicitly teach identifying the query audio as a cover rendition of a reference audio based on comparing the query data structure and a reference data structure associated with the reference audio.
On the other hand, as an analogous art on audio query, Mohajer teaches identifying the query audio as a cover rendition of a reference audio based on comparing the query data structure and a reference data structure (See col. 11, lines 47-51, identifying multiple renditions by multiple artists of the selected song and linking the selected song aggregate experience category to delivery of promotional content responsive to recognition of an audio query that matches any audio reference of the multiple renditions of a particular song.).
It would have been obvious to one having ordinary skill in the art at the time the Applicant’s application was filed to combine Mohajer’s teaching with Galuten in view of Serra because Galuten is dedicated to facilitating the distribution of media to consumers over a network, while achieving commercial business objectives and protecting the intellectual property rights associated with the media being distributed, Serra is dedicated to identifying cover songs in a given music collection, and Mohajer is dedicated to recognizing audio queries and selecting related information to return in response to recognition of the audio queries. The combined teaching of Galuten, Serra and Mohajer would have enabled Galuten in view of Serra to facilitate linking and return of selected information in response to recognition of audio queries to expedite the distribution of media to consumers.
Galuten in view of Serra and further in view of Mohajer further teaches identifying the query audio as a cover rendition of a reference audio based on comparing the query data structure and a reference data structure generated from one or more tempo-adjusted versions of the reference audio (See Serra: Pages 5 and 15, a cover song can mean any new version, performance, rendition, or recording of a previously recorded track  in which the new version reads on new data structure; and achieving tempo invariance is to estimate the tempo and then aggregate the information contained within comparable units of time; and Mohajer: col. 11, lines 47-51, identifying multiple renditions by multiple artists of the selected song and linking the selected song aggregate experience category to delivery of promotional content responsive to recognition of an audio query that matches any audio reference of the multiple renditions of a particular song).
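For readers less familiar with the shift-invariance property quoted from Serra above, the following is a minimal illustrative sketch, not taken from Galuten, Serra, Mohajer, or the application. It assumes a small hypothetical pitch-by-time patch and shows that the magnitude of its two-dimensional discrete Fourier transform does not change when the patch is circularly shifted along either axis. The function names and the patch values are assumptions made for illustration only.

```python
import cmath

def dft2_magnitude(matrix):
    """Return the magnitude of the 2D discrete Fourier transform of a small matrix."""
    rows, cols = len(matrix), len(matrix[0])
    result = []
    for u in range(rows):
        out_row = []
        for v in range(cols):
            total = 0j
            for r in range(rows):
                for c in range(cols):
                    angle = -2 * cmath.pi * (u * r / rows + v * c / cols)
                    total += matrix[r][c] * cmath.exp(1j * angle)
            out_row.append(abs(total))
        result.append(out_row)
    return result

def circular_shift(matrix, row_shift, col_shift):
    """Circularly shift a matrix down by row_shift rows and right by col_shift columns."""
    rows, cols = len(matrix), len(matrix[0])
    return [[matrix[(r - row_shift) % rows][(c - col_shift) % cols]
             for c in range(cols)]
            for r in range(rows)]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equally sized matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

# Hypothetical 4x6 patch: rows stand in for pitch bins, columns for time frames.
patch = [
    [0.0, 1.0, 0.0, 0.0, 2.0, 0.0],
    [3.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 0.0, 0.0, 1.0],
    [0.0, 2.0, 0.0, 0.0, 0.0, 0.0],
]

# A row shift loosely models a key change; a column shift models a time offset.
shifted = circular_shift(patch, 1, 2)
difference = max_abs_diff(dft2_magnitude(patch), dft2_magnitude(shifted))
```

Here `difference` is zero up to floating-point rounding, which is the sense in which the magnitude spectrum is shift-invariant. The property holds exactly only for circular shifts, so this sketch does not settle how any cited reference handles real, non-periodic audio.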

As per claim 2, Galuten in view of Serra and further in view of Mohajer teaches the tangible non-transitory, computer-readable storage medium of claim 1, wherein the rights metadata comprises one or more of: (i) an artist, (ii) a title, (iii) a publisher (See ); (iv) license information, (v) right holder information (See Mohajer: col. 5, lines 36-40, target their promotional content when the device detects audio multimedia from a competitor. In this case, the information provider can bid on the "fingerprint" of the multimedia if they don't own the rights to the original content.), and (vi) royalty information, associated with the query audio.

As per claim 3, Galuten in view of Serra and further in view of Mohajer teaches the tangible non-transitory, computer-readable storage medium of claim 1, wherein the content source comprises one or more of: (i) a stream of a live broadcast, (ii) a music sharing site (See Mohajer: col. 4, lines 21-27, the network information provider can select the type of multimedia from an existing database such as a particular song, all songs by an artist or all songs in a given set of genres. Alternatively, the network information provider may upload or provide a link to a new multimedia item, such as an audio file to the database.), (iii) a video sharing site, (iv) a social networking feed of a social network, (v) a post of a social network, (vi) an update of a social network, and (vii) a tweet of a social network.

As per claim 6, Galuten in view of Serra and further in view of Mohajer teaches the tangible non-transitory, computer-readable storage medium of claim 1, wherein the search query further comprises content source metadata (See Galuten: col. 2, lines 58-60, the Delivery Service 118 downloads the actual content and associated rights from its database to the Consumer Player).

As per claim 7, Galuten in view of Serra and further in view of Mohajer teaches the tangible non-transitory, computer-readable storage medium of claim 1, 
wherein the set of operations further comprises selecting a subset of a reference audio content based on the rights metadata (See Galuten: Fig. 9, col. 17, lines 32-37, the Reference Service supplies one based on the default rights or offers associated with the content requested (step 918). Next, the Reference Service checks the offer against the rights associated with the content and against the business rules for the retailer to determine whether the offer is valid (step 920)), 
wherein the subset of reference audio content comprises the reference audio (See Galuten: col. 19, lines 6-15, the consumer is asked if they would like the Consumer Player downloaded. The content reference may specify which of the various versions of the Consumer Player to download where there is a preferred player for the owner of this song. The content reference contains information on how to find content objects. In the example above, the primary function of the content reference is to describe and find a song for the consumer, but there is a conditional function to assist in downloading the Consumer Player.).

As per claims 8-10 and 13-14, the claims recite a computing device comprising: 
one or more processors (Galuten: col. 2, lines 15-17, the system provides a designated module (i.e. instructions executable by a computer processor)); and 
a tangible non-transitory, computer-readable storage medium, that, when executed by the one or more processors, causes performance of a set of operations (See Galuten:  col. 37, line 9-11, a system storing computer-readable instructions thereon for execution by a processor for distributing electronic information) comprising steps of the methods as recited in the claims 1-3 and 6-7, respectively, and rejected above under 35 USC § 103 as being unpatentable over Galuten in view of Serra and further in view of Mohajer.
Therefore, claims 8-10 and 13-14 are rejected along the same rationale that rejected claims 1-3 and 6-7, respectively.

As per claims 15-17 and 20, the claims recite computer-implemented methods comprising a set of operations comprising steps of the methods as recited in the claims 1-3 and 7, respectively, and rejected above under 35 USC § 103 as being unpatentable over Galuten in view of Serra and further in view of Mohajer.
Therefore, claims 15-17 and 20 are rejected along the same rationale that rejected claims 1-3 and 7, respectively.

Claims 4, 11, 18 and 21 are rejected under 35 USC § 103 as being unpatentable over
Galuten in view of Serra and further in view of Mohajer, as applied to claims 1-3, 6-10, 13-17 and 20 above, and further in view of 
Mysore et al.: "SEMI-SUPERVISED SOURCE SEPARATION USING NON-NEGATIVE TECHNIQUES", (U.S. Patent Application Publication 20130132077 A1, DATE PUBLISHED 2013-05-23; and DATE FILED 2011-05-27, hereafter "Mysore").

As per claim 4, Galuten in view of Serra and further in view of Mohajer teaches the tangible non-transitory, computer-readable storage medium of claim 1, wherein generating a query data structure associated with query audio is taught as described above in the rejection of claim 1.
However, Galuten in view of Serra and further in view of Mohajer does not explicitly teach that the generating a query data structure associated with query audio comprises generating the representation of the query audio by: executing a constant Q transform on query time slices of the query audio.
On the other hand, as an analogous art on query audio, Mysore teaches that the generating a query data structure associated with query audio comprises executing a constant Q transform on query time slices of the query audio (See FIGS. 7 A-E and [0086], a spectrogram of a synthesized saxophone playing a C major arpeggio four times. Therefore, four repetitions of the sequence C-E-G may be identified. The spectrogram was computed using an STFT with a window size of 100 ms and a hop size of 25 ms (a constant-Q transform was used for displaying the fundamental frequencies of the different notes and the relation between the fundamental frequencies purposes).).
It would have been obvious to one having ordinary skill in the art at the time the Applicant’s application was filed to combine Mysore’s teaching with Galuten in view of Serra and further in view of Mohajer because Galuten is dedicated to facilitating the distribution of media to consumers over a network, while achieving commercial business objectives and protecting the intellectual property rights associated with the media being distributed, Mohajer is dedicated to recognizing audio queries and selecting related information to return in response to recognition of the audio queries, Serra is dedicated to identifying cover songs in a given music collection, and Mysore is dedicated to differentiating between constituent sound sources. The combined teaching of Mysore with Galuten in view of Serra and further in view of Mohajer would have enabled Galuten in view of Serra and further in view of Mohajer to facilitate recognition of audio queries by separating noise signals from the audio stream to further expedite the distribution of media to consumers.
Galuten in view of Serra and further in view of Mohajer and Mysore further teaches the following:
binarizing the constant Q transformed query time slices (See Mysore: [0032], algorithms or symbolic representations of operations on binary digital signals stored within a memory and manipulating or transforming signals, typically represented as physical electronic or magnetic quantities within memories); and 
generating the query data structure based on a sequential order of the two-dimensional Fourier transforms (See Mysore: [0044]-[0045], the spectrogram may be a spectrogram generated in sequentially ordering as the magnitude of the short time Fourier transform (STFT) of a signal and construct a dictionary for each segment of the spectrogram. The various segments may be, for example, time frames of the spectrogram. Here the dictionary for each segment of the spectrogram is interpreted the data structure).
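As a reading aid for the claim 4 limitations discussed above, the sketch below illustrates what binarizing time slices and keeping them in sequential order could look like. It is an assumption-laden illustration, not the method of the application or of Mysore: the per-slice median cutoff, the window size, the hop, and the function names are all hypothetical, and the constant Q and two-dimensional Fourier transform stages are deliberately left out.

```python
from statistics import median

def binarize(time_slices):
    """Map each value to 1 if it exceeds the median of its own time slice, else 0.

    The median cutoff is an assumed rule; the source does not specify one.
    """
    binary = []
    for time_slice in time_slices:
        cutoff = median(time_slice)
        binary.append([1 if value > cutoff else 0 for value in time_slice])
    return binary

def ordered_windows(time_slices, size, hop):
    """Group consecutive time slices into overlapping windows, preserving time order."""
    return [time_slices[start:start + size]
            for start in range(0, len(time_slices) - size + 1, hop)]

# Hypothetical magnitudes: four time slices, three frequency bins each.
slices = [
    [1.0, 5.0, 3.0],
    [4.0, 4.0, 9.0],
    [2.0, 8.0, 6.0],
    [7.0, 0.5, 0.1],
]

binary_slices = binarize(slices)
query_windows = ordered_windows(binary_slices, size=2, hop=1)
```

In a fuller pipeline, each entry of `query_windows` would be transformed in turn and the results stored in the same order, which is one possible reading of a data structure "based on a sequential order".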

As per claim 11, the claim recites a computing device comprising: 
one or more processors (Galuten: col. 2, lines 15-17, the system provides a designated module (i.e. instructions executable by a computer processor)); and 
a tangible non-transitory, computer-readable storage medium, that, when executed by the one or more processors, causes performance of a set of operations (See Galuten:  col. 37, line 9-11, a system storing computer-readable instructions thereon for execution by a processor for distributing electronic information) comprising steps of the methods as recited in the claim 4, and rejected above under 35 USC § 103 as being unpatentable over Galuten in view of Serra and further in view of Mohajer and Mysore.
Therefore, claim 11 is rejected along the same rationale that rejected claim 4.

As per claim 18, the claim recites computer-implemented methods comprising a set of operations comprising steps of the methods as recited in the claim 4, and rejected above under 35 USC § 103 as being unpatentable over Galuten in view of Serra and further in view of Mohajer and Mysore.
Therefore, claim 18 is rejected along the same rationale that rejected claim 4.

As per claim 21, Galuten in view of Serra and further in view of Mohajer and Mysore teaches the tangible non-transitory, computer-readable storage medium of claim 1, 
wherein the representation comprises a constant Q transformed representation of the query audio (See Mysore: [0086], the spectrogram was computed using an STFT with a window size of 100 ms and a hop size of 25 ms (a constant-Q transform was used for displaying the fundamental frequencies of the different notes and the relation between the fundamental frequencies purposes)), and 
wherein the two-dimensional Fourier transform is applied to a time window of the representation (See Serra: Page 14 (Note: Page count includes two pages of Google Patents and Scholar header), Autocorrelation is a well-known operator for converting signals into a delay or shift-invariant representation [58]. Therefore, the power spectrum (or power spectral density), which is formally defined as the Fourier transform of the autocorrelation, is also shift-invariant. Other 2D transforms could be also used, specially shift-invariant operators derived from higher-order spectra [33].; and Mysore: [0044]-[0045], the spectrogram may be a spectrogram generated in sequentially ordering as the magnitude of the short time Fourier transform (STFT) of a signal and construct a dictionary for each segment of the spectrogram. The various segments may be, for example, time frames of the spectrogram.).

Claims 5, 12 and 19 are rejected under 35 USC § 103 as being unpatentable over 
Galuten in view of Serra and further in view of Mohajer, as applied to claims 1-3, 6-10, 13-17 and 20 above, and further in view of 
Harutyunyan et al.: " METHODS AND SYSTEMS TO IDENTIFY ANOMALOUS BEHAVING COMPONENTS OF A DISTRIBUTED COMPUTING SYSTEM", (U.S. Patent Application Publication 20180165142 A1, DATE PUBLISHED 2018-06-14; and DATE FILED 2016-12-12, hereafter "Harutyunyan").

As per claim 5, Galuten in view of Serra and further in view of Mohajer teaches the tangible non-transitory, computer-readable storage medium of claim 1, wherein comparing the query data structure and a reference data structure is taught as described above in the rejection of claim 1.
However, Galuten in view of Serra and further in view of Mohajer does not explicitly teach that the comparing the query data structure and a reference data structure comprises generating a similarity matrix.
However, Harutyunyan teaches generating a similarity matrix (See FIG. 27C and [0107], similarity matrix of similarities calculated for each pair of the seven event sources, for example, the similarity between (ES.sub.B, ES.sub.F} and ES.sub.A is 0.5 and the similarity between (ES.sub.E, ES.sub.E) and ES.sub.E is 0.333 as revealed by the corresponding matrix elements). 
It would have been obvious to one having ordinary skill in the art at the time the Applicant’s application was filed to combine Harutyunyan’s teaching with Galuten in view of Serra and further in view of Mohajer because Galuten is dedicated to facilitating the distribution of media to consumers over a network, while achieving commercial business objectives and protecting the intellectual property rights associated with the media being distributed, Serra is dedicated to identifying cover songs in a given music collection, Mohajer is dedicated to recognizing audio queries and selecting related information to return in response to recognition of the audio queries, and Harutyunyan is dedicated to identifying anomalous behaving components of a distributed computing system. The combined teaching of Harutyunyan with Galuten in view of Serra and further in view of Mohajer would have enabled Galuten in view of Serra and further in view of Mohajer to use similarity matrix techniques to differentiate between constituent sound sources.
Galuten in view of Serra and further in view of Mohajer and Harutyunyan further teaches the following:
wherein the similarity matrix indicates at least one degree to which reference portions of the reference data structure are associated with query portions of the query data structure (See Harutyunyan: Fig. 26A and [0103], similarities calculated for each pair of M event sources denoted by ES.sub.1, ES.sub.2, . . . , ES.sub.M. The similarity matrix elements are denoted by Sim(m, k), where 1≤k, m≤M. Here the similarity calculated teaches the degree of similarity of the portions between the compared), and 
wherein the at least one degree satisfies a corresponding threshold (See Harutyunyan: Fig. 26B and [0106], dashed line 2626 represents a dissimilarity threshold that corresponds to a minimum similarity. Event sources connected by branch point 2628 are less than the dissimilarity threshold 2626 (i.e., minimum similarity) are separated into event source clusters C.sub.1 and C.sub.2. In other words, event sources that are connected by branch points (i.e., similarities) that are greater than the dissimilarity threshold 2626 (i.e., minimum similarity) form event source clusters.).
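To make the claim 5 language about a similarity matrix and a threshold concrete, the following sketch compares portions of a query data structure against portions of a reference data structure. It is illustrative only and is not drawn from Harutyunyan or the application: cosine similarity, the example vectors, the threshold value of 0.9, and all function names are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(query_portions, reference_portions):
    """Entry [i][j] is the similarity of query portion i to reference portion j."""
    return [[cosine(q, r) for r in reference_portions] for q in query_portions]

def satisfies_threshold(matrix, threshold):
    """True if at least one matrix entry meets or exceeds the threshold."""
    return any(value >= threshold for row in matrix for value in row)

# Hypothetical portions of the query and reference data structures.
query = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]]
reference = [[2.0, 0.0, 4.0], [1.0, 1.0, 0.0], [0.0, 3.0, 1.0]]

sim = similarity_matrix(query, reference)
is_candidate_cover = satisfies_threshold(sim, threshold=0.9)
```

With these example values, the first query portion is a scaled copy of the first reference portion, so `sim[0][0]` is approximately 1 and the assumed threshold is met.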

As per claim 12, the claim recites a computing device comprising: 
one or more processors (Galuten: col. 2, lines 15-17, the system provides a designated module (i.e. instructions executable by a computer processor)); and 
a tangible non-transitory, computer-readable storage medium, that, when executed by the one or more processors, causes performance of a set of operations (See Galuten:  col. 37, line 9-11, a system storing computer-readable instructions thereon for execution by a processor for distributing electronic information) comprising steps of the methods as recited in the claim 5, and rejected above under 35 USC § 103 as being unpatentable over Galuten in view of Serra and further in view of Mohajer and Harutyunyan.
Therefore, claim 12 is rejected along the same rationale that rejected claim 5.

As per claim 19, the claim recites computer-implemented methods comprising a set of operations comprising steps of the methods as recited in the claim 5, and rejected above under 35 USC § 103 as being unpatentable over Galuten in view of Serra and further in view of Mohajer and Harutyunyan.
Therefore, claim 19 is rejected along the same rationale that rejected claim 5.
Related Prior Arts
The prior art made of record and not relied upon, which is considered pertinent to applicant's disclosure, can be found in the PTO-892 Notice of References Cited. 
Conclusion
Examiner has cited particular columns and line numbers in the references applied to the claims above for the convenience of the applicant. Although the specified citations are representative of the teachings of the art and are applied to specific limitations within the individual claim, other passages and figures may apply as well. Applicant is respectfully requested, in preparing responses, to fully consider the references in their entirety as potentially teaching all or part of the claimed invention, as well as the context of the passage as taught by the prior art or disclosed by the Examiner. SEE MPEP 2141.02 [R-5] VI. PRIOR ART MUST BE CONSIDERED IN ITS ENTIRETY, INCLUDING DISCLOSURES THAT TEACH AWAY FROM THE CLAIMS: A prior art reference must be considered in its entirety, i.e., as a whole, including portions that would lead away from the claimed invention. W.L. Gore & Associates, Inc. v. Garlock, Inc., 721 F.2d 1540, 220 USPQ 303 (Fed. Cir. 1983), cert. denied, 469 U.S. 851 (1984); In re Fulton, 391 F.3d 1195, 1201, 73 USPQ2d 1141, 1146 (Fed. Cir. 2004). See also MPEP §2123. 
In the case of amending the Claimed invention, Applicant is respectfully requested to indicate the portion(s) of the specification which dictate(s) the structure relied on for proper interpretation and also to verify and ascertain the metes and bounds of the claimed invention. 
Contact Information
Any inquiry concerning this communication or earlier communications from the Examiner should be directed to KUEN S LU whose telephone number is (571)272-4114. The examiner can normally be reached on M-F, 8-19, Mid-Flex 2 hours.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's Supervisor, Mr. Ajay Bhatia, can be reached on 5712723906. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, please call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
KUEN S LU /Kuen S Lu/
Art Unit 2156
Primary Patent Examiner
February 7, 2026
Prosecution Timeline

Jul 31, 2024
Application Filed
Mar 15, 2025
Non-Final Rejection — §103, §DP
Jul 21, 2025
Response Filed
Aug 13, 2025
Final Rejection — §103, §DP
Aug 18, 2025
Applicant Interview (Telephonic)
Aug 18, 2025
Examiner Interview Summary
Nov 17, 2025
Request for Continued Examination
Nov 24, 2025
Response after Non-Final Action
Feb 07, 2026
Non-Final Rejection — §103, §DP (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/482,614
Patent 12566775
MANAGING CONTENT ACROSS DISCRETE SYSTEMS
2y 5m to grant Granted Mar 03, 2026
18/182,953
Patent 12561282
DYNAMIC CLUSTERING BASED ON ATTRIBUTE RELATIONSHIPS
2y 5m to grant Granted Feb 24, 2026
18/808,324
Patent 12561343
SYSTEM AND METHOD FOR STRUCTURING AND ACCESSING TENANT DATA IN A HIERARCHICAL MULTI-TENANT ENVIRONMENT
2y 5m to grant Granted Feb 24, 2026
18/985,592
Patent 12561292
METHODS AND APPARATUS TO ESTIMATE CARDINALITY THROUGH ORDERED STATISTICS
2y 5m to grant Granted Feb 24, 2026
18/431,473
Patent 12554687
GATEWAY SYSTEM THAT MAPS POINTS INTO A GRAPH SCHEMA
2y 5m to grant Granted Feb 17, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

3-4
Expected OA Rounds
85%
Grant Probability
99%
With Interview (+15.2%)
3y 3m
Median Time to Grant
High
PTA Risk
Based on 914 resolved cases by this examiner. Grant probability derived from career allow rate.