Last updated: May 29, 2026
Application No. 18/677,371
SYSTEMS, METHODS, AND APPARATUS TO IMPROVE MEDIA IDENTIFICATION

Non-Final OA §103 §112
Filed
May 29, 2024
Priority
Sep 06, 2018 — provisional 62/727,905 +2 more
Examiner
PATEL, YOGESHKUMAR G
Art Unit
2691
Tech Center
2600 — Communications
Assignee
Gracenote Inc.
OA Round
1 (Non-Final)
Interview Optional

Interview lift (+3.3%) is below the 15.0% threshold. A written response is recommended.
Based on 655 resolved cases, 2023–2026
Examiner Intelligence

PATEL, YOGESHKUMAR G
Grants 83% — above average
Career Allowance Rate
543 granted / 655 resolved
+20.9% vs TC avg
Interview Lift: +3.3% (minimal), comparing resolved cases without and with an interview
Typical timeline
2y 3m
Avg Prosecution
20 currently pending
Career history
672
Total Applications
across all art units
Statute-Specific Performance

§101
1.5%
-38.5% vs TC avg
§103
91.9%
+51.9% vs TC avg
§102
3.6%
-36.4% vs TC avg
§112
1.0%
-39.0% vs TC avg
Based on career data from 655 resolved cases
Office Action

§103 §112
DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 6 and 14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 6 and 14 recite “wherein second threshold of hits comprises a second quantity of hits”. There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Vogel (US #2011/0173185) in view of He et al. (US #2011/0299721) further in view of Han et al. (US #2016/0217799) and Scherf et al. (US #2014/0280304).

Regarding Claim 1, Vogel discloses a tangible, non-transitory computer readable storage medium (title, abstract, figs. 1-8; ¶0129) comprising instructions that, when executed, cause at least one processor to perform a set of operations comprising:
generating hashed media data values, wherein generating the hashed media data values comprises performing a hash function on reference audio data values (Vogel figs. 2A-2B; ¶0088: the hash values can be generated);
generating a first candidate list (Vogel ¶0055 discloses the fingerprint of the audio sample, or "query fingerprint" is processed by using the hash function to generate "query hash values." The query hash values are compared against the set of hash values of the known recordings, or "known hash values" to locate a set of possible matches. The set of possible matches are referred to herein as "a candidate set." In the second stage, the query fingerprint is compared to the full-recording fingerprints corresponding to the candidate set), wherein generating the first candidate list comprises mapping the hashed media data values to a first set of buckets in a hash table (Vogel ¶0087 discloses for each hash value, the corresponding key in a hash map corresponds to a list of (unique ID, Time Offset of peak) pairs. ¶0096 discloses the hash table, can thus be used to map a hash value hi to a list of (unique ID, thi) pairs);
identifying a first set of reference matches from the first candidate list (Vogel ¶0060 discloses the server 205 uses a recognition engine 210 that attempts to match the audio fingerprint to one or more audio fingerprints stored in the fingerprint database 140. ¶0111 discloses the density hash table is used to keep track, for each candidate match, the number of hash values generated from the query fingerprint that correspond to the candidate match), wherein the first set of reference matches satisfy a first threshold of hits (Vogel ¶0083 discloses if the difference between the mean and the value of the local maximum, (max-m) exceeds some predetermined threshold, the local maximum is retained. Otherwise, the local maximum is discarded. ¶0111 discloses once the hits counter reaches a predetermined threshold for a candidate match, the (unique ID, matchTimeOffset) is added to the list of candidate matches, where "matchTimeOffset" is the time offset within the known recording associated with the unique ID where the hash values were found to match).
Vogel may not explicitly disclose wherein the first set of buckets comprises a first bucket size;  generating a second candidate list, wherein generating the second candidate list comprises mapping a subset of the first set of reference matches that do not satisfy the first threshold of hits to a second set of buckets in the hash table, wherein the second set of buckets comprises a second bucket size; identifying a second set of reference matches from the second candidate list, wherein the second set of reference matches satisfy a second threshold of hits; and identifying one or more candidate matches based on at least one of the identified first set of reference matches and the second set of reference matches.
However, He (title, abstract, figs. 1-2B) teaches wherein the first set of buckets comprises a first bucket size (He ¶0035 bucket i; equations 2-3);
generating a second candidate list (He ¶0023 discloses fig. 1 depicts an example extraction of candidate fingerprint bits from media content using multiple types of features and multiple types of projections. A training set D contains both reference video, e.g., an original or unmodified instance of audio and/or video content, and modified versions of the reference video. For each fingerprint codeword that is derived from the reference content, a corresponding fingerprint codeword is derived from the modified content. ¶0027 discloses each feature onto multiple sets of projection matrices and generates candidate fingerprint bits. The candidate fingerprint bits are generated with quantization of each of the projected values. ¶0037 discloses given the robustness measure Rj (j = 1, 2, …, n) for each of the n candidate projections, the selection of an optimal subset of projections I is computed as in equation 5), wherein generating the second candidate list comprises mapping a subset of the first set of reference matches that do not satisfy the first threshold of hits to a second set of buckets in the hash table (He ¶0027 discloses the example projection matrices can thus be considered as hash functions that map a feature vector to one or more fingerprint bits. In general, there can be other hash functions that map features to fingerprint bits. We use the term projection matrices interchangeably with hash functions to include any function that map features to fingerprint bits), wherein the second set of buckets comprises a second bucket size (He ¶0044 discloses similarly, I(j, Y) represents the chosen subset [subgroup] corresponding to A(j, Y). S(j, Y) represents a structure to store information that relates to or identifies which bucket contains those points [i.e., bucket 1 or bucket 2]; table 1. ¶0050 discloses with a number K of bits per fingerprint codeword, the database has a total of 2^K buckets [hash bins]; table 1).
Vogel and He are analogous art as they pertain to audio fingerprinting. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the invention to modify the query fingerprint (as taught by Vogel) to be distributed among 2^K buckets, e.g., hash bins (as taught by He, ¶0035), for projection-based hashing that balances robustness and sensitivity of media fingerprints (He, ¶0014).
Also Han (title, abstract, figs. 1-12) teaches generating a second candidate list (Han fig. 11: steps 1140, 1150), wherein generating the second candidate list comprises mapping a subset of the first set of reference matches that do not satisfy the first threshold of hits to a second set of buckets in the hash table (Han ¶0030 discloses these ordered subsets 520, 530, and 540 may be stored in the database 115 within their respective hash tables 521,531, and 541, all of which may be associated with [e.g., assigned to, correlated with, or mapped to] the timestamp 550 for the segment 310. In example embodiments, a single hash table [e.g., hash table 541 that stores the ordered subset 540] and the timestamp 550 may be stored as a partial fingerprint 660 [fig. 6] of the segment 310. The partial fingerprint 660 may therefore function as an even more lightweight representation [e.g., compared to the fingerprint 560] of the segment 310. Such a very lightweight representation may be especially suitable [e.g., in real-time applications] for comparing with similarly generated partial fingerprints of segments of an audio data [e.g., in determining a likelihood that the audio data 300 matches other audio data]).
Vogel, He, and Han are analogous art as they pertain to audio fingerprinting. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the invention to modify the teaching of Vogel in view of He in light of the teachings of Han for comparing with similarly generated partial fingerprints of segments of an audio data (as taught by Han, ¶0030) to determine a likelihood that candidate audio data [e.g., an unidentified song submitted as a candidate to be identified] matches reference audio data [e.g., a known song] (Han, ¶0014).
And Scherf (title, abstract, figs. 1-8) teaches identifying a second set of reference matches from the second candidate list (Scherf fig. 1: blocks 110, 115 and fig. 2: block 230: difference comparison module; block 240: match module), wherein the second set of reference matches satisfy a second threshold of hits (Scherf fig. 4: step 430; ¶0044 discloses in order to match a fingerprint block of a query fingerprint to a fingerprint block of one or more reference fingerprints, the query module 210 compares the fingerprints until it locates one or more similar fingerprint blocks in the reference fingerprint database 117 [e.g., positions within in the 250 million sub-fingerprints where the bit error rate between fingerprints is minimal or below a threshold value]. ¶0049 discloses the difference comparison module 220 may utilize an error comparison module 222 that is configured to compare the bit error rates between two or more versions of the known media content item. For example, the error comparison module 222 may be configured and/or programmed to calculate an average bit error rate between a query fingerprint and a reference fingerprint, identify an outlier bit error rate for the portion of the reference fingerprint by applying a median filter to the calculated average bit error rate, and determine the identified outlier bit error rate is above a threshold bit error rate associated with a difference between the versions of the known media content item that is associated with a word change between the versions of the known media content item); and
identifying one or more candidate matches based on at least one of the identified first set of reference matches and the second set of reference matches (Scherf ¶0048 discloses the query module 210 may perform the methods described herein and determine a match of an unknown media content item to two or more versions of a known media content item, such as a clean version and an explicit version of the known media content item. ¶0058 discloses the match module 230 is configured and/or programmed to match the at least one query fingerprint to a subset of the first reference fingerprint and a subset of the second reference fingerprint associated with the portion of the known media content item that differs between the first reference fingerprint and the second reference fingerprint, and identify the unknown media content item based on a match between the at least one query fingerprint and one of the first reference fingerprint and the second reference fingerprint. ¶0059 discloses the match module 230 may determine that one version of a known media content item is of a better quality than other versions of the known media content item, and match the unknown media content item to the better quality version. For example, the match module 230 may access information determined by the difference comparison module 220 and select a version to match to the unknown media content item based on the information indicating the selected version is a high quality version of the known media content item, and therefore a high quality and matching version of the unknown media content item).
Vogel, He, Han, and Scherf are analogous art as they pertain to audio fingerprinting. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the invention to modify the teaching of Vogel in view of He and Han in light of the teachings of Scherf to determine that a result of the query identifies at least two versions of a known media content item (as taught by Scherf, ¶0043) to match at least one query fingerprint to a subset of the first reference fingerprint and a subset of the second reference fingerprint associated with the portion of the known media content item that differs between the first reference fingerprint and the second reference fingerprint, and identify the unknown media content item based on a match between the at least one query fingerprint and one of the first reference fingerprint and the second reference fingerprint (Scherf, ¶0014).
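
Illustrative note (not part of the Office action): the bit-error-rate comparison that the examiner cites from Scherf ¶0044 can be pictured with the minimal sketch below. It assumes 32-bit integer sub-fingerprints and a sliding comparison of a query block against a reference; the function names, the 32-bit width, and the threshold value are hypothetical choices for illustration and are not taken from Scherf or from the application.

```python
# Assumption: each sub-fingerprint is a 32-bit integer (width not stated in the source).
BITS_PER_SUBFINGERPRINT = 32


def bit_error_rate(query_block, reference_block):
    """Fraction of bits that differ between two equal-length blocks."""
    if len(query_block) != len(reference_block):
        raise ValueError("blocks must have the same length")
    differing_bits = sum(
        bin(q ^ r).count("1") for q, r in zip(query_block, reference_block)
    )
    return differing_bits / (len(query_block) * BITS_PER_SUBFINGERPRINT)


def find_similar_positions(query_block, reference, max_ber):
    """Positions in `reference` where the bit error rate is below `max_ber`."""
    n = len(query_block)
    return [
        pos
        for pos in range(len(reference) - n + 1)
        if bit_error_rate(query_block, reference[pos:pos + n]) < max_ber
    ]
```

A position is reported only when the bit error rate falls below the threshold, which mirrors the "minimal or below a threshold value" language quoted from Scherf.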

Regarding Claim 2, Vogel in view of He, Han, and Scherf discloses the tangible, non-transitory computer readable storage medium of claim 1. But Vogel in view of He may not explicitly disclose wherein the reference audio data values comprise one or more reference energy values.
However, Han (title, abstract, figs. 1-12) teaches wherein the reference audio data values comprise one or more reference energy values (Han ¶0038 discloses in operation 810, the vector module 220 multiplies each energy value in the spectral representation 320 by a corresponding weight factor. The weight factor for an energy value may be determined based on a position [ordinal position] of the energy value's corresponding frequency [e.g., frequency bin] within a set of frequencies represented in the spectral representation 320. With respect to fig. 3 [¶0025], the position of the frequency for an energy value may be expressed as a frequency bin number. For example, the vector module 220 may multiply each energy value by its frequency bin number [e.g., 1 for Frequency Bin 1, or 1982 for Frequency Bin 1982]. As another example, the vector module 220 may multiply each energy value by the square root of its frequency bin number [e.g., 1 for Frequency Bin 1, or sqrt (1982) for Frequency Bin 1982]. ¶0046 discloses figs. 9-10 illustrates operations in determining a likelihood of a match between reference audio data 910 and candidate audio data 920. As noted above, the audio processing machine 110 may form all or part of an audio identification system and may be configured to determine a likelihood that the candidate audio data 920 [e.g., an unidentified song] matches the reference audio data 910 [e.g., a known song]. In some example embodiments, however, one or more of the devices 130 and 150 is configured to perform such operations. Fig. 9 illustrates an example of determining a high likelihood that the candidate audio data 920 matches the reference audio data 910, while fig. 10 illustrates an example of a low likelihood that the candidate audio data 920 matches the reference audio data 910).
Vogel, He, and Han are analogous art as they pertain to audio fingerprinting. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the invention to modify the teaching of Vogel in view of He in light of the teachings of Han for comparing with similarly generated partial fingerprints of segments of an audio data (as taught by Han, ¶0030) to determine a likelihood that candidate audio data [e.g., an unidentified song submitted as a candidate to be identified] matches reference audio data [e.g., a known song] (Han, ¶0014).
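
Illustrative note (not part of the Office action): the energy weighting quoted from Han ¶0038 multiplies each energy value by its frequency bin number, or alternatively by the square root of that number. The sketch below assumes the spectral representation is a plain list of energies ordered by frequency bin, starting at Frequency Bin 1; the function name and the list layout are hypothetical.

```python
import math


def weight_energies(energies, use_sqrt=False):
    """Weight each energy by its 1-based frequency bin number (or its square root)."""
    weighted = []
    for bin_number, energy in enumerate(energies, start=1):
        factor = math.sqrt(bin_number) if use_sqrt else bin_number
        weighted.append(energy * factor)
    return weighted
```

For example, with energies `[2.0, 3.0, 4.0]` the bin-number weighting yields `[2.0, 6.0, 12.0]`.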

Regarding Claim 3, Vogel in view of He, Han, and Scherf discloses the tangible, non-transitory computer readable storage medium of claim 1. But Vogel may not explicitly disclose wherein the second bucket size is different than the first bucket size.
However, He (title, abstract, figs. 1-2B) teaches wherein the second bucket size is different than the first bucket size (He ¶0035 - bucket i; equations 2-3 [i.e., more than one bucket]. ¶0044 discloses similarly, I(j, Y) represents the chosen subset [subgroup] corresponding to A(j, Y). S(j, Y) represents a structure to store information that relates to or identifies which bucket contains those points [i.e., bucket 1 or bucket 2]; table 1. ¶0050 discloses with a number K of bits per fingerprint codeword, the database has a total of 2^K buckets [hash bins]).
Vogel and He are analogous art as they pertain to audio fingerprinting. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the invention to modify the query fingerprint (as taught by Vogel) to be distributed among 2^K buckets, e.g., hash bins (as taught by He, ¶0035), for projection-based hashing that balances robustness and sensitivity of media fingerprints (He, ¶0014).

Regarding Claim 4, Vogel in view of He, Han, and Scherf discloses the tangible, non-transitory computer readable storage medium of claim 3. But Vogel may not explicitly disclose wherein the second bucket size is greater than the first bucket size.
However, He (title, abstract, figs. 1-2B) teaches wherein the second bucket size is greater than the first bucket size (He ¶0050 discloses the number of samples in a bucket i is represented herein with Ni. As Ni is effectively the number of collisions in bucket i, the probability pi of collision in bucket i is given by the equation).
Vogel and He are analogous art as they pertain to audio fingerprinting. Therefore it would have been obvious to someone of ordinary skill in the art before the effective filing date of the invention to modify the query fingerprint (as taught by Vogel) to be distributed among 2^K buckets, e.g., hash bins (as taught by He, ¶0035), for projection-based hashing that balances robustness and sensitivity of media fingerprints (He, ¶0014).

Regarding Claim 5, Vogel in view of He, Han, and Scherf discloses the tangible, non-transitory computer readable storage medium of claim 1,
wherein the first threshold of hits comprises a first quantity of hits (Vogel ¶0083 discloses if the difference between the mean and the value of the local maximum, (max-m) exceeds some predetermined threshold, the local maximum is retained. Otherwise, the local maximum is discarded. ¶0111 discloses once the hits counter reaches a predetermined threshold for a candidate match, the (unique ID, matchTimeOffset) is added to the list of candidate matches, where "matchTimeOffset" is the time offset within the known recording associated with the unique ID where the hash values were found to match), and
wherein identifying the one or more candidate matches is based on the first set of reference matches having the first quantity of hits that satisfy the threshold (Vogel ¶0060 discloses the server 205 uses a recognition engine 210 that attempts to match the audio fingerprint to one or more audio fingerprints stored in the fingerprint database 140. ¶0111 discloses the density hash table is used to keep track, for each candidate match, the number of hash values generated from the query fingerprint that correspond to the candidate match. ¶0116 discloses a linear scan is then performed forward in the fingerprint array, comparing the features to the query fingerprint, stopping when the time index exceeds the end of the query fingerprint. A match score is computed based on the number of matching maxima found per unit time. If the score exceeds a preset threshold, a match is declared. Otherwise, the candidate match is pruned).
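
Illustrative note (not part of the Office action): Vogel ¶0116, as quoted above, computes a match score from the number of matching maxima per unit time and declares a match when the score exceeds a preset threshold. The sketch below assumes peaks are stored as (frequency bin, time offset) pairs in Python sets and that the candidate's time offset is already known; these representations and all names are hypothetical, not Vogel's implementation.

```python
def match_score(query_peaks, reference_peaks, match_time_offset, query_seconds):
    """Matching maxima per second of query audio.

    query_peaks / reference_peaks: sets of (frequency_bin, time_offset) pairs.
    match_time_offset: where the query is assumed to start within the reference.
    """
    matching = sum(
        1
        for freq_bin, t in query_peaks
        if (freq_bin, t + match_time_offset) in reference_peaks
    )
    return matching / query_seconds


def is_match(score, threshold):
    """A match is declared only if the score exceeds the preset threshold."""
    return score > threshold
```

A candidate whose score does not exceed the threshold would be pruned, as the quoted passage describes.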

Regarding Claim 6, Vogel in view of He, Han, and Scherf discloses the tangible, non-transitory computer readable storage medium of claim 5,
wherein second threshold of hits comprises a second quantity of hits (Vogel ¶0110 discloses for each query fingerprint hash value hi occurring at time offset "tQueryhi" within the query fingerprint, the hash table stored in hash database 140-1 is used to look up the list of (unique ID, thi) pairs, where, as explained above, "unique ID" is an identifier for a known recording and "thi" is a time offset within the full-recording fingerprint. In other words, each (unique ID, thi) specifies that the hash value hi occurs in a recording with unique ID at a time offset thi within the recording. For each item in the list, a density hash key is computed as the concatenation of the values in the pair (unique ID, thi - tQueryhi) and a corresponding hits counter value is incremented. A temporary density hash table maps a key in the table, also referred to as a "density map key", to a hits counter value. Each time the same density map key is encountered, the corresponding hits counter value is incremented), and
wherein identifying the one or more candidate matches is based on the first set of reference matches having the first quantity of hits that satisfy the threshold and the second set of reference matches having the second quantity of hits that do not satisfy the threshold (Vogel ¶0118 discloses as described above, a match score is computed based on the number of matching maxima per unit time. In one exemplary implementation, this match score is computed by forming a matrix, queryMatrix, from the query fingerprint, where the matrix is constructed such that element (i,j) in queryMatrix takes the value one for each maximum in the query fingerprint at frequency bin "i" and time offset "j". All other elements of queryMatrix are equal to zero. The length of the query fingerprint queryLength, [e.g., in time slices], corresponds to time, queryTimeSeconds [e.g., in seconds] of audio).
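
Illustrative note (not part of the Office action): the density hash table and hits counter quoted from Vogel ¶0110-¶0111 can be sketched as follows. The sketch assumes the reference hash table is a dictionary mapping a hash value to a list of (unique ID, time offset) pairs, and it uses a tuple as the density map key instead of a concatenated value; the function name and data layout are hypothetical.

```python
from collections import defaultdict


def find_candidates(query_hashes, hash_table, hits_threshold):
    """Return (unique_id, match_time_offset) pairs whose hits reach the threshold.

    query_hashes: list of (hash_value, t_query) pairs from the query fingerprint.
    hash_table: dict mapping hash_value -> list of (unique_id, t_ref) pairs.
    """
    hits = defaultdict(int)  # temporary density hash table: key -> hits counter
    candidates = []
    for hash_value, t_query in query_hashes:
        for unique_id, t_ref in hash_table.get(hash_value, []):
            key = (unique_id, t_ref - t_query)  # density map key
            hits[key] += 1
            if hits[key] == hits_threshold:  # added once, when the counter reaches it
                candidates.append(key)
    return candidates
```

Hashes from the same recording that agree on the offset difference accumulate under one key, so only time-consistent matches reach the threshold.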

Regarding Claim 7, Vogel in view of He, Han, and Scherf discloses the tangible, non-transitory computer readable storage medium of claim 1, wherein the set of operations further comprises:
comparing the one or more candidate matches to the first set of the reference matches and the second set of the reference matches to determine quantities of matching peaks (Vogel ¶0055 discloses in the first stage, the fingerprint of the audio sample, or "query fingerprint" is processed by using the hash function to generate "query hash values." The query hash values are compared against the set of hash values of the known recordings, or "known hash values" to locate a set of possible matches. The set of possible matches are referred to herein as "a candidate set." In the second stage, the query fingerprint is compared to the full-recording fingerprints corresponding to the candidate set. figs. 5: 508-518: Maxima [e.g., Peaks]); and
identify query media based on one of the one or more candidate matches having a highest quantity of matching peaks of the one or more candidate matches (Vogel ¶0077 discloses at block 508 the most prominent local maxima or "peaks" in the spectrogram are selected. For each element in the spectrogram, if its magnitude value is greater than the values of its surrounding neighbors, e.g., its surrounding eight neighbors, then it is chosen as a [local] maximum value. ¶0079 discloses at block 512 the audio fingerprint is constructed from the list of maxima. A set of the peaks over a particular time-interval, e.g., one second, are collected. In turn, an audio fingerprint for the entire audio recording is generated from the collected sets of peaks. The fewer peaks selected on average for a particular time-interval of audio, the more compact becomes the audio fingerprint).
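
Illustrative note (not part of the Office action): the peak selection quoted from Vogel ¶0077 keeps a spectrogram element when its magnitude is greater than its surrounding eight neighbors. The sketch below assumes the spectrogram is a list of rows of magnitudes and that edge elements are compared only against the neighbors that exist; the edge handling and the function name are assumptions, not taken from Vogel.

```python
def local_maxima(spectrogram):
    """Return (row, col) positions strictly greater than all surrounding neighbors."""
    rows = len(spectrogram)
    cols = len(spectrogram[0]) if rows else 0
    peaks = []
    for i in range(rows):
        for j in range(cols):
            value = spectrogram[i][j]
            # Up to eight neighbors; fewer at the edges (assumption).
            neighbors = [
                spectrogram[a][b]
                for a in range(max(0, i - 1), min(rows, i + 2))
                for b in range(max(0, j - 1), min(cols, j + 2))
                if (a, b) != (i, j)
            ]
            if all(value > n for n in neighbors):
                peaks.append((i, j))
    return peaks
```

The resulting list of maxima is the kind of input from which, per the quoted ¶0079, a fingerprint would be constructed.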

Regarding Claim 8, Vogel in view of He, Han, and Scherf discloses the tangible, non-transitory computer readable storage medium of claim 7, wherein the set of operations further comprises:
transmitting instructions that cause presentation of the identified query media to a media device that accessed the query media (Vogel ¶0061 discloses after the fingerprint generation module generates an audio fingerprint 310, the client device 300 transmits the audio fingerprint 310 onto the network 360 and/or to a recognition server 350. ¶0062 discloses a query of the audio fingerprint 310 takes place on the recognition server 350 by matching the audio fingerprint 310 to one or more fingerprints stored in a fingerprint database [not shown] which is also accessible by the recognition server 350. Upon recognition of the audio fingerprint 310, the recognition server 350 transmits an identifier (ID) associated with the recording and metadata 340 via the network 360 to the client device 300).

Claims 9-20 are rejected for the same reasons as set forth in Claims 1-8.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOGESHKUMAR G PATEL whose telephone number is (571)272-3957. The examiner can normally be reached 7:30 AM-4 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen can be reached at (571) 272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/YOGESHKUMAR PATEL/Primary Examiner, Art Unit 2691
Prosecution Timeline

May 29, 2024
Application Filed
Mar 23, 2026
Non-Final Rejection mailed — §103, §112 (current)
Precedent Cases

Applications granted by this same examiner with similar technology

18/417,674
Patent 12640163
Method and System for Identifying Similarity Between Two Audio Tracks
2y 4m to grant Granted May 26, 2026
18/367,316
Patent 12626711
HIGH-QUALITY VOICE SIGNAL PROCESSING DEVICE AND METHOD THROUGH REMOVAL OF AMBIENT NOISE BASED ON MULTI-SENSOR SIGNAL FUSION
2y 8m to grant Granted May 12, 2026
18/401,292
Patent 12610167
DIRECTIONAL BILATERAL SOUND INTAKE-BASED MIC ASSEMBLY AND ELECTRONIC DEVICE
2y 3m to grant Granted Apr 21, 2026
18/420,157
Patent 12598426
CHANGE OF A MODE FOR CAPTURING IMMERSIVE AUDIO
2y 2m to grant Granted Apr 07, 2026
18/534,033
Patent 12596525
METHOD TO DETERMINE INTENDED DIRECTION OF A VOCAL COMMAND AND TARGET FOR VOCAL INTERACTION
2y 4m to grant Granted Apr 07, 2026
Study what changed to get past this examiner. Based on 5 most recent grants.
Prosecution Projections

1-2
Expected OA Rounds
83%
Grant Probability
86%
With Interview (+3.3%)
2y 3m (~3m remaining)
Median Time to Grant
Low
PTA Risk
Based on 655 resolved cases by this examiner. Grant probability derived from career allowance rate.